juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0

Quick Question #112

Closed ArgusK17 closed 8 months ago

ArgusK17 commented 8 months ago

I am reading the code and noticed that in llama/generation.py line 77 we have: i = tokens[:, prev_pos:cur_pos]. But from the second iteration onward, cur_pos = prev_pos + 1, so i only includes one token.
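To make sure I am reading it right, here is a simplified sketch of the loop as I understand it (not the actual pyllama code; the model call is stubbed out and the sizes are made up, so only the slicing behavior is visible):

```python
import torch

def fake_forward(inp, start_pos):
    # Stand-in for model.forward(); just reports what it was given.
    print(f"start_pos={start_pos}, input shape={tuple(inp.shape)}")
    # Pretend the model predicts token id 1 for every sequence.
    return torch.ones(inp.shape[0], dtype=torch.long)

bsz, prompt_len, total_len = 2, 4, 8
tokens = torch.zeros(bsz, total_len, dtype=torch.long)

prev_pos = 0
for cur_pos in range(prompt_len, total_len):
    # First iteration: the slice is the whole prompt.
    # From the second iteration onward cur_pos == prev_pos + 1,
    # so this slice contains exactly one token per sequence.
    next_token = fake_forward(tokens[:, prev_pos:cur_pos], prev_pos)
    tokens[:, cur_pos] = next_token
    prev_pos = cur_pos
```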

Is that correct? I thought that Transformer models need to take all previous tokens as their input. I am just a beginner with these models and I would really like to understand this better.