juncongmoo / pyllama

LLaMA: Open and Efficient Foundation Language Models
GNU General Public License v3.0

Quick Question #112

Closed ArgusK17 closed 8 months ago

ArgusK17 commented 8 months ago

I am reading the code and noticed that in llama/generation.py line 77 we have: i = tokens[:, prev_pos:cur_pos]. But from the second iteration onward, cur_pos = prev_pos + 1, so i only includes one token.
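To make sure I am reading it right, here is a simplified sketch of the loop as I understand it (not the actual pyllama code; the model call is stubbed out and the sizes are made up, so only the slicing behavior is visible):

```python
import torch

def fake_forward(inp, start_pos):
    # Stand-in for model.forward(); just reports what it was given.
    print(f"start_pos={start_pos}, input shape={tuple(inp.shape)}")
    # Pretend the model predicts token id 1 for every sequence.
    return torch.ones(inp.shape[0], dtype=torch.long)

bsz, prompt_len, total_len = 2, 4, 8
tokens = torch.zeros(bsz, total_len, dtype=torch.long)

prev_pos = 0
for cur_pos in range(prompt_len, total_len):
    # First iteration: the slice is the whole prompt.
    # From the second iteration onward cur_pos == prev_pos + 1,
    # so this slice contains exactly one token per sequence.
    next_token = fake_forward(tokens[:, prev_pos:cur_pos], prev_pos)
    tokens[:, cur_pos] = next_token
    prev_pos = cur_pos
```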

Is that correct? I thought that Transformer models need to take all previous tokens as their input. I am just a beginner with these models and I would really like to understand this better.