tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0
650 stars 41 forks

Last generated token getting ignored in streaming.py? #45

Open ritik99 opened 4 months ago

ritik99 commented 4 months ago

Hello,

I was looking into the streaming.py code and noticed that in greedy_generate() we overwrite the previous input_ids on line 33. As I understand the code, this discards the last generated token that was assigned to input_ids on line 42, so the final generated token of every prompt is never used as a query token.
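To make the concern concrete, here is a minimal toy sketch of the loop structure being described, not the actual streaming.py code: the names `fake_model` and `greedy_generate` below are stand-ins, and the real code operates on tensors and a KV cache rather than plain lists.

```python
# Toy illustration of the described pattern (hypothetical, not the real
# streaming.py): greedy_generate() leaves the most recently generated
# token in input_ids, but the outer loop overwrites input_ids with the
# next prompt before that token is ever passed to the model as a query.

def fake_model(query_tokens, seen):
    """Stand-in for a forward pass: records every query token it
    receives and returns a fresh 'next token' id."""
    seen.extend(query_tokens)
    return max(seen) + 1

def greedy_generate(input_ids, seen, max_gen_len):
    """Mimics the generation loop: each step feeds the previously
    generated token back in as the only query token."""
    for _ in range(max_gen_len):
        next_token = fake_model(input_ids, seen)
        input_ids = [next_token]  # would be the query on the next step
    return input_ids  # holds the last generated token, never queried

seen = []
for prompt in [[0, 1, 2], [100, 101]]:
    input_ids = prompt  # overwrites whatever greedy_generate left behind
    last = greedy_generate(input_ids, seen, max_gen_len=3)
    # `last` (the final generated token) is discarded here, so it never
    # reaches fake_model as a query token:
    assert last[0] not in seen
```

In this toy version the assertion passes for every prompt, which is the behavior the question is about: the overwrite is harmless for the tokens already emitted, but the last one never contributes a query.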

I don't think this leads to any significant change in the model outputs, but I just wanted to confirm whether my understanding is correct.

Thanks for sharing this implementation btw!

Ritik