mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

The position id for q #60

Open ofhwei opened 8 months ago

ofhwei commented 8 months ago

Hello,

I wonder whether the position id of the query is the same as that of the key, or whether it is the actual generated context length (this comment is confusing me). For example, in the toy example, the position id for the key of M is 9. What is the position id for the query of M: 9 or 12? And why was this choice made?

Looking forward to your response. Thank you.

Guangxuan-Xiao commented 8 months ago

We apply the position within the cache to the query as well, rather than its position in the actual text. In the toy example, the position ID for the query of M is therefore also 9. We do this to keep every position encoding within the cache size, so the distance between a query and any cached key never exceeds what the model saw in pre-training. That is what lets us generate text much longer than the pre-training window.
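To make the numbers concrete, here is a minimal sketch of how positions inside the cache could be assigned. It is not the repository's code: the helper `cache_positions` is hypothetical, and the split of 4 sink tokens plus 6 recent tokens is an assumption, chosen only because it reproduces the 9 vs. 12 figures from the question.

```python
def cache_positions(num_tokens, start_size, recent_size):
    """Hypothetical helper: return (text_indices, position_ids) for cached tokens.

    text_indices are the tokens' positions in the full text; position_ids are
    the positions assigned inside the cache, which are always 0..len(cache)-1.
    """
    if num_tokens <= start_size + recent_size:
        # Nothing has been evicted yet, so both numberings coincide.
        kept = list(range(num_tokens))
    else:
        # Keep the first start_size tokens (sinks) and the last recent_size tokens.
        kept = list(range(start_size)) + list(
            range(num_tokens - recent_size, num_tokens)
        )
    position_ids = list(range(len(kept)))
    return kept, position_ids


# Tokens A..M are text indices 0..12, so 13 tokens in total.
# start_size=4 and recent_size=6 are assumed values, not taken from the repo.
kept, position_ids = cache_positions(num_tokens=13, start_size=4, recent_size=6)

query_text_index = kept[-1]           # where M sits in the actual text
query_position_id = position_ids[-1]  # position used for both q and k of M

print(kept)
print(position_ids)
print(query_text_index, query_position_id)
```

Under these assumptions, M is text index 12 but receives position 9 for both its key and its query, because the newest token always takes the last slot in the cache.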

Guangxuan