Open ofhwei opened 8 months ago
We also apply the position in the cache to the query, rather than the position in the actual text. In the toy example, the position ID for the query is also 9. We do this to keep the position encodings within the cache size, so that we can generate text much longer than the pre-training window.
Guangxuan
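To make the reply above concrete, here is a minimal sketch of cache-relative position IDs. It is not the repository's code: the function name `cache_positions` and the cache layout (4 initial tokens kept plus the 6 most recent, for a cache size of 10) are assumptions chosen only so that the numbers match the 9 vs. 12 in the question.

```python
def cache_positions(text_pos, n_initial=4, n_recent=6):
    """Illustrative only: which tokens stay in the cache, and which
    position IDs they get, when the token at `text_pos` (0-indexed) is processed.

    n_initial and n_recent are assumed values, not taken from the thread.
    """
    seen = list(range(text_pos + 1))  # text positions of all tokens so far
    if len(seen) <= n_initial + n_recent:
        kept = seen  # cache not full yet, nothing evicted
    else:
        # keep the first n_initial tokens and the n_recent most recent ones
        kept = seen[:n_initial] + seen[-n_recent:]
    # position IDs are assigned by slot in the cache, not by text position
    key_ids = list(range(len(kept)))
    # the query of the newest token uses the same ID as its own key
    query_id = key_ids[-1]
    return kept, key_ids, query_id


kept, key_ids, query_id = cache_positions(12)
print("text positions kept:", kept)
print("key position IDs:   ", key_ids)
print("query position ID:  ", query_id)
```

Under these assumptions, the token at text position 12 sits in the last cache slot, so both its key and its query get position ID 9, and no position ID ever exceeds the cache size minus one.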
Hello,
I wonder whether the position ID of the query is the same as that of the key, or whether it is the actual generated context length (this comment is confusing me). For example, as mentioned in the toy example, the position ID for the key of M is 9. What is the position ID for the query of M: 9 or 12? Also, why was this choice made?
Looking forward to your response. Thank you.