mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License · 6.59k stars · 361 forks
Why is `max_gen_len` needed when computing `space_needed`?
#78
Open
Mr-lonely0 opened this issue 6 months ago

Mr-lonely0 commented 6 months ago
Thank you for your time. Here are my questions:
1. Why is the cache evicted only once per turn rather than at every decoding step?
2. When computing the space needed in the cache, why does `max_gen_len` have to be included?
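For context, the two questions are related: if eviction reserves `max_gen_len` slots up front, the cache cannot overflow during that turn's decoding, so no per-step eviction is needed. Below is a minimal toy sketch of this idea. It is not the repository's actual implementation; the class name, the `start_size`/`recent_size` parameters, and the use of a plain token list in place of key/value tensors are all illustrative assumptions.

```python
class StartRecentKVCache:
    """Toy model of an attention-sink KV cache (assumption: the real cache
    stores key/value tensors, not token ids). It retains the first
    `start_size` tokens (the sinks) plus the most recent `recent_size`
    tokens, for a total budget of start_size + recent_size."""

    def __init__(self, start_size=4, recent_size=12):
        self.start_size = start_size
        self.recent_size = recent_size
        self.cache_size = start_size + recent_size
        self.tokens = []  # stand-in for the cached key/value states

    def append(self, tok):
        self.tokens.append(tok)

    def evict_for_space(self, space_needed):
        """Evict so that `space_needed` more tokens fit within the budget."""
        seq_len = len(self.tokens)
        if seq_len + space_needed <= self.cache_size:
            return  # already enough headroom
        # Keep the sinks plus the most recent tokens, leaving exactly
        # `space_needed` free slots for the upcoming generation.
        self.tokens = (self.tokens[: self.start_size]
                       + self.tokens[seq_len - self.recent_size + space_needed:])


max_gen_len = 6
cache = StartRecentKVCache()
for t in range(30):            # simulate a long accumulated history
    cache.append(t)

# One eviction per turn, with space_needed including max_gen_len:
cache.evict_for_space(max_gen_len)
assert len(cache.tokens) + max_gen_len <= cache.cache_size

# Every decoding step of this turn now fits without further eviction:
for t in range(30, 30 + max_gen_len):
    cache.append(t)
assert len(cache.tokens) <= cache.cache_size
```

Under this reading, including `max_gen_len` in `space_needed` trades a slightly more aggressive eviction for the guarantee that the whole reply fits, which is why a single per-turn check can replace a per-step one.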