mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

Why is `max_gen_len` needed when computing `space_needed`? #78

Open Mr-lonely0 opened 6 months ago

Mr-lonely0 commented 6 months ago

Thank you for your time. Here are my questions:

  1. Why is the cache evicted only once per turn rather than at every decoding step?
  2. When computing the space needed in the cache, why does `max_gen_len` need to be included in `space_needed`?
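For context, here is a minimal integer-only sketch of the eviction policy the questions refer to (assuming a `StartRecentKVCache`-style scheme with attention sinks plus a recent window; the function name, signature, and default sizes below are illustrative, not the repo's exact code). The key point: eviction runs once before a turn starts, so the space it reserves must cover the incoming prompt tokens *and* all `max_gen_len` tokens that may be generated that turn, because no further eviction happens mid-generation.

```python
def evict_for_space(seq_len, num_coming, start_size=4, recent_size=2000):
    """Sketch of per-turn KV-cache eviction (illustrative, not the repo's code).

    seq_len    : tokens currently cached
    num_coming : space_needed delta for this turn, i.e. prompt tokens
                 plus max_gen_len, since eviction runs only once per turn
    Returns (keep_front, keep_back): attention-sink tokens kept at the
    start and recent tokens kept at the end. Assumes num_coming <= recent_size.
    """
    cache_size = start_size + recent_size
    if seq_len + num_coming <= cache_size:
        return seq_len, 0  # everything fits; evict nothing
    # Keep the sink tokens, then only enough recent tokens that the cache
    # is exactly full once all `num_coming` entries have been appended.
    return start_size, recent_size - num_coming
```

With these defaults, a 3000-token cache facing a 512-token budget keeps the 4 sink tokens plus the last 1488 tokens, so after the turn the cache holds 4 + 1488 + 512 = 2004 = `cache_size` entries. If `max_gen_len` were left out of `num_coming`, the cache would exceed its budget partway through generation, since nothing is evicted between decoding steps.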