mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

Why is `max_gen_len` needed when computing `space_needed`? #78

Open Mr-lonely0 opened 6 months ago

Mr-lonely0 commented 6 months ago

Thank you for your time. Here are my questions:

  1. Why is the cache evicted only once per turn rather than at every decoding step?
  2. When computing the space needed in the cache, why does `max_gen_len` need to be included in `space_needed`?
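For context, here is a minimal integer-only sketch of the eviction policy the questions refer to (assuming a `StartRecentKVCache`-style scheme with attention sinks plus a recent window; the function name, signature, and default sizes below are illustrative, not the repo's exact code). The key point: eviction runs once before a turn starts, so the space it reserves must cover the incoming prompt tokens *and* all `max_gen_len` tokens that may be generated that turn, because no further eviction happens mid-generation.

```python
def evict_for_space(seq_len, num_coming, start_size=4, recent_size=2000):
    """Sketch of per-turn KV-cache eviction (illustrative, not the repo's code).

    seq_len    : tokens currently cached
    num_coming : space_needed delta for this turn, i.e. prompt tokens
                 plus max_gen_len, since eviction runs only once per turn
    Returns (keep_front, keep_back): attention-sink tokens kept at the
    start and recent tokens kept at the end. Assumes num_coming <= recent_size.
    """
    cache_size = start_size + recent_size
    if seq_len + num_coming <= cache_size:
        return seq_len, 0  # everything fits; evict nothing
    # Keep the sink tokens, then only enough recent tokens that the cache
    # is exactly full once all `num_coming` entries have been appended.
    return start_size, recent_size - num_coming
```

With these defaults, a 3000-token cache facing a 512-token budget keeps the 4 sink tokens plus the last 1488 tokens, so after the turn the cache holds 4 + 1488 + 512 = 2004 = `cache_size` entries. If `max_gen_len` were left out of `num_coming`, the cache would exceed its budget partway through generation, since nothing is evicted between decoding steps.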