mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453

Doubts about the `run_streaming_llama.py` file #64

Open Rishab9991 opened 8 months ago

Rishab9991 commented 8 months ago

Hi, I have a few questions about the `run_streaming_llama.py` script.

  1. What do `start_size` and `recent_size` indicate in the context of StreamingLLM? (My current understanding of how they are used is sketched in the code below.)
  2. What is the difference between `kv_cache` and `recent_size`?
  3. When the `--enable_streaming` flag is not set, does that mean the LLM uses the sliding-window-with-recomputation approach?
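For reference, here is a minimal sketch of how I currently understand the `start_size`/`recent_size` eviction to work. The class name `StartRecentKVCache` and the exact tensor slicing are my paraphrase, not a verbatim copy of the repo's code, and the key/value layout `[batch, heads, seq_len, head_dim]` is an assumption on my part:

```python
import torch

class StartRecentKVCache:
    """Sketch of an attention-sink KV cache: keep the first `start_size`
    tokens (the attention sinks) plus the most recent `recent_size` tokens,
    evicting everything in between once the cache overflows."""

    def __init__(self, start_size=4, recent_size=2000, seq_dim=2):
        self.start_size = start_size
        self.recent_size = recent_size
        self.cache_size = start_size + recent_size  # total KV budget
        self.seq_dim = seq_dim  # assumes keys/values are [batch, heads, seq_len, head_dim]

    def __call__(self, past_key_values):
        if past_key_values is None:
            return None
        seq_len = past_key_values[0][0].size(self.seq_dim)
        if seq_len <= self.cache_size:
            return past_key_values  # still within budget, nothing to evict
        # Concatenate the sink tokens with the rolling recent window.
        return [
            (
                torch.cat(
                    [
                        k[:, :, : self.start_size],             # attention sinks
                        k[:, :, seq_len - self.recent_size :],  # recent window
                    ],
                    dim=self.seq_dim,
                ),
                torch.cat(
                    [
                        v[:, :, : self.start_size],
                        v[:, :, seq_len - self.recent_size :],
                    ],
                    dim=self.seq_dim,
                ),
            )
            for k, v in past_key_values
        ]
```

If this reading is right, `start_size + recent_size` is the total KV budget, which is partly why I am unsure what the separate `kv_cache` in the script refers to.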