mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453

Doubts about the `run_streaming_llama.py` file #64

Open Rishab9991 opened 8 months ago

Rishab9991 commented 8 months ago

Hi, I have a few questions about the `run_streaming_llama.py` script.

  1. What do `start_size` and `recent_size` indicate in the context of StreamingLLM? (My current understanding of how they are used is sketched in the code below.)
  2. What is the difference between `kv_cache` and `recent_size`?
  3. When the `--enable_streaming` flag is not set, does that mean the LLM uses the sliding-window-with-recomputation approach?
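For reference, here is a minimal sketch of how I currently understand the `start_size`/`recent_size` eviction to work. The class name `StartRecentKVCache` and the exact tensor slicing are my paraphrase, not a verbatim copy of the repo's code, and the key/value layout `[batch, heads, seq_len, head_dim]` is an assumption on my part:

```python
import torch

class StartRecentKVCache:
    """Sketch of an attention-sink KV cache: keep the first `start_size`
    tokens (the attention sinks) plus the most recent `recent_size` tokens,
    evicting everything in between once the cache overflows."""

    def __init__(self, start_size=4, recent_size=2000, seq_dim=2):
        self.start_size = start_size
        self.recent_size = recent_size
        self.cache_size = start_size + recent_size  # total KV budget
        self.seq_dim = seq_dim  # assumes keys/values are [batch, heads, seq_len, head_dim]

    def __call__(self, past_key_values):
        if past_key_values is None:
            return None
        seq_len = past_key_values[0][0].size(self.seq_dim)
        if seq_len <= self.cache_size:
            return past_key_values  # still within budget, nothing to evict
        # Concatenate the sink tokens with the rolling recent window.
        return [
            (
                torch.cat(
                    [
                        k[:, :, : self.start_size],             # attention sinks
                        k[:, :, seq_len - self.recent_size :],  # recent window
                    ],
                    dim=self.seq_dim,
                ),
                torch.cat(
                    [
                        v[:, :, : self.start_size],
                        v[:, :, seq_len - self.recent_size :],
                    ],
                    dim=self.seq_dim,
                ),
            )
            for k, v in past_key_values
        ]
```

If this reading is right, `start_size + recent_size` is the total KV budget, which is partly why I am unsure what the separate `kv_cache` in the script refers to.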