mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

Question about Naive Sliding Window #63

Closed kevinli573 closed 11 months ago

kevinli573 commented 11 months ago

Great paper and demo!

It seems like the video demo shows naive self-attention, i.e., no sliding window, on the left-hand side. Before it runs out of memory, it produces unicode/gibberish. Does the same breakage, i.e., producing unicode and other gibberish, happen for a naive sliding window as well, even though it doesn't run into the out-of-memory issue?

Thank you!

Guangxuan-Xiao commented 11 months ago

Yes, the same breakage will also happen. You can check this by simply passing `--start_size 0`: https://github.com/mit-han-lab/streaming-llm/blob/main/examples/run_streaming_llama.py#L118.
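To illustrate why `--start_size 0` reduces to a naive sliding window, here is a minimal sketch (not the repo's actual implementation; the function name `evict` and the list-based cache are made up for illustration) of the eviction policy: keep the first `start_size` token positions as attention sinks plus the most recent `recent_size` positions. With `start_size=0`, only the recent window survives.

```python
# Simplified sketch of attention-sink KV-cache eviction.
# `cache` stands in for the sequence of cached token positions;
# the real code evicts key/value tensors along the sequence dimension.

def evict(cache, start_size, recent_size):
    """Return the token positions kept after eviction."""
    if len(cache) <= start_size + recent_size:
        return list(cache)  # still fits; evict nothing
    # Keep the initial "sink" tokens plus the recent window.
    return list(cache[:start_size]) + list(cache[-recent_size:])

tokens = list(range(10))
print(evict(tokens, start_size=4, recent_size=4))  # sinks + recent window
print(evict(tokens, start_size=0, recent_size=4))  # naive sliding window
```

With `start_size=0` the kept set is just the trailing window, so the initial tokens that stabilize attention are gone, which is the configuration the reply above says reproduces the breakage.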

Guangxuan

kevinli573 commented 11 months ago

Great, thank you for your quick reply!