Closed: kevinli573 closed this issue 11 months ago
Yes, the same breakage also happens with a naive sliding window. To check this,
simply pass --start_size 0 (see https://github.com/mit-han-lab/streaming-llm/blob/main/examples/run_streaming_llama.py#L118).
Guangxuan
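For readers following along, here is a minimal sketch of what setting --start_size to 0 changes. It assumes the cache keeps the first `start_size` tokens plus the most recent `recent_size` tokens. `kept_positions` is a hypothetical helper written for illustration, not the repository's code.

```python
def kept_positions(seq_len, start_size, recent_size):
    """Return the token positions that remain in the KV cache.

    Hypothetical illustration: keep the first `start_size` tokens
    and the last `recent_size` tokens, evicting everything between.
    """
    cache_size = start_size + recent_size
    if seq_len <= cache_size:
        # Nothing to evict yet: every position is still cached.
        return list(range(seq_len))
    initial = list(range(start_size))
    recent = list(range(seq_len - recent_size, seq_len))
    return initial + recent


# With start_size > 0, the initial tokens stay in the cache.
print(kept_positions(10, 2, 4))  # [0, 1, 6, 7, 8, 9]

# With start_size = 0, this reduces to a plain sliding window.
print(kept_positions(10, 0, 4))  # [6, 7, 8, 9]
```

With `start_size` set to 0 the initial tokens are evicted as soon as the window fills, which is the naive sliding-window case asked about.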
Great, thank you for your quick reply!
Great paper and demo!
It seems the video demo shows naive self-attention (i.e., no sliding window) on the left-hand side. Before it runs out of memory, it produces Unicode gibberish. Does the same breakage (garbled Unicode output) also happen with a naive sliding window, even though it doesn't run out of memory?
Thank you!