mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License
6.38k stars 355 forks

Implementing Llama 2 7B #49

Closed: MuhammadIshaq-AI closed this issue 9 months ago

MuhammadIshaq-AI commented 9 months ago

I have a question about how to run the code. I followed all the instructions in the repo, but I'm unsure whether the model will be downloaded automatically. For example, if I want to test the code with the Llama 2 7B chat model, how do I use the streaming Llama code for that?