Can it support codellama34b?

mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

https://arxiv.org/abs/2309.17453
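For context on what "support" involves: as described in the paper, StreamingLLM keeps the key/value entries of the first few tokens (the "attention sinks") plus a rolling window of the most recent tokens, and evicts everything in between. It assigns positions by index within the cache rather than by the token's original position in the text. The sketch below shows only that eviction policy on plain Python lists. The class name `SinkWindowCache` is hypothetical and is not the repository's API. The defaults (`start_size=4`, `recent_size=8`) are illustrative assumptions. A real implementation slices per-layer key/value tensors instead of list items.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SinkWindowCache:
    """Hypothetical sketch of an attention-sink + recent-window KV cache.

    Keeps the first `start_size` entries (attention sinks) and the most
    recent `recent_size` entries; everything in between is evicted.
    Sizes are illustrative assumptions, not the repository's settings.
    """
    start_size: int = 4
    recent_size: int = 8
    entries: List[Any] = field(default_factory=list)

    def append(self, kv: Any) -> None:
        # Add the newest token's KV entry, then enforce the size budget.
        self.entries.append(kv)
        self._evict()

    def _evict(self) -> None:
        limit = self.start_size + self.recent_size
        if len(self.entries) <= limit:
            return
        sinks = self.entries[: self.start_size]
        # Guard: entries[-0:] would return the whole list, not an empty one.
        recent = self.entries[-self.recent_size:] if self.recent_size > 0 else []
        self.entries = sinks + recent

    def positions(self) -> List[int]:
        # Positions are assigned by slot within the cache,
        # not by the token's index in the original stream.
        return list(range(len(self.entries)))


if __name__ == "__main__":
    cache = SinkWindowCache(start_size=4, recent_size=8)
    for token_id in range(20):
        cache.append(token_id)
    print(cache.entries)
    print(cache.positions())
```

After streaming 20 tokens, the cache holds the four sink tokens (0 to 3) followed by the last eight tokens (12 to 19). Their positions are renumbered 0 to 11, so the positional range the model sees stays bounded however long the stream runs.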

MIT License

6.38k stars, 355 forks

Closed. willshion closed this issue 9 months ago.

Guangxuan-Xiao commented 9 months ago