tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0

Add exception for when FA is used with QWen #25

Closed: tomaarsen closed this 9 months ago

tomaarsen commented 9 months ago

Related to #24.

Hello!

Pull Request overview
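In essence, the change makes `attention_sinks` fail fast when Flash Attention is enabled on a Qwen model, presumably because the window-shifting attention that attention sinks rely on assumes Qwen's non-FA attention path (see #24). A minimal sketch of such a guard follows; the `use_flash_attn` flag mirrors Qwen's remote modeling code, and the helper name is an assumption for illustration, not the actual diff:

```python
# Hypothetical sketch of the guard added by this PR; the helper name and
# the `use_flash_attn` attribute are assumptions, not the actual diff.


def _ensure_no_flash_attention(config) -> None:
    """Raise early when Flash Attention is enabled on a Qwen model.

    Qwen's remote modeling code exposes a `use_flash_attn` config flag
    (True, False, or "auto"); any truthy value means FA may be active,
    which the attention-sinks position shifting does not support.
    """
    if getattr(config, "use_flash_attn", False):
        raise NotImplementedError(
            "attention_sinks does not support Flash Attention for Qwen; "
            "reload the model with `use_flash_attn=False`."
        )
```

Called once while the Qwen attention hooks are being attached, a check like this turns a silent wrong-output failure into an explicit, actionable error.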