tomaarsen/attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0
Add exception for when FA is used with QWen #25
Closed by tomaarsen 9 months ago

tomaarsen commented 9 months ago
Related to #24.

Hello!

Pull Request overview
- Add exception for when FA is used with QWen

Tom Aarsen
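The change described above guards against an unsupported combination: raising an error when Flash Attention is requested together with a QWen model, rather than letting things fail later. A minimal sketch of that kind of guard, assuming hypothetical config field names (`model_type`, `use_flash_attn`) that are illustrative only and not the repo's actual API:

```python
# Hypothetical sketch of the guard this PR describes: refuse the
# Flash Attention + QWen combination up front with a clear error.
# The config attribute names here are assumptions for illustration.
def check_flash_attention(config) -> None:
    """Raise if Flash Attention is requested for a QWen model."""
    is_qwen = getattr(config, "model_type", None) == "qwen"
    wants_fa = getattr(config, "use_flash_attn", False)
    if is_qwen and wants_fa:
        raise NotImplementedError(
            "Flash Attention is not supported for QWen models here; "
            "please disable use_flash_attn."
        )


class DummyConfig:
    # Example config triggering the unsupported combination.
    model_type = "qwen"
    use_flash_attn = True


try:
    check_flash_attention(DummyConfig())
except NotImplementedError as err:
    print("raised:", err)
```

Failing fast like this turns a confusing downstream crash into an actionable error message for the user.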