tomaarsen/attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0

Update QWen due to changes in the modeling files of QWen-7b #33

Closed · tomaarsen closed this 8 months ago

tomaarsen commented 8 months ago

Closes #32

Hello!

Pull Request overview

* Update the QWen attention sink patch to construct the causal mask on the fly, matching the updated modeling_qwen.py.

Details

https://huggingface.co/Qwen/Qwen-7B/commit/f7bc352f27bb1c02ee371a4576942a7d96c8bb97 updated modeling_qwen.py so that registered_causal_mask is no longer passed around; instead, the causal mask is constructed on the fly. This PR updates the attention_sinks patch for QWen to construct the mask on the fly as well.
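
For context, a minimal sketch of what constructing the causal mask on the fly can look like in PyTorch; the function name and exact shape are illustrative assumptions, not the actual code in modeling_qwen.py or in this patch:

```python
import torch

def make_causal_mask(query_len: int, key_len: int, device: torch.device) -> torch.Tensor:
    """Hypothetical helper: build a lower-triangular causal mask on the fly.

    Returns shape (1, 1, query_len, key_len) so it broadcasts over batch and heads.
    """
    # True where attention is allowed: key position j <= query position i.
    full = torch.tril(torch.ones((key_len, key_len), dtype=torch.bool, device=device))
    # Keep only the rows for the current query positions (the last query_len keys).
    return full[key_len - query_len : key_len, :].view(1, 1, query_len, key_len)
```

Building the mask per forward pass, rather than reading from a buffer registered at a fixed maximum size, means the sequence length is no longer tied to that buffer's size, which is convenient for the long generations attention sinks are meant to enable.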