tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0

Avoid overly strict "transformers==4.34.0" #26

Open · pseudotensor opened this issue 9 months ago

pseudotensor commented 9 months ago

The exact pin makes it hard to upgrade alongside other dependencies. Thanks!
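
For example, a compatible-range pin would still let transformers patch releases through. Just a sketch of one possible alternative; the setup.py excerpt and the exact range below are hypothetical, not the project's actual metadata:

```python
# Hypothetical setup.py excerpt: a compatible range instead of an exact
# pin, so that patch releases of transformers (4.34.x) still install
# without requiring a new attention_sinks release.
from setuptools import setup

setup(
    name="attention_sinks",
    install_requires=[
        "transformers>=4.34.0,<4.35.0",
    ],
)
```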

tomaarsen commented 9 months ago

Makes sense! I hate strict requirements myself, but for attention_sinks I'm somewhat forced into them. This project works by overriding the forward method of the ...Attention classes in transformers, and those methods are often updated, even in minor releases. Any mismatch between the patched forward and the upstream signature or behavior can cause failures, so I can really only support one transformers version at a time. At least until https://github.com/huggingface/transformers/pull/26681 is merged and Attention Sinks can be implemented that way instead.
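
For anyone curious, here is a minimal sketch of what that patching looks like. Illustrative only, not the actual attention_sinks code; the real patch replaces the forward body itself to manage the KV cache, which is omitted here:

```python
# Minimal sketch of monkey-patching a transformers attention class,
# using LlamaAttention as the example.
from transformers.models.llama.modeling_llama import LlamaAttention

_original_forward = LlamaAttention.forward

def patched_forward(self, *args, **kwargs):
    # Whatever happens here must match the exact argument and return
    # structure of the pinned transformers version; an upstream change
    # to the signature breaks the patch at runtime.
    return _original_forward(self, *args, **kwargs)

# Every LlamaAttention call after this line goes through the patch.
LlamaAttention.forward = patched_forward
```

Because the patch reaches into private internals rather than a public API, there is no compatibility guarantee to rely on, hence the exact pin.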

pseudotensor commented 9 months ago

Ok, thanks for the consideration.