pseudotensor opened this issue 9 months ago (status: Open)
Makes sense! I hate strict requirements myself, but for attention_sinks I'm somewhat forced into it. This project works by overriding the forward method of the ...Attention classes in transformers, which are often updated, even in minor versions. Any mismatch can cause failures, so I can really only support one transformers version at a time. At least, until https://github.com/huggingface/transformers/pull/26681 is merged and Attention Sinks can be implemented that way.
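The overriding approach described above can be sketched as plain method monkey-patching. The classes and method bodies below are toy stand-ins for illustration only, not the real transformers internals:

```python
class ToyAttention:
    """Stand-in for a transformers ...Attention class."""

    def forward(self, hidden_states):
        # Original behavior: identity pass-through.
        return hidden_states


def patched_forward(self, hidden_states):
    # A patched forward must match the original signature exactly;
    # if transformers changes that signature in a new release, the
    # override silently diverges or fails, which is why the project
    # pins a single transformers version.
    return [h * 2 for h in hidden_states]  # modified behavior


# Replace the method on the class itself, affecting all instances.
ToyAttention.forward = patched_forward

attn = ToyAttention()
print(attn.forward([1, 2, 3]))  # → [2, 4, 6]
```

This illustrates why the pin is so tight: the override depends on internal details (method names and signatures) that transformers does not treat as a stable public API.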
OK, thanks for considering it.
It does make it hard to upgrade, but thanks!