tomaarsen / attention_sinks

Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
https://huggingface.co/blog/tomaarsen/attention-sinks
Apache License 2.0

Flash Attention Support #30

Open Jiayuanhip opened 8 months ago

Jiayuanhip commented 8 months ago

Hi,

Thanks for the great work!

I am wondering whether Flash Attention is supported in the current version of the attention_sinks repo.

Thanks

tomaarsen commented 8 months ago

Hello!

In its current form, I believe none of the models support Flash Attention.
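For illustration, here is a minimal sketch of what such a request would look like, assuming the `AutoModelForCausalLM` wrapper described in the attention_sinks blog post and the standard `transformers` `attn_implementation` flag; since the patched attention forward does not use Flash Attention, asking for it this way would currently have no effect (or fail), rather than actually enabling Flash Attention:

```python
import torch
# Drop-in replacement for transformers.AutoModelForCausalLM, per the blog post
from attention_sinks import AutoModelForCausalLM

# Hypothetical attempt to combine attention sinks with Flash Attention 2;
# the model name and flag values here are illustrative, not confirmed to work.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    attn_implementation="flash_attention_2",  # request not honored by the patched attention
    attention_sink_size=4,            # number of initial "sink" tokens kept in the KV cache
    attention_sink_window_size=1020,  # sliding window of most recent tokens
)
```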