vikhyat / moondream

tiny vision language model
https://moondream.ai
Apache License 2.0

Running with Flash Attention 1 #105

Open Bikram9035 opened 2 months ago

Bikram9035 commented 2 months ago

Hello, please let me know how to run Moondream2 with Flash Attention 1. I am trying to run it on Kaggle or Colab with T4 GPUs, and Flash Attention 2 won't work there since it requires Ampere or newer GPUs, while the T4 is Turing. You mention that Flash Attention 1 can be used, but the exact syntax is nowhere to be found, and guesswork is giving me errors.

As a beginner this is overwhelming, with a lot of outdated information and misinformation online. I hope you understand my situation.
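
For reference, here is roughly what I have been trying, based on the loading example in the README (a minimal sketch; the commented-out `attn_implementation` line is the guess that fails on the T4):

```python
# Sketch of my loading attempt on a Kaggle/Colab T4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "vikhyatk/moondream2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,  # T4 supports fp16 but not bf16
    # attn_implementation="flash_attention_2",  # requires Ampere+; errors on T4
).to("cuda")
```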

Thank you

vikhyat commented 1 month ago

I don’t think HF transformers supports Flash Attention 1.0, so you would have to edit the attention classes in the model definition.
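
A minimal sketch of what that edit might look like, assuming a flash-attn 1.x install (e.g. `pip install flash-attn==1.0.9`) and its `flash_attn_unpadded_func` interface. The function below is a hypothetical drop-in for the core softmax-attention computation inside one of the model's attention classes, not Moondream's actual code, and it assumes unpadded fp16 inputs:

```python
import torch
from flash_attn.flash_attn_interface import flash_attn_unpadded_func

def fa1_attention(q, k, v, causal=True, dropout_p=0.0):
    """Attention via the Flash Attention 1.x kernel.

    q, k, v: (batch, seqlen, n_heads, head_dim) fp16 tensors, no padding.
    """
    batch, seqlen, n_heads, head_dim = q.shape
    # FA1 exposes a "varlen"-style interface: tokens are packed into
    # (total_tokens, n_heads, head_dim) with cumulative sequence lengths.
    q, k, v = (t.reshape(batch * seqlen, n_heads, head_dim) for t in (q, k, v))
    cu_seqlens = torch.arange(
        0, (batch + 1) * seqlen, seqlen, dtype=torch.int32, device=q.device
    )
    out = flash_attn_unpadded_func(
        q, k, v,
        cu_seqlens, cu_seqlens,  # cu_seqlens_q, cu_seqlens_k
        seqlen, seqlen,          # max_seqlen_q, max_seqlen_k
        dropout_p,
        softmax_scale=None,      # defaults to 1/sqrt(head_dim)
        causal=causal,
    )
    return out.reshape(batch, seqlen, n_heads, head_dim)
```

The `causal` flag replaces the explicit attention mask, so this simplified version only covers unpadded batches; padded inputs would need real per-sequence lengths in `cu_seqlens`. Unlike Flash Attention 2, the 1.x kernels do run on Turing (sm75) GPUs like the T4, which is the point of the exercise.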