intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Apache License 2.0

[Feature request] Support for Flashattention 3 #1665

Closed sleepingcat4 closed 1 month ago

sleepingcat4 commented 2 months ago

I had previously programmed for XLA and CUDA machines, but recently, while trying out HPUs, I find I quite like them. That said, the software stack is awful, and there's no real helpline/documentation to refer to other than reading the code, which itself is written with a CLI interface in mind.

I was thinking of using Intel Gaudi2 for some projects and developing a new architecture, since I have access to some clusters. I was curious whether FlashAttention support already exists, and if not, whether the devs could offer support for FlashAttention 3?

https://github.com/Dao-AILab/flash-attention

a32543254 commented 2 months ago

Really happy that you are using and interested in Habana. We are putting a lot of effort into Habana now, and we really care about the user experience; I think we are improving it over time. For Habana usage, you can also reach out to the repo https://github.com/huggingface/optimum-habana, which should give you more info on the Habana side.
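
For context, the usual way to get Gaudi-optimized attention kernels is through optimum-habana rather than the Dao-AILab flash-attention package (which targets NVIDIA GPUs). Below is a minimal sketch, assuming the `optimum-habana` and `habana_frameworks` packages are installed on a Gaudi2 machine; the model name is just an example, and generation flags such as `use_flash_attention` are assumptions whose names and availability vary by model and release, so the text-generation examples in the optimum-habana repo are the authoritative reference.

```python
# Minimal sketch: run a causal LM on Gaudi (HPU) via optimum-habana's
# Gaudi-optimized attention paths. Assumes optimum-habana and the Habana
# PyTorch bridge (habana_frameworks) are installed on a Gaudi machine.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Patch transformers with Gaudi-optimized model implementations
# (fused SDPA / FlashAttention-style kernels for supported architectures).
adapt_transformers_to_gaudi()

model_name = "meta-llama/Llama-2-7b-hf"  # example model, swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = model.to("hpu").eval()

inputs = tokenizer("Gaudi2 says:", return_tensors="pt").to("hpu")
with torch.no_grad():
    # use_flash_attention is assumed here: it is the optimum-habana generation
    # flag for the fused attention kernel on supported models, but the exact
    # name/availability depends on the model and the optimum-habana version.
    outputs = model.generate(**inputs, max_new_tokens=32, use_flash_attention=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```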