intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Apache License 2.0

[Feature request] Support for Flashattention 3 #1665

Closed sleepingcat4 closed 1 month ago

sleepingcat4 commented 2 months ago

I had previously programmed for XLA and CUDA machines, but recently, while trying out HPUs, I find I quite like them. That said, the software stack is awful, and there's no real helpline/documentation to refer to other than reading the code, which itself is written with a CLI interface in mind.

I was thinking of using Intel Gaudi2 for some projects and developing a new architecture, since I have access to some clusters. I was curious whether FlashAttention support already exists, and if not, whether the devs could offer support for FlashAttention 3?

https://github.com/Dao-AILab/flash-attention

a32543254 commented 2 months ago

Really happy that you are using and interested in Habana. We are putting a lot of effort into Habana now, and we really care about the user experience; I think we are improving it over time. For Habana usage, you can also reach out to the repo https://github.com/huggingface/optimum-habana, which should give you more info on the Habana side.
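
For context, the usual way to get Gaudi-optimized attention kernels is through optimum-habana rather than the Dao-AILab flash-attention package (which targets NVIDIA GPUs). Below is a minimal sketch, assuming the `optimum-habana` and `habana_frameworks` packages are installed on a Gaudi2 machine; the model name is just an example, and generation flags such as `use_flash_attention` are assumptions whose names and availability vary by model and release, so the text-generation examples in the optimum-habana repo are the authoritative reference.

```python
# Minimal sketch: run a causal LM on Gaudi (HPU) via optimum-habana's
# Gaudi-optimized attention paths. Assumes optimum-habana and the Habana
# PyTorch bridge (habana_frameworks) are installed on a Gaudi machine.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Patch transformers with Gaudi-optimized model implementations
# (fused SDPA / FlashAttention-style kernels for supported architectures).
adapt_transformers_to_gaudi()

model_name = "meta-llama/Llama-2-7b-hf"  # example model, swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model = model.to("hpu").eval()

inputs = tokenizer("Gaudi2 says:", return_tensors="pt").to("hpu")
with torch.no_grad():
    # use_flash_attention is assumed here: it is the optimum-habana generation
    # flag for the fused attention kernel on supported models, but the exact
    # name/availability depends on the model and the optimum-habana version.
    outputs = model.generate(**inputs, max_new_tokens=32, use_flash_attention=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```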