Open NonvolatileMemory opened 1 month ago
Triton does not support bfloat16
https://triton-lang.org/main/programming-guide/chapter-3/debugging.html#limitations
Thanks for your reply!
However, in my test case with grouped-query attention, the gradients of k and v fail a `torch.allclose` check when comparing the PyTorch reference implementation against fused-attention.
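For reference, here is a minimal sketch of the kind of comparison being described: a plain PyTorch grouped-query attention run in float32 as the reference, then re-run in bfloat16, with the k/v gradients compared via `torch.allclose`. This is an illustrative stand-in (the function name and shapes are my own, not from the original test case), and it substitutes a bfloat16 eager run for the actual Triton fused-attention kernel:

```python
import torch

def gqa_attention(q, k, v):
    # Illustrative grouped-query attention (not the original test case).
    # q: (B, Hq, S, D); k, v: (B, Hkv, S, D) with Hq divisible by Hkv.
    groups = q.shape[1] // k.shape[1]
    k = k.repeat_interleave(groups, dim=1)  # expand kv heads to match q heads
    v = v.repeat_interleave(groups, dim=1)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
B, Hq, Hkv, S, D = 1, 8, 2, 64, 32
q = torch.randn(B, Hq, S, D, requires_grad=True)
k = torch.randn(B, Hkv, S, D, requires_grad=True)
v = torch.randn(B, Hkv, S, D, requires_grad=True)

# Reference gradients in float32.
gqa_attention(q, k, v).sum().backward()
dk_ref, dv_ref = k.grad.clone(), v.grad.clone()

# Same computation in bfloat16 (standing in for the fused kernel).
q2 = q.detach().to(torch.bfloat16).requires_grad_()
k2 = k.detach().to(torch.bfloat16).requires_grad_()
v2 = v.detach().to(torch.bfloat16).requires_grad_()
gqa_attention(q2, k2, v2).sum().backward()

# bfloat16 has ~8 bits of mantissa, so a loose tolerance is needed.
print("dk close:", torch.allclose(dk_ref, k2.grad.float(), atol=1e-1, rtol=1e-2))
print("dv close:", torch.allclose(dv_ref, v2.grad.float(), atol=1e-1, rtol=1e-2))
```

With default `allclose` tolerances such a bfloat16 comparison will generally fail, which is why tolerance choice matters when judging whether a kernel's gradients are actually wrong or just low-precision.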
Both fused-attention and flash-attn-og fail the bfloat16 test.