triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License
13.5k stars 1.67k forks

Add Sageattention Codes as a tutorial #5159

Closed jt-zhang closed 1 week ago

jt-zhang commented 1 week ago

New contributor declaration

SageAttention is an 8-bit attention method that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models. This PR provides the official implementation, and we have verified the correctness of the code.
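The PR itself contains the official Triton kernels; as a rough illustration of the idea behind 8-bit attention, here is a minimal NumPy sketch (not the PR's code): Q and K are symmetrically quantized to INT8 per row, the QK^T matmul is accumulated in INT32 and dequantized before the softmax, and K is first smoothed by subtracting its per-channel mean, which leaves the softmax output unchanged because it only adds a per-query constant to every score. The function names and tolerances here are illustrative assumptions, not the library's API.

```python
import numpy as np

def quantize_int8(x, axis=-1):
    # Symmetric per-row INT8 quantization: scale maps max |x| to 127.
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # avoid division by zero on all-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_attention_sketch(Q, K, V):
    # Smooth K: subtract the per-channel mean. For any query q,
    # q @ mean is the same for every key, so softmax is unaffected.
    K = K - K.mean(axis=0, keepdims=True)
    q_int, q_scale = quantize_int8(Q)
    k_int, k_scale = quantize_int8(K)
    # INT8 matmul accumulated in INT32, then dequantized to float.
    scores = (q_int.astype(np.int32) @ k_int.astype(np.int32).T).astype(np.float32)
    scores *= q_scale * k_scale.T          # undo both quantization scales
    scores /= np.sqrt(Q.shape[-1])         # standard attention scaling
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    p = np.exp(scores)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ V                           # weights kept in float here
```

The real kernels fuse these steps on-GPU (and additionally quantize the PV product); this sketch only shows why low-bit QK^T can preserve accuracy: per-row scaling bounds the quantization error, and the mean-subtraction removes the outlier component of K that would otherwise dominate the scale.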

ThomasRaoux commented 1 week ago

This would belong in the https://github.com/triton-lang/kernels repo instead.

Jokeren commented 1 week ago

I agree that it's not a good fit