New contributor declaration

- [x] I am not making a trivial change, such as fixing a typo in a comment.
- [x] I have written a PR description following these rules.
- [x] I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - [ ] I have added tests.
    - `/test` for lit tests
    - `/unittest` for C++ tests
    - `/python/test` for end-to-end tests
  - [x] This PR does not need a test because this is a tutorial file.
- Select one of the following.
  - [x] I have not added any lit tests.
  - [ ] The lit tests I have added follow these best practices, including the
    "tests should be minimal" section. (Usually running Python code and using
    the instructions it generates is not minimal.)
SageAttention is an 8-bit attention method that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models. This PR provides the official implementation, and we have verified the correctness of the code.
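
For context on what "8-bit attention" means here, below is a minimal NumPy sketch of the general idea: subtract the mean of K over the sequence dimension (which only adds a per-row constant to the scores and therefore cancels in the softmax), quantize Q and K to INT8, do the score matmul in integer arithmetic, and dequantize with the stored scales. This is an illustrative sketch based on my reading of the SageAttention approach, not the Triton tutorial kernel added in this PR; the real kernel works per block on the GPU and makes different precision choices for the P·V product.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-row INT8 quantization; returns int8 values and float scales."""
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True) / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_attention(Q, K, V):
    """Single-head attention with an INT8 Q @ K^T matmul (illustrative only)."""
    d = Q.shape[-1]
    # Subtracting the mean of K across the sequence adds a constant to each score
    # row, so the softmax output is unchanged, but K becomes easier to quantize.
    K = K - K.mean(axis=0, keepdims=True)
    q_i8, q_scale = quantize_int8(Q)
    k_i8, k_scale = quantize_int8(K)
    # INT8 x INT8 matmul accumulated in int32, then dequantized with the scales.
    scores = (q_i8.astype(np.int32) @ k_i8.astype(np.int32).T) * (q_scale * k_scale.T)
    scores /= np.sqrt(d)
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ V  # the P @ V product stays in floating point in this sketch

# Tiny smoke test with random data.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 64)) for _ in range(3))
print(int8_attention(Q, K, V).shape)  # (8, 64)
```

The quoted speedups come from the fused GPU kernels, not from a sketch like this; it is only meant to show where the 8-bit arithmetic enters the attention computation.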
New contributor declaration
[x] I am not making a trivial change, such as fixing a typo in a comment.
[x] I have written a PR description following these rules.
[x] I have run
pre-commit run --from-ref origin/main --to-ref HEAD
.Select one of the following.
/test
forlit
tests/unittest
for C++ tests/python/test
for end-to-end teststhis is a tutorial file
.Select one of the following.
lit
tests.lit
tests I have added follow these best practices, including the "tests should be minimal" section. (Usually running Python code and using the instructions it generates is not minimal.)SageAttention is a 8-bit attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models. This PR provides the official implementation, and we have verified the correctness of the codes.