thu-ml / SageAttention

Quantized attention that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
BSD 3-Clause "New" or "Revised" License

add sageattn_varlen support #36

Closed jason-huang03 closed 1 week ago

jason-huang03 commented 1 week ago

Adds support for a varlen API, with the same input format as FlashAttention's varlen interface; a sketch of the expected call pattern is shown below.
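
A minimal sketch of how a varlen call would look, assuming the packed layout and argument names of `flash_attn_varlen_func` (tensors of shape `(total_tokens, num_heads, head_dim)` plus `cu_seqlens` offsets). The exact signature of `sageattn_varlen` is not shown in this thread, so the names below are assumptions for illustration:

```python
import torch
from sageattention import sageattn_varlen  # assumed import path

# Packed variable-length layout, mirroring FlashAttention's varlen format:
# all sequences are concatenated along the token dimension.
seqlens = [512, 1024, 256]          # three sequences of different lengths
total = sum(seqlens)
num_heads, head_dim = 16, 64
q = torch.randn(total, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# cu_seqlens: cumulative sequence-length offsets, int32, length = batch_size + 1
cu_seqlens = torch.tensor([0, 512, 1536, 1792], dtype=torch.int32, device="cuda")
max_seqlen = max(seqlens)

# Argument names follow flash_attn_varlen_func and are assumptions here,
# not a confirmed signature of the merged API.
out = sageattn_varlen(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max_seqlen, max_seqlen_k=max_seqlen,
    is_causal=True,
)
# out: (total_tokens, num_heads, head_dim), same packed layout as the inputs
```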