thu-ml / SageAttention
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
BSD 3-Clause "New" or "Revised" License
400 stars · 17 forks
add sageattn_varlen support #36
Closed
jason-huang03 closed this 1 week ago

jason-huang03 commented 1 week ago
Support a varlen API, with the same input format as FlashAttention.
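
Since the PR only states that the input format matches FlashAttention, here is a minimal sketch of what a call might look like, assuming `sageattn_varlen` mirrors flash-attn's `flash_attn_varlen_func` (packed q/k/v tensors plus `cu_seqlens` and `max_seqlen` arguments); the exact keyword names and import path are assumptions, not the merged implementation:

```python
import torch
from sageattention import sageattn_varlen  # function name from this PR; import path assumed

num_heads, head_dim = 8, 64
seqlens = [5, 9, 3]  # three sequences of different lengths in one packed batch

# Pack all tokens into a single (total_tokens, num_heads, head_dim) tensor,
# the layout flash_attn_varlen_func expects.
total = sum(seqlens)
q = torch.randn(total, num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Cumulative sequence lengths delimit the packed sequences: [0, 5, 14, 17].
cu = [0]
for s in seqlens:
    cu.append(cu[-1] + s)
cu_seqlens = torch.tensor(cu, dtype=torch.int32, device="cuda")

out = sageattn_varlen(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
)
# out: (total_tokens, num_heads, head_dim), matching flash-attn's varlen output.
```

The `cu_seqlens` convention avoids padding: instead of a (batch, max_len, ...) tensor with masked positions, every real token is stored contiguously and the offsets tell the kernel where each sequence begins and ends.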