thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
BSD 3-Clause "New" or "Revised" License

Real acceleration benefits #22

Closed: lswzjuer closed this issue 3 weeks ago

lswzjuer commented 3 weeks ago

Actual tests found no acceleration benefit on A100 and A10 series graphics cards. Is this false advertising?
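
For anyone who wants to reproduce the comparison on their own card, here is a minimal timing sketch. The `sageattn` import and call signature follow the repo README (tensors shaped `(batch, heads, seq_len, head_dim)` in FP16); using `torch.nn.functional.scaled_dot_product_attention` as the FlashAttention baseline is an assumption of this sketch, not something stated in this thread:

```python
import torch
import torch.nn.functional as F
from sageattention import sageattn  # per the repo README

def bench(fn, *args, warmup=10, iters=100):
    # CUDA kernels launch asynchronously, so time with events and synchronize.
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average ms per call

# Shapes chosen arbitrarily for illustration; vary seq_len to see how the gap scales.
batch, heads, seq_len, head_dim = 4, 32, 4096, 128
q, k, v = (torch.randn(batch, heads, seq_len, head_dim,
                       dtype=torch.float16, device="cuda") for _ in range(3))

ms_flash = bench(F.scaled_dot_product_attention, q, k, v)
ms_sage = bench(sageattn, q, k, v)
print(f"SDPA/Flash: {ms_flash:.3f} ms  sageattn: {ms_sage:.3f} ms  "
      f"speedup: {ms_flash / ms_sage:.2f}x")
```

Kernel-level timings like this isolate the attention call itself; end-to-end model speedups will generally be smaller, since attention is only part of each forward pass.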

MeJerry215 commented 3 weeks ago

The paper says the benefits only appear on the RTX 4090 and RTX 3090. I also tried A10 series graphics cards: no acceleration benefit.

jason-huang03 commented 3 weeks ago

The paper is focused on the 4090 and 3090, whose tensor core / CUDA core throughput is different from that of the A100 and A10.
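
As a rough back-of-envelope illustration of that throughput difference: the dense tensor-core figures below are approximate numbers taken from NVIDIA datasheets, not from this thread, and the ratio is only a crude ceiling on the achievable speedup from quantizing the matmuls, since dequantization and softmax work is unchanged:

```python
# (INT8 TOPS, FP16-with-FP32-accumulate TFLOPS), approximate dense datasheet
# figures; treat these as assumptions rather than authoritative specs.
specs = {
    "RTX 4090": (660, 165),  # Ada GeForce: INT8 is ~4x the FP16 rate
    "RTX 3090": (284, 71),   # GA102 GeForce: same ~4x ratio
    "A100":     (624, 312),  # INT8 is only ~2x FP16, so less headroom
    "A10":      (250, 125),  # roughly the same ~2x ratio as A100
}
for gpu, (int8_tops, fp16_tflops) in specs.items():
    print(f"{gpu}: INT8/FP16 tensor-core ratio ~ {int8_tops / fp16_tflops:.1f}x")
```

Under these assumed figures, the consumer cards have roughly twice the relative INT8 headroom of the A100/A10, which is consistent with speedups showing up on the 4090/3090 but not on the data-center parts.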