thu-ml SageAttention issues

thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

BSD 3-Clause "New" or "Revised" License

399 stars 17 forks

issues

AssertionError

#42 Maritime-Moon opened 4 days ago
8
Getting triton compilation errors when calculating attention

#41 JohnnyRacer closed 3 days ago
2
add cuda kernel for per block and per warp quantization

#40 jason-huang03 closed 6 days ago
0
update license

#39 jason-huang03 closed 6 days ago
0
v1.0.4

#38 jason-huang03 closed 6 days ago
0
Please create a ComfyUI node to use SageAttention

#37 wardensc2 opened 6 days ago
2
add sageattn_varlen support

#36 jason-huang03 closed 6 days ago
0
better support for hd96

#35 jason-huang03 closed 1 week ago
0
Support returning LSE for sequence parallel

#34 jason-huang03 opened 1 week ago
0
[update] support non-contiguous input, different qo_len and kv_len, HND and NHD layout, group query attention

#33 jason-huang03 closed 1 week ago
0
Sageattention in flux

#32 todochenxi closed 1 week ago
4
Are you planning to provide a varlen and bnsd API?

#31 tlogn closed 5 days ago
8
initialize l_i as zeros

#30 feifeibear closed 1 week ago
0
Why does SageAttention show a clear performance benefit on RTX 4090 and RTX 3090?

#29 MeJerry215 closed 2 weeks ago
1
compatible with other quantization methods

#28 chenchunhui97 closed 1 week ago
2
Q matrix quantization

#27 liangan1 closed 1 week ago
1
got result error when seq_length of q not equals to k/v

#26 beegerous closed 3 weeks ago
7
q_kernel_per_block_int8 error in distributed settings.

#25 feifeibear closed 3 weeks ago
0
Why divide ln 2 in quantization Q value?

#24 MeJerry215 closed 3 weeks ago
1
all black video are generated for Open-Sora-Plan using sageattention

#23 littletomatodonkey closed 3 weeks ago
3
Real accelerated benefits

#22 lswzjuer closed 3 weeks ago
2
Why Running Llama infer in A10 get Wrong answer?

#21 MeJerry215 closed 1 week ago
4
Can SageAttention available on AMD GPUs?

#20 guanchenl closed 1 week ago
1
exist nan when using sageattn

#19 Pydataman closed 6 days ago
6
Notation error in Equation (2)

#18 Coco58323 closed 3 weeks ago
1
Would support other headdim

#17 v4if opened 3 weeks ago
2
Other SageAttention Kernels

#16 Andy0422 opened 3 weeks ago
1
Do you plan to integrate this algorithm into the vllm project?

#15 Alienfeel opened 4 weeks ago
0
Encountered some compatibility issues

#14 otoTree opened 4 weeks ago
5
Can you provide an example for LLaMA?

#13 jyweky closed 4 weeks ago
1
Question about INT8 v.s. FP8

#12 lingffff closed 1 month ago
1
SageAttention on ComfyUI

#11 blepping opened 1 month ago
2
Accuracy Comparison in Kernel Level

#10 DoubleClark closed 1 month ago
2
is support stable diffusion?

#9 libai-lab opened 1 month ago
3
Would something like this be possible for apple silicon?

#8 cchance27 opened 1 month ago
0
Windows compile issue when testing CogVideoX script

#7 SoftologyPro opened 1 month ago
8
After update diffusers CogVideoX fails due to the dtype check

#6 kijai closed 1 month ago
3
Can it possible make a PR merge it into Flashatten?

#5 luohao123 closed 1 month ago
1
How can I make it work on Windows?

#4 jpgallegoar closed 1 month ago
3
Question about performance on A100

#3 a-r-r-o-w opened 1 month ago
8
BF16 q,k,v

#2 timothelaborie closed 1 month ago
2
Example usage doesn't work

#1 blepping closed 1 month ago
1