thu-ml / SageAttention
Quantized attention that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.
BSD 3-Clause "New" or "Revised" License · 399 stars · 17 forks
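Several of the issues below concern basic usage of the kernel (e.g. #1, #33, #36). For reference, here is a minimal usage sketch assuming the `sageattn` entry point named in the repo README; the `tensor_layout` keyword and the (batch, heads, seq, dim) layout are assumptions based on issue #33 and may differ across versions.

```python
# Minimal sketch of calling SageAttention as a drop-in replacement for
# torch.nn.functional.scaled_dot_product_attention. Requires a CUDA GPU
# and fp16/bf16 inputs; keyword names are assumptions, not a confirmed API.
import torch
from sageattention import sageattn  # entry point named in the repo README

batch, heads, seq_len, head_dim = 2, 16, 1024, 128

# (batch, heads, seq, dim) layout; "HND" is an assumption based on
# issue #33, which added HND/NHD layout support.
q = torch.randn(batch, heads, seq_len, head_dim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
assert out.shape == q.shape
```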
Issues
#42 AssertionError · Maritime-Moon · opened 4 days ago · 8 comments
#41 Getting Triton compilation errors when calculating attention · JohnnyRacer · closed 3 days ago · 2 comments
#40 Add CUDA kernel for per-block and per-warp quantization · jason-huang03 · closed 6 days ago · 0 comments
#39 update license · jason-huang03 · closed 6 days ago · 0 comments
#38 v1.0.4 · jason-huang03 · closed 6 days ago · 0 comments
#37 Please create a ComfyUI node to use SageAttention · wardensc2 · opened 6 days ago · 2 comments
#36 add sageattn_varlen support · jason-huang03 · closed 6 days ago · 0 comments
#35 better support for hd96 · jason-huang03 · closed 1 week ago · 0 comments
#34 Support returning LSE for sequence parallelism · jason-huang03 · opened 1 week ago · 0 comments
#33 [update] support non-contiguous input, different qo_len and kv_len, HND and NHD layout, group query attention · jason-huang03 · closed 1 week ago · 0 comments
#32 SageAttention in Flux · todochenxi · closed 1 week ago · 4 comments
#31 Are you planning to provide a varlen and bnsd API? · tlogn · closed 5 days ago · 8 comments
#30 initialize l_i as zeros · feifeibear · closed 1 week ago · 0 comments
#29 Why does SageAttention show clear performance gains on the RTX 4090 and RTX 3090? · MeJerry215 · closed 2 weeks ago · 1 comment
#28 Compatibility with other quantization methods · chenchunhui97 · closed 1 week ago · 2 comments
#27 Q matrix quantization · liangan1 · closed 1 week ago · 1 comment
#26 Incorrect results when the seq_length of q does not equal that of k/v · beegerous · closed 3 weeks ago · 7 comments
#25 q_kernel_per_block_int8 error in distributed settings · feifeibear · closed 3 weeks ago · 0 comments
#24 Why divide by ln 2 when quantizing the Q values? · MeJerry215 · closed 3 weeks ago · 1 comment
#23 All-black videos are generated for Open-Sora-Plan using SageAttention · littletomatodonkey · closed 3 weeks ago · 3 comments
#22 Real-world acceleration benefits · lswzjuer · closed 3 weeks ago · 2 comments
#21 Why does running Llama inference on an A10 give wrong answers? · MeJerry215 · closed 1 week ago · 4 comments
#20 Can SageAttention be made available on AMD GPUs? · guanchenl · closed 1 week ago · 1 comment
#19 NaNs appear when using sageattn · Pydataman · closed 6 days ago · 6 comments
#18 Notation error in Equation (2) · Coco58323 · closed 3 weeks ago · 1 comment
#17 Would you support other headdim values? · v4if · opened 3 weeks ago · 2 comments
#16 Other SageAttention kernels · Andy0422 · opened 3 weeks ago · 1 comment
#15 Do you plan to integrate this algorithm into the vLLM project? · Alienfeel · opened 4 weeks ago · 0 comments
#14 Encountered some compatibility issues · otoTree · opened 4 weeks ago · 5 comments
#13 Can you provide an example for LLaMA? · jyweky · closed 4 weeks ago · 1 comment
#12 Question about INT8 vs. FP8 · lingffff · closed 1 month ago · 1 comment
#11 SageAttention on ComfyUI · blepping · opened 1 month ago · 2 comments
#10 Accuracy comparison at the kernel level · DoubleClark · closed 1 month ago · 2 comments
#9 Is Stable Diffusion supported? · libai-lab · opened 1 month ago · 3 comments
#8 Would something like this be possible for Apple silicon? · cchance27 · opened 1 month ago · 0 comments
#7 Windows compile issue when testing CogVideoX script · SoftologyPro · opened 1 month ago · 8 comments
#6 After updating diffusers, CogVideoX fails due to the dtype check · kijai · closed 1 month ago · 3 comments
#5 Would it be possible to open a PR to merge this into FlashAttention? · luohao123 · closed 1 month ago · 1 comment
#4 How can I make it work on Windows? · jpgallegoar · closed 1 month ago · 3 comments
#3 Question about performance on A100 · a-r-r-o-w · opened 1 month ago · 8 comments
#2 BF16 q, k, v · timothelaborie · closed 1 month ago · 2 comments
#1 Example usage doesn't work · blepping · closed 1 month ago · 1 comment