sustcsonglin / flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
MIT License · 1.24k stars · 66 forks
Issues (newest first)
[Bug]: H100 memory access violations in chunk_gla (#68) · SmerkyG · closed 6 days ago · 4 comments
[Bug]: new autotune error running simple gla chunked (#67) · SmerkyG · closed 1 week ago · 4 comments
[Bug]: multi-GPU, TypeError: 'NoneType' object is not a mapping (#66) · n2729648074 · opened 1 week ago · 1 comment
Add fine-grained warning category for easier suppression (#65) · mirceamironenco · closed 1 week ago · 1 comment (see the warning-category sketch after this list)
RuntimeError: Triton Error [CUDA]: invalid argument (#64) · TiminHu · closed 1 week ago · 4 comments
[Bug]: Mamba2 incorrect inference time behavior (#63) · zhixuan-lin · closed 2 weeks ago · 1 comment
add chunked kl div (#62) · ChaosCodes · closed 2 weeks ago · 1 comment
Why is delta_net so slow in inference? (#61) · ching-sui1995 · opened 2 weeks ago · 6 comments
About `rescale_prenorm_residual` default value in Mamba 2 (#60) · zhixuan-lin · closed 3 weeks ago · 1 comment
Correctly compute `max_seqlen` when `max_position_embeddings` is `None` (#59) · zhixuan-lin · closed 3 weeks ago · 1 comment
[Bug]: H100 Triton 3.0.0 compile crash when using num_warps=8 in autotune (#58) · SmerkyG · closed 1 week ago · 1 comment
[`Mamba2`] Post Merge Fixes - `norm_before_gate` and generation with `inputs_embeds` (#57) · vasqu · closed 3 weeks ago · 1 comment
Add `__init__.py` in `fla/ops/common` for automatic package discovery (#56) · zhixuan-lin · closed 4 weeks ago · 1 comment
Fix syntax error (#55) · JulienSiems · closed 1 month ago · 1 comment
Update amp custom_fwd, custom_bwd usage for torch 2.4.0 compatibility (#54) · mirceamironenco · closed 1 month ago · 1 comment (see the compatibility shim after this list)
Checkpoints for 340M models (#53) · 0205090923 · closed 1 week ago · 1 comment
Chunk-wise linear attn kernel does not work with torch compile (returns incorrect values / NaNs) (#52) · juankost · closed 1 month ago · 11 comments (see the compile-vs-eager repro after this list)
Variable-length sequence support (#51) · patronum08 · closed 1 month ago · 2 comments
benchmark script for simple_gla vs mamba2 kernel (#50) · learning-chip · closed 1 month ago · 2 comments
Replace mamba2 `mamba_chunk_scan_combined` triton kernel by `simple_gla` triton kernel (#49) · learning-chip · closed 1 month ago · 3 comments
[RWKV6] fix backward if h0 not passed (#48) · hypnopump · closed 1 month ago · 1 comment
bug in treatment of scale for fused_chunk_linear_attn (#47) · SmerkyG · closed 1 month ago · 1 comment
Hello from HF Diffusers (#46) · sayakpaul · closed 1 week ago · 5 comments
[Attn] fix negative value of seqlen offset during sft (#45) · ChaosCodes · closed 1 month ago · 1 comment
enhance fla support for RWKV6 (#44) · uniartisan · closed 1 week ago · 19 comments
[DRAFT] Beta gradient does not match (#43) · hypnopump · closed 1 month ago · 0 comments
[DeltaNet] Adds beta as a vector option (#42) · hypnopump · closed 1 month ago · 2 comments
Beta vec (#41) · hypnopump · closed 1 month ago · 0 comments
Minor mamba-2 fixes (#40) · DanFosing · closed 1 month ago · 0 comments
Add implementations of Mamba 2 into FLA (#39) · DanFosing · closed 1 month ago · 19 comments
RuntimeError: Triton Error [CUDA]: device-side assert triggered for fla.modules.layernorm.py (#38) · K-H-Ismail · closed 2 months ago · 5 comments
fix: enhance state gradient when bf16 (#37) · uniartisan · closed 2 months ago · 1 comment
High precision and gradient discrepancy in RWKV Triton implementation between chunk and recurrent_fuse (#36) · uniartisan · closed 2 months ago · 2 comments
fix: calculate du on different batch (#35) · uniartisan · closed 2 months ago · 1 comment
Add implementations of Mamba 2 into FLA (#34) · DanFosing · closed 1 month ago · 7 comments
Lack of speed advantage in GLA training (#33) · Yingyue-L · closed 1 month ago · 4 comments
benchmark_training_throughput and bugs (#32) · rakkit · closed 2 months ago · 5 comments
Quick question: Is there a non-causal optimized form of Flash Linear Attention? (#31) · yzeng58 · closed 1 month ago · 7 comments
training efficiency of GLA (#30) · pengzhangzhi · closed 2 months ago · 11 comments
Current FLA RWKV6 implementation has significant precision issues in pure bf16 mode (#29) · howard-hou · closed 1 month ago · 3 comments
bugs in BasedLinearAttention/LinearAttention/HGRN2Attention Implementation (#28) · rakkit · closed 3 months ago · 4 comments
Transformer model not learning after adding a classification head (#27) · OREYR · closed 1 month ago · 13 comments
Use Cache with GLA model raised error (#26) · OREYR · closed 3 months ago · 6 comments
dtype error when implementing modeling_transformer.py (#25) · OREYR · closed 3 months ago · 1 comment
Is there any speed (train and inference) and memory benchmarking comparison between GLA and Mamba? (#24) · dongzhuoyao · closed 4 months ago · 1 comment
missing hidden_size in linear attention (#23) · yxchng · closed 4 months ago · 1 comment
AssertionError('All values in both first input shape ([constexpr[16], constexpr[8]]) and second input shape ([constexpr[8], constexpr[16]]) must be >= 16!') (#22) · yxchng · closed 4 months ago · 17 comments
Does the wkv6 operator now support parallel inference? (#21) · JL-er · closed 3 months ago · 2 comments
inconsistent results when "masking" gating term between "fused_recurrent" and "fused_chunk" (fused_chunk presumably wrong) (#20) · theodorblackbird · closed 3 months ago · 1 comment
After the update, rwkv6 loss becomes NaN (#19) · JL-er · closed 4 months ago · 16 comments
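
A note on #65 for context: the usual Python pattern behind "fine-grained warning categories" is to define a dedicated `Warning` subclass and emit library warnings under it, so downstream users can filter that category alone. A minimal sketch of the pattern follows; `FLAWarning` is an illustrative name, not the library's actual class:

```python
import warnings


class FLAWarning(UserWarning):
    """Hypothetical dedicated warning category for library messages."""


# Library side: emit warnings under the dedicated category.
warnings.warn("falling back to a slower recurrent kernel", FLAWarning)

# User side: suppress only this category; other UserWarnings still surface.
warnings.filterwarnings("ignore", category=FLAWarning)
```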
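On the torch 2.4.0 change referenced in #54: PyTorch 2.4 moved `custom_fwd`/`custom_bwd` from `torch.cuda.amp` to `torch.amp` and made `device_type` an explicit argument, with the old decorators emitting a deprecation warning. A minimal compatibility shim might look like the following (a sketch under those assumptions, not the repository's actual code):

```python
import functools

import torch

# torch >= 2.4 exposes custom_fwd/custom_bwd under torch.amp and expects an
# explicit device_type; older releases only have the torch.cuda.amp variants.
if hasattr(torch.amp, "custom_fwd"):
    custom_fwd = functools.partial(torch.amp.custom_fwd, device_type="cuda")
    custom_bwd = functools.partial(torch.amp.custom_bwd, device_type="cuda")
else:
    custom_fwd = torch.cuda.amp.custom_fwd
    custom_bwd = torch.cuda.amp.custom_bwd


class Scale2x(torch.autograd.Function):
    """Toy autograd.Function showing where the decorators are applied."""

    @staticmethod
    @custom_fwd
    def forward(ctx, x):
        return 2 * x

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        return 2 * grad_out
```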
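For failures like #52, a standard first triage step is to compare the kernel's compiled output against its eager output on identical inputs. The sketch below uses a naive causal linear-attention reference as a stand-in for the library's kernel (not fla's actual API):

```python
import torch


def linear_attn_ref(q, k, v):
    # Naive causal linear attention (no softmax) as a stand-in kernel.
    scores = q @ k.transpose(-1, -2)
    return torch.tril(scores) @ v


q, k, v = (torch.randn(2, 4, 128, 64) for _ in range(3))
eager_out = linear_attn_ref(q, k, v)
compiled_out = torch.compile(linear_attn_ref)(q, k, v)

# Mismatches or NaNs here point at the compilation path rather than the math.
torch.testing.assert_close(compiled_out, eager_out, rtol=1e-4, atol=1e-4)
```

The same compare-two-paths pattern applies to the chunk-vs-recurrent precision reports above (#36, #29, #20), swapping the compiled call for the alternate kernel.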