sustcsonglin / flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
MIT License · 1.24k stars · 66 forks
Issues (newest first)
[Bug]: H100 memory access violations in chunk_gla (#68) · SmerkyG · closed 6 days ago · 4 comments
[Bug]: new autotune error running simple gla chunked (#67) · SmerkyG · closed 1 week ago · 4 comments
[Bug]: multi-GPU, TypeError: 'NoneType' object is not a mapping (#66) · n2729648074 · opened 1 week ago · 1 comment
Add fine-grained warning category for easier suppression (#65) · mirceamironenco · closed 1 week ago · 1 comment (see the warning-category sketch after this list)
RuntimeError: Triton Error [CUDA]: invalid argument (#64) · TiminHu · closed 1 week ago · 4 comments
[Bug]: Mamba2 incorrect inference time behavior (#63) · zhixuan-lin · closed 2 weeks ago · 1 comment
add chunked kl div (#62) · ChaosCodes · closed 2 weeks ago · 1 comment
Why is delta_net so slow in inference? (#61) · ching-sui1995 · opened 2 weeks ago · 6 comments
About `rescale_prenorm_residual` default value in Mamba 2 (#60) · zhixuan-lin · closed 3 weeks ago · 1 comment
Correctly compute `max_seqlen` when `max_position_embeddings` is `None` (#59) · zhixuan-lin · closed 3 weeks ago · 1 comment
[Bug]: H100 Triton 3.0.0 compile crash when using num_warps=8 in autotune (#58) · SmerkyG · closed 1 week ago · 1 comment
[`Mamba2`] Post Merge Fixes - `norm_before_gate` and generation with `inputs_embeds` (#57) · vasqu · closed 3 weeks ago · 1 comment
Add `__init__.py` in `fla/ops/common` for automatic package discovery (#56) · zhixuan-lin · closed 4 weeks ago · 1 comment
Fix syntax error (#55) · JulienSiems · closed 1 month ago · 1 comment
Update amp custom_fwd, custom_bwd usage for torch 2.4.0 compatibility (#54) · mirceamironenco · closed 1 month ago · 1 comment (see the compatibility shim after this list)
Checkpoints for 340M models (#53) · 0205090923 · closed 1 week ago · 1 comment
Chunk-wise linear attn kernel does not work with torch compile (returns incorrect values / NaNs) (#52) · juankost · closed 1 month ago · 11 comments (see the compile-vs-eager repro after this list)
Variable-length sequence support (#51) · patronum08 · closed 1 month ago · 2 comments
benchmark script for simple_gla vs mamba2 kernel (#50) · learning-chip · closed 1 month ago · 2 comments
Replace mamba2 `mamba_chunk_scan_combined` triton kernel by `simple_gla` triton kernel (#49) · learning-chip · closed 1 month ago · 3 comments
[RWKV6] fix backward if h0 not passed (#48) · hypnopump · closed 1 month ago · 1 comment
bug in treatment of scale for fused_chunk_linear_attn (#47) · SmerkyG · closed 1 month ago · 1 comment
Hello from HF Diffusers (#46) · sayakpaul · closed 1 week ago · 5 comments
[Attn] fix negative value of seqlen offset during sft (#45) · ChaosCodes · closed 1 month ago · 1 comment
enhance fla support for RWKV6 (#44) · uniartisan · closed 1 week ago · 19 comments
[DRAFT] Beta gradient does not match (#43) · hypnopump · closed 1 month ago · 0 comments
[DeltaNet] Adds beta as a vector option (#42) · hypnopump · closed 1 month ago · 2 comments
Beta vec (#41) · hypnopump · closed 1 month ago · 0 comments
Minor mamba-2 fixes (#40) · DanFosing · closed 1 month ago · 0 comments
Add implementations of Mamba 2 into FLA (#39) · DanFosing · closed 1 month ago · 19 comments
RuntimeError: Triton Error [CUDA]: device-side assert triggered for fla.modules.layernorm.py (#38) · K-H-Ismail · closed 2 months ago · 5 comments
fix: enhance state gradient when bf16 (#37) · uniartisan · closed 2 months ago · 1 comment
High precision and gradient discrepancy in RWKV Triton implementation between chunk and recurrent_fuse (#36) · uniartisan · closed 2 months ago · 2 comments
fix: calculate du on different batch (#35) · uniartisan · closed 2 months ago · 1 comment
Add implementations of Mamba 2 into FLA (#34) · DanFosing · closed 1 month ago · 7 comments
Lack of speed advantage in GLA training (#33) · Yingyue-L · closed 1 month ago · 4 comments
benchmark_training_throughput and bugs (#32) · rakkit · closed 2 months ago · 5 comments
Quick question: Is there a non-causal optimized form of Flash Linear Attention? (#31) · yzeng58 · closed 1 month ago · 7 comments
training efficiency of GLA (#30) · pengzhangzhi · closed 2 months ago · 11 comments
Current FLA RWKV6 implementation has significant precision issues in pure bf16 mode (#29) · howard-hou · closed 1 month ago · 3 comments
bugs in BasedLinearAttention/LinearAttention/HGRN2Attention Implementation (#28) · rakkit · closed 3 months ago · 4 comments
Transformer model not learning after adding a classification head (#27) · OREYR · closed 1 month ago · 13 comments
Use Cache with GLA model raised error (#26) · OREYR · closed 3 months ago · 6 comments
dtype error when implementing modeling_transformer.py (#25) · OREYR · closed 3 months ago · 1 comment
Is there any speed (train and inference) and memory benchmarking comparison between GLA and Mamba? (#24) · dongzhuoyao · closed 4 months ago · 1 comment
missing hidden_size in linear attention (#23) · yxchng · closed 4 months ago · 1 comment
AssertionError('All values in both first input shape ([constexpr[16], constexpr[8]]) and second input shape ([constexpr[8], constexpr[16]]) must be >= 16!') (#22) · yxchng · closed 4 months ago · 17 comments
Does the wkv6 operator now support parallel inference? (#21) · JL-er · closed 3 months ago · 2 comments
inconsistent results when "masking" gating term between "fused_recurrent" and "fused_chunk" (fused_chunk presumably wrong) (#20) · theodorblackbird · closed 3 months ago · 1 comment
After the update, rwkv6 loss becomes NaN (#19) · JL-er · closed 4 months ago · 16 comments
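
A note on #65 for context: the usual Python pattern behind "fine-grained warning categories" is to define a dedicated `Warning` subclass and emit library warnings under it, so downstream users can filter that category alone. A minimal sketch of the pattern follows; `FLAWarning` is an illustrative name, not the library's actual class:

```python
import warnings


class FLAWarning(UserWarning):
    """Hypothetical dedicated warning category for library messages."""


# Library side: emit warnings under the dedicated category.
warnings.warn("falling back to a slower recurrent kernel", FLAWarning)

# User side: suppress only this category; other UserWarnings still surface.
warnings.filterwarnings("ignore", category=FLAWarning)
```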
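On the torch 2.4.0 change referenced in #54: PyTorch 2.4 moved `custom_fwd`/`custom_bwd` from `torch.cuda.amp` to `torch.amp` and made `device_type` an explicit argument, with the old decorators emitting a deprecation warning. A minimal compatibility shim might look like the following (a sketch under those assumptions, not the repository's actual code):

```python
import functools

import torch

# torch >= 2.4 exposes custom_fwd/custom_bwd under torch.amp and expects an
# explicit device_type; older releases only have the torch.cuda.amp variants.
if hasattr(torch.amp, "custom_fwd"):
    custom_fwd = functools.partial(torch.amp.custom_fwd, device_type="cuda")
    custom_bwd = functools.partial(torch.amp.custom_bwd, device_type="cuda")
else:
    custom_fwd = torch.cuda.amp.custom_fwd
    custom_bwd = torch.cuda.amp.custom_bwd


class Scale2x(torch.autograd.Function):
    """Toy autograd.Function showing where the decorators are applied."""

    @staticmethod
    @custom_fwd
    def forward(ctx, x):
        return 2 * x

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        return 2 * grad_out
```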
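For failures like #52, a standard first triage step is to compare the kernel's compiled output against its eager output on identical inputs. The sketch below uses a naive causal linear-attention reference as a stand-in for the library's kernel (not fla's actual API):

```python
import torch


def linear_attn_ref(q, k, v):
    # Naive causal linear attention (no softmax) as a stand-in kernel.
    scores = q @ k.transpose(-1, -2)
    return torch.tril(scores) @ v


q, k, v = (torch.randn(2, 4, 128, 64) for _ in range(3))
eager_out = linear_attn_ref(q, k, v)
compiled_out = torch.compile(linear_attn_ref)(q, k, v)

# Mismatches or NaNs here point at the compilation path rather than the math.
torch.testing.assert_close(compiled_out, eager_out, rtol=1e-4, atol=1e-4)
```

The same compare-two-paths pattern applies to the chunk-vs-recurrent precision reports above (#36, #29, #20), swapping the compiled call for the alternate kernel.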