sustcsonglin / flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton

[Bug]: Calling RWKV6Attention reports an error #74

Open synbol opened 3 weeks ago

synbol commented 3 weeks ago

Describe the bug

Error message: python: /project/lib/Analysis/Allocation.cpp:47: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.

Steps to reproduce the bug

Calling process:

from fla.layers.rwkv6 import RWKV6Attention

self.attention = RWKV6Attention(hidden_size=config.dim, num_heads=config.n_head)
o, _, past_key_values = self.attention(self.attention_norm(x), attention_mask=mask, past_key_values=past_key_values)
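
A minimal, self-contained version of this call might look like the sketch below; the hidden size, head count, batch/sequence sizes, dtype, and the bare call signature are illustrative assumptions, not values from the original config:

```python
# Hypothetical minimal reproduction; sizes and dtype are assumptions.
import torch
from fla.layers.rwkv6 import RWKV6Attention

attention = RWKV6Attention(hidden_size=512, num_heads=4, layer_idx=0).cuda().to(torch.bfloat16)

# Dummy input of shape (batch, seq_len, hidden_size).
x = torch.randn(2, 128, 512, device='cuda', dtype=torch.bfloat16)

# The layer returns (output, attentions, past_key_values), as in the snippet above.
o, _, past_key_values = attention(x)
print(o.shape)
```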

Expected behavior

None

Environment info

torch 2.4.1
triton 3.0.0
einops 0.8.0
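
The installed versions can be confirmed with a quick check like:

```python
# Print the versions of the packages listed above.
import torch, triton, einops

print(torch.__version__, triton.__version__, einops.__version__)
```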

synbol commented 3 weeks ago

self.attention = RWKV6Attention(hidden_size=config.dim, num_heads=config.n_head, layer_idx=layer_id)

yzhangcs commented 3 weeks ago

Hi, can you provide some runnable code for reproduction? It works normally for me when running this:

python benchmark_training_throughput.py --name rwkv6

sustcsonglin commented 3 weeks ago

Hi, thanks for reporting it. What is your GPU model?
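
Since the assertion says the mma -> mma layout conversion is only supported on Ampere, one quick way to report the GPU model is a small diagnostic like this (nothing fla-specific; Ampere cards report compute capability 8.x):

```python
import torch

# Report the CUDA device name and compute capability.
# Ampere GPUs (e.g. A100 -> (8, 0), RTX 30xx -> (8, 6)) report major version 8.
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))
```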