thu-ml / SageAttention

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
BSD 3-Clause "New" or "Revised" License

Encountered some compatibility issues #14

Open otoTree opened 3 weeks ago

otoTree commented 3 weeks ago

Error:

Traceback (most recent call last):
  File "/root/musc/examples/musc_main_clone.py", line 90, in <module>
    model.main()
  File "/root/musc/models/musc_clone.py", line 237, in main
    self.make_category_data(category="category", )
  File "/root/musc/models/musc_clone.py", line 142, in make_category_data
    patch_tokens = self.dino_model.get_intermediate_layers(x=input_image,
  File "/root/musc/./models/backbone/dinov2/models/vision_transformer.py", line 311, in get_intermediate_layers
    outputs = self._get_intermediate_layers_not_chunked(x, n)
  File "/root/musc/./models/backbone/dinov2/models/vision_transformer.py", line 280, in _get_intermediate_layers_not_chunked
    x = blk(x)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/musc/./models/backbone/dinov2/layers/block.py", line 254, in forward
    return super().forward(x_or_x_list)
  File "/root/musc/./models/backbone/dinov2/layers/block.py", line 112, in forward
    x = x + attn_residual_func(x)
  File "/root/musc/./models/backbone/dinov2/layers/block.py", line 91, in attn_residual_func
    return self.ls1(self.attn(self.norm1(x)))
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/musc/./models/backbone/dinov2/layers/attention.py", line 79, in forward
    return super().forward(x)
  File "/root/musc/./models/backbone/dinov2/layers/attention.py", line 63, in forward
    attn = sageattn(q, k, v, is_causal=False, smooth_k=True)
  File "/root/miniconda3/lib/python3.10/site-packages/sageattention/core.py", line 45, in sageattn
    o = attn_h64_false(q_int8, k_int8, v, q_scale, k_scale)
  File "/root/miniconda3/lib/python3.10/site-packages/sageattention/attn_qk_int8_per_block_h64.py", line 97, in forward
    _attn_fwd[grid](
  File "<string>", line 63, in _attn_fwd
  File "/root/miniconda3/lib/python3.10/site-packages/triton/compiler/compiler.py", line 476, in compile
    next_module = compile_kernel(module)
  File "/root/miniconda3/lib/python3.10/site-packages/triton/compiler/compiler.py", line 383, in <lambda>
    lambda src: optimize_ttgir(ttir_to_ttgir(src, num_warps), num_stages, arch))
  File "/root/miniconda3/lib/python3.10/site-packages/triton/compiler/compiler.py", line 91, in optimize_ttgir
    pm.run(mod)
RuntimeError: PassManager::run failed

GPU: 3090

root@autodl-container-12f34dabc5-6cbd4932:~/musc# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
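
For reference, a small Python snippet, not from the original report, that prints the torch and triton versions the maintainers ask about below:

```python
# Generic environment check (not part of the original report): print the
# versions that matter for SageAttention's Triton kernels.
import torch
import triton

print("torch :", torch.__version__)
print("triton:", triton.__version__)
print("cuda  :", torch.version.cuda)             # CUDA version torch was built against
print("gpu   :", torch.cuda.get_device_name(0))  # e.g. a 3090, as reported above
```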

jt-zhang commented 3 weeks ago

Thank you for reaching out. Please first check the dtype and shape of q, k, v, and ensure that triton>=2.3.0 and torch>=2.3.0 are installed.
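
For example, a minimal sketch of that check; the call signature and head_dim=64 are taken from the traceback above, while the (batch, heads, seq_len, head_dim) layout and the half-precision requirement are assumptions here:

```python
# Minimal sketch of the suggested q/k/v sanity check, using dummy tensors; in
# the real model these come from the DINOv2 attention module in the traceback.
import torch
from sageattention import sageattn

# Assumed layout: (batch, heads, seq_len, head_dim); head_dim=64 matches the
# attn_qk_int8_per_block_h64 kernel the traceback goes through.
q = torch.randn(1, 12, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 12, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 12, 1024, 64, dtype=torch.float16, device="cuda")

print(q.shape, q.dtype)  # fp16/bf16 expected; fp32 inputs are a likely cause of the failure

out = sageattn(q, k, v, is_causal=False, smooth_k=True)  # same call as in the issue's traceback
print(out.shape)
```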

windring commented 6 days ago

same problem

jason-huang03 commented 4 days ago

@windring what is your platform, torch version and triton version?

windring commented 4 days ago

@jason-huang03 torch, and I found that my model is fp32
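
If the fp32 activations are indeed the cause, one possible workaround is to cast q, k, v to half precision around the call. A sketch follows; the helper name sageattn_fp32_safe is made up here, and the half-precision requirement is an assumption based on the dtype discussion above:

```python
# Hypothetical workaround (not an official API): cast fp32 activations to fp16
# for the SageAttention call and cast the result back afterwards.
import torch
from sageattention import sageattn

def sageattn_fp32_safe(q, k, v, **kwargs):
    orig_dtype = q.dtype
    if orig_dtype == torch.float32:
        q, k, v = q.half(), k.half(), v.half()
    out = sageattn(q, k, v, **kwargs)
    return out.to(orig_dtype)

# Usage mirroring the call site in dinov2/layers/attention.py:
# attn = sageattn_fp32_safe(q, k, v, is_causal=False, smooth_k=True)
```

Running the whole model in fp16 would avoid the per-call casts, at the cost of reduced precision elsewhere.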

jason-huang03 commented 2 days ago

Hi, can you try the latest code?

On Tue, Nov 12, 2024 at 6:56 PM gomoku @.***> wrote:

@jason-huang03 I'm sorry to bother you again. After I fixed the dtype, it still failed with PassManager::run failed, this time reporting:

.conda/lib/python3.10/site-packages/sageattention/attn_qk_int8_per_block_hd64_causal.py":98:63)): error: mismatching kWidth between A and B operands

Screenshots attached: https://github.com/user-attachments/assets/c4163914-e8db-41f2-a7e5-2a055a850212 and https://github.com/user-attachments/assets/d9bbe52e-82cc-411a-a8c5-c7f46cbcd885
