kijai / ComfyUI-MochiWrapper

Apache License 2.0

MochiSampler No available kernel. Aborting execution. #8

Open al3dv2 opened 2 days ago

al3dv2 commented 2 days ago

I have this error with the MochiSampler node:

ComfyUI Error Report

Error Details


## System Information
- **ComfyUI Version:** v0.2.3-1-g1b80895
- **Arguments:** ComfyUI\main.py --windows-standalone-build
- **OS:** nt
- **Python Version:** 3.11.9 (tags/v3.11.9:de54cf5, Apr  2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]
- **Embedded Python:** true
- **PyTorch Version:** 2.4.1+cu124
## Devices

- **Name:** cuda:0 Quadro RTX 6000 : cudaMallocAsync
  - **Type:** cuda
  - **VRAM Total:** 25769476096
  - **VRAM Free:** 12843330180
  - **Torch VRAM Total:** 11609833472
  - **Torch VRAM Free:** 43362948
al3dv2 commented 2 days ago

Apparently adding MATH_KERNEL_ON can solve this bug? https://github.com/facebookresearch/sam2/issues/48
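
For context, MATH_KERNEL_ON in that issue amounts to letting scaled_dot_product_attention fall back to PyTorch's always-available math backend. A minimal sketch (assuming torch >= 2.0; illustrative code, not from this repo):

```python
import torch
import torch.nn.functional as F

# Allow the math kernel (always available) and optionally disable the
# stricter backends that older GPUs may not support.
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(False)

q = k = v = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
out = F.scaled_dot_product_attention(q, k, v)  # now dispatches to the math kernel
```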

Llark2008 commented 2 days ago

I have the same problem with Python 3.12.7 and PyTorch 2.5.0:

- **OS:** nt
- **Python Version:** 3.12.7 (tags/v3.12.7:0b05ead, Oct 1 2024, 03:06:41) [MSC v.1941 64 bit (AMD64)]
- **Embedded Python:** true
- **PyTorch Version:** 2.5.0+cu124
- **Arguments:** ComfyUI\main.py --windows-standalone-build
- **RAM Total:** 63.91 GB
- **RAM Free:** 45.28 GB

tdrminglin commented 2 days ago

I have the same problem with Python 3.12.7 and PyTorch 2.5.0. I tried both sageattention and flash_attn, with the same results. My GPU is an RTX 2080 Ti with 22G of memory. Googling this error, it looks like my GPU is too old for some algorithm.

kijai commented 2 days ago

Can everyone having the issue share what GPU they are running?

mikeyimer commented 2 days ago

Can everyone having the issue share what GPU they are running?

I tried two different GPUs in the same environment and ComfyUI: inference on a 4090 succeeds, and the same error is reported on a V100 32G.

kijai commented 2 days ago

I can't really test this but I pushed an update that may help.

Maoweicao commented 2 days ago

The same problem, my video card is also an RTX 2080 Ti 22G.

This is the error report:

ComfyUI Error Report

Error Details

## System Information
- **ComfyUI Version:** v0.2.4-5-gaf8cf79
- **Arguments:** F:\ComfyUI-webui\main.py --auto-launch --preview-method auto --disable-cuda-malloc
- **OS:** nt
- **Python Version:** 3.11.6 | packaged by conda-forge | (main, Oct  3 2023, 10:29:11) [MSC v.1935 64 bit (AMD64)]
- **Embedded Python:** false
- **PyTorch Version:** 2.4.1+cu124
## Devices

- **Name:** cuda:0 NVIDIA GeForce RTX 2080 Ti : native
  - **Type:** cuda
  - **VRAM Total:** 23621861376
  - **VRAM Free:** 10637861376
  - **Torch VRAM Total:** 11714691072
  - **Torch VRAM Free:** 43049472

## Logs

2024-10-24 20:56:16,901 - root - INFO - Total VRAM 22528 MB, total RAM 130925 MB
2024-10-24 20:56:16,901 - root - INFO - pytorch version: 2.4.1+cu124
2024-10-24 20:56:18,765 - root - INFO - xformers version: 0.0.28.post1
2024-10-24 20:56:18,770 - root - INFO - Set vram state to: NORMAL_VRAM
2024-10-24 20:56:18,771 - root - INFO - Device: cuda:0 NVIDIA GeForce RTX 2080 Ti : native
2024-10-24 20:56:19,213 - root - INFO - Using xformers cross attention
2024-10-24 20:56:20,742 - root - INFO - [Prompt Server] web root: F:\ComfyUI-webui\web
2024-10-24 20:56:20,745 - root - INFO - Adding extra search path checkpoints F:\stable-diffusion-webui_23-04-18\models/Stable-diffusion
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path configs F:\stable-diffusion-webui_23-04-18\models/Stable-diffusion
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path vae F:\stable-diffusion-webui_23-04-18\models/VAE
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path loras F:\stable-diffusion-webui_23-04-18\models/Lora
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path loras F:\stable-diffusion-webui_23-04-18\models/LyCORIS
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path upscale_models F:\stable-diffusion-webui_23-04-18\models/ESRGAN
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path upscale_models F:\stable-diffusion-webui_23-04-18\models/RealESRGAN
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path upscale_models F:\stable-diffusion-webui_23-04-18\models/SwinIR
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path embeddings F:\stable-diffusion-webui_23-04-18\embeddings
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path hypernetworks F:\stable-diffusion-webui_23-04-18\models/hypernetworks
2024-10-24 20:56:20,746 - root - INFO - Adding extra search path controlnet F:\stable-diffusion-webui_23-04-18\models/ControlNet
2024-10-24 20:56:22,757 - root - INFO - Total VRAM 22528 MB, total RAM 130925 MB
2024-10-24 20:56:22,757 - root - INFO - pytorch version: 2.4.1+cu124
2024-10-24 20:56:22,758 - root - INFO - xformers version: 0.0.28.post1
2024-10-24 20:56:22,758 - root - INFO - Set vram state to: NORMAL_VRAM
2024-10-24 20:56:22,758 - root - INFO - Device: cuda:0 NVIDIA GeForce RTX 2080 Ti : native
2024-10-24 20:56:25,888 - root - INFO - --------------
2024-10-24 20:56:25,888 - root - INFO - ### Mixlab Nodes: Loaded
2024-10-24 20:56:25,892 - root - INFO - ChatGPT.available False
2024-10-24 20:56:25,892 - root - INFO - editmask.available True
2024-10-24 20:56:25,901 - root - INFO - LaMaInpainting.available True
2024-10-24 20:56:26,487 - root - INFO - ClipInterrogator.available True
2024-10-24 20:56:26,760 - root - INFO - PromptGenerate.available True
2024-10-24 20:56:26,760 - root - INFO - ChinesePrompt.available True
2024-10-24 20:56:26,760 - root - INFO - RembgNode.available True
2024-10-24 20:56:27,767 - root - INFO - TripoSR.available
2024-10-24 20:56:27,768 - root - INFO - MiniCPMNode.available
2024-10-24 20:56:27,771 - root - INFO - Scenedetect.available False
2024-10-24 20:56:27,779 - root - INFO - FishSpeech.available False
2024-10-24 20:56:27,782 - root - INFO - SenseVoice.available False
2024-10-24 20:56:27,784 - root - INFO - Whisper.available False
2024-10-24 20:56:27,802 - root - INFO - FalVideo.available
2024-10-24 20:56:27,803 - root - INFO - --------------
2024-10-24 20:56:28,950 - root - INFO - Import times for custom nodes:
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\websocket_image_save.py
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\AIGODLIKE-COMFYUI-TRANSLATION
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI_ADV_CLIP_emb
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\stability-ComfyUI-nodes
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\Comfyui-StableSR
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI_essentials
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\Derfuu_ComfyUI_ModdedNodes
2024-10-24 20:56:28,950 - root - INFO - 0.0 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper
2024-10-24 20:56:28,950 - root - INFO - 0.1 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI-KJNodes
2024-10-24 20:56:28,950 - root - INFO - 0.2 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI-Crystools
2024-10-24 20:56:28,951 - root - INFO - 0.4 seconds: F:\ComfyUI-webui\custom_nodes\comfyui_controlnet_aux
2024-10-24 20:56:28,951 - root - INFO - 0.6 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI-VideoHelperSuite
2024-10-24 20:56:28,951 - root - INFO - 0.7 seconds: F:\ComfyUI-webui\custom_nodes\ComfyUI-Manager
2024-10-24 20:56:28,951 - root - INFO - 4.2 seconds: F:\ComfyUI-webui\custom_nodes\comfyui-mixlab-nodes
2024-10-24 20:56:28,966 - root - INFO - Starting server

2024-10-24 20:56:28,967 - root - INFO - To see the GUI go to: http://127.0.0.1:8188
2024-10-24 20:56:33,295 - root - INFO - got prompt
2024-10-24 20:56:58,212 - root - ERROR - !!! Exception during processing !!! No available kernel. Aborting execution.
2024-10-24 20:56:58,217 - root - ERROR - Traceback (most recent call last):
  File "F:\ComfyUI-webui\execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\ComfyUI-webui\execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "F:\ComfyUI-webui\execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "F:\ComfyUI-webui\execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper\nodes.py", line 237, in process
    latents = model.run(args, stream_results=False)
  File "F:\ComfyUI-webui\.ext\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper\mochi_preview\t2v_synth_mochi.py", line 310, in run
    pred, output_cond = model_fn(
  File "F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper\mochi_preview\t2v_synth_mochi.py", line 298, in model_fn
    out_cond = self.dit(z, sigma, **sample)
  File "F:\ComfyUI-webui\.ext\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI-webui\.ext\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper\mochi_preview\dit\joint_model\asymm_models_joint.py", line 646, in forward
    x, c, y_feat, rope_cos, rope_sin = self.prepare(
  File "F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper\mochi_preview\dit\joint_model\asymm_models_joint.py", line 611, in prepare
    t5_y_pool = self.t5_y_embedder(t5_feat, t5_mask)  # (B, D)
  File "F:\ComfyUI-webui\.ext\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "F:\ComfyUI-webui\.ext\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\ComfyUI-webui\custom_nodes\ComfyUI-MochiWrapper\mochi_preview\dit\joint_model\utils.py", line 90, in forward
    x = F.scaled_dot_product_attention(
RuntimeError: No available kernel. Aborting execution.

2024-10-24 20:56:58,220 - root - INFO - Prompt executed in 24.86 seconds
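
The failing call is the F.scaled_dot_product_attention at the bottom of the traceback. A minimal sketch of how this error arises (illustrative, assuming torch >= 2.3): restricting SDPA to a backend the GPU/dtype combination cannot satisfy leaves the dispatcher with no kernel to run:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 64, 64, device="cuda")  # float32 on purpose

# Flash attention requires fp16/bf16 (and a supported GPU); with every other
# backend excluded, the call raises:
# RuntimeError: No available kernel. Aborting execution.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```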

## Attached Workflow
Please make sure that workflow does not contain any sensitive information such as API keys or passwords.

{"last_node_id":12,"last_link_id":15,"nodes":[{"id":1,"type":"MochiTextEncode","pos":{"0":484,"1":258},"size":{"0":413.45361328125,"1":268.5947265625},"flags":{},"order":3,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":1,"label":"clip"}],"outputs":[{"name":"conditioning","type":"CONDITIONING","links":[7],"slot_index":0,"label":"conditioning"}],"properties":{"Node name for S&R":"MochiTextEncode"},"widgets_values":["nature video of a red panda eating bamboo in front of a waterfall",1,true]},{"id":8,"type":"MochiTextEncode","pos":{"0":481,"1":577},"size":{"0":400,"1":200},"flags":{},"order":4,"mode":0,"inputs":[{"name":"clip","type":"CLIP","link":8,"label":"clip"}],"outputs":[{"name":"conditioning","type":"CONDITIONING","links":[9],"slot_index":0,"label":"conditioning"}],"properties":{"Node name for S&R":"MochiTextEncode"},"widgets_values":["",1,true]},{"id":5,"type":"MochiSampler","pos":{"0":960,"1":243},"size":{"0":315,"1":242},"flags":{},"order":5,"mode":0,"inputs":[{"name":"model","type":"MOCHIMODEL","link":3,"label":"model"},{"name":"positive","type":"CONDITIONING","link":7,"label":"positive"},{"name":"negative","type":"CONDITIONING","link":9,"label":"negative"}],"outputs":[{"name":"model","type":"LATENT","links":[12],"slot_index":0,"label":"model"}],"properties":{"Node name for S&R":"MochiSampler"},"widgets_values":[848,480,163,50,4.5,0,"fixed"]},{"id":10,"type":"MochiDecode","pos":{"0":1306,"1":158},"size":{"0":315,"1":222},"flags":{},"order":6,"mode":0,"inputs":[{"name":"vae","type":"MOCHIVAE","link":11,"label":"vae"},{"name":"samples","type":"LATENT","link":12,"label":"samples"}],"outputs":[{"name":"images","type":"IMAGE","links":[14],"slot_index":0,"label":"images"}],"properties":{"Node name for S&R":"MochiDecode"},"widgets_values":[true,false,10,160,312,0.25,0.25]},{"id":11,"type":"GetImageSizeAndCount","pos":{"0":1385,"1":441},"size":{"0":222.00714111328125,"1":86},"flags":{},"order":7,"mode":0,"inputs":[{"name":"image","type":"IMAGE","link":14,"label":"图像"}],"outputs":[{"name":"image","type":"IMAGE","links":[15],"slot_index":0,"label":"图像"},{"name":"width","type":"INT","links":null,"label":"宽度"},{"name":"height","type":"INT","links":null,"label":"高度"},{"name":"count","type":"INT","links":null,"label":"数量"}],"properties":{"Node name for S&R":"GetImageSizeAndCount"},"widgets_values":[]},{"id":9,"type":"VHS_VideoCombine","pos":{"0":1683,"1":63},"size":[1261.0787353515625,310],"flags":{},"order":8,"mode":0,"inputs":[{"name":"images","type":"IMAGE","link":15,"label":"图像"},{"name":"audio","type":"AUDIO","link":null,"shape":7,"label":"音频"},{"name":"meta_batch","type":"VHS_BatchManager","link":null,"shape":7,"label":"批次管理"},{"name":"vae","type":"VAE","link":null,"shape":7}],"outputs":[{"name":"Filenames","type":"VHS_FILENAMES","links":null,"label":"文件名"}],"properties":{"Node name for S&R":"VHS_VideoCombine"},"widgets_values":{"frame_rate":24,"loop_count":0,"filename_prefix":"Mochi_preview","format":"video/h264-mp4","pix_fmt":"yuv420p","crf":19,"save_metadata":true,"pingpong":false,"save_output":false,"videopreview":{"hidden":false,"paused":false,"params":{"filename":"Mochi_preview_00021.mp4","subfolder":"","type":"temp","format":"video/h264-mp4","frame_rate":24},"muted":false}}},{"id":12,"type":"Note","pos":{"0":1271,"1":-119},"size":{"0":365.5867919921875,"1":208.3488311767578},"flags":{},"order":0,"mode":0,"inputs":[],"outputs":[],"properties":{},"widgets_values":["VAE decoding is extremely heavy so tiling is necessary, I have not found best settings for it yet so testing help 
is appreciated, you can keep decoding after sampling as the latents are still in memory to see what works.\n\nEither adjust frame_batch_size to decode less frames at once, this tends to cause frame skipping though.\n\nOr use higher batch and smaller tiles to still fit it in memory."],"color":"#432","bgcolor":"#653"},{"id":2,"type":"CLIPLoader","pos":{"0":-3,"1":462},"size":{"0":429.837646484375,"1":82},"flags":{},"order":1,"mode":0,"inputs":[],"outputs":[{"name":"CLIP","type":"CLIP","links":[1,8],"label":"CLIP"}],"properties":{"Node name for S&R":"CLIPLoader"},"widgets_values":["t5xxl_fp8_e4m3fn.safetensors","stable_diffusion"]},{"id":4,"type":"DownloadAndLoadMochiModel","pos":{"0":393,"1":59},"size":{"0":437.7432556152344,"1":150},"flags":{},"order":2,"mode":0,"inputs":[],"outputs":[{"name":"mochi_model","type":"MOCHIMODEL","links":[3],"slot_index":0,"label":"mochi_model"},{"name":"mochi_vae","type":"MOCHIVAE","links":[11],"slot_index":1,"label":"mochi_vae"}],"properties":{"Node name for S&R":"DownloadAndLoadMochiModel"},"widgets_values":["mochi_preview_dit_fp8_e4m3fn.safetensors","mochi_preview_vae_bf16.safetensors","fp8_e4m3fn","sage_attn"]}],"links":[[1,2,0,1,0,"CLIP"],[3,4,0,5,0,"MOCHIMODEL"],[7,1,0,5,1,"CONDITIONING"],[8,2,0,8,0,"CLIP"],[9,8,0,5,2,"CONDITIONING"],[11,4,1,10,0,"MOCHIVAE"],[12,5,0,10,1,"LATENT"],[14,10,0,11,0,"IMAGE"],[15,11,0,9,0,"IMAGE"]],"groups":[],"config":{},"extra":{"ds":{"scale":0.9646149645000006,"offset":[159.60632227346855,137.3169382414236]}},"version":0.4}



## Additional Context
(Please add any additional context or steps to reproduce the error here)

But I run this custom node in an all-in-one runtime. These are its diagnostics logs; the custom node version is d699fae:
[Diagnostics-1729774769.log](https://github.com/user-attachments/files/17507188/Diagnostics-1729774769.log)
gnrsbassoutlook commented 2 days ago

NVIDIA GeForce RTX 2080 Ti, 22G: the same problem, the same GPU.

tdrminglin commented 2 days ago

I added "with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True): "before every "scaled_dot_product_attention" I also found that on my computer(2080ti GPU), the "flash_sttn_varlen_qkvpacked_func" and "sageattn" are both not imported when comfyui starts up.(flash attention and sage attention have been installed) So after the above work I choose "sdpa" mode in cmfyui, now the "no available kernel " error is gone, but I instantly get an "OOM" error while my vram still have 11G free . After carefully observation, I found that the vram usage was 22G for a second when the workflow goes to Ksampler . maybe it's just impossible for 2080ti to run this

msola-ht commented 1 day ago

Same issue with a 2080 Ti.

tdrminglin commented 1 day ago

Guys, I've finally made it run on a 2080 Ti 22G. I think the first two changes made the difference:

  1. I added "with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):" before every "scaled_dot_product_attention" (see the sketch after this list).
  2. Chose "sdpa" mode in ComfyUI.
  3. As for the OOM error I mentioned above, turning the video size down to 384 fixes it.
  4. VRAM peaks at 18.4G.
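
A minimal sketch of the wrapper from item 1 (the helper name and tensor arguments are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def attention_with_fallback(q, k, v):
    # Wrapping the call lets the dispatcher fall back to the math kernel
    # instead of raising "No available kernel". sdp_kernel is deprecated
    # on torch >= 2.3 but still works on 2.4/2.5.
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=True, enable_mem_efficient=True
    ):
        return F.scaled_dot_product_attention(q, k, v)
```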
kijai commented 1 day ago

Guys, I've finally made it run on a 2080 Ti 22G. I think the first two changes made the difference:

  1. I added "with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):" before every "scaled_dot_product_attention".
  2. Chose "sdpa" mode in ComfyUI.
  3. As for the OOM error I mentioned above, turning the video size down to 384 fixes it.
  4. VRAM peaks at 18.4G.

I just added a GGUF model earlier, maybe that will also help.

littleyeson commented 19 hours ago

Guys, I've finally made it run on a 2080 Ti 22G. I think the first two changes made the difference:

  1. I added "with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):" before every "scaled_dot_product_attention".
  2. Chose "sdpa" mode in ComfyUI.
  3. As for the OOM error I mentioned above, turning the video size down to 384 fixes it.
  4. VRAM peaks at 18.4G.

How do I add that? I have a 2080 Ti 22G.

ptits commented 4 hours ago

RuntimeError: No available kernel. Aborting execution.

H100 80GB

NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

(screenshot: shot_241026_131319)

huanxve commented 1 hour ago

I added "with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):" before every "scaled_dot_product_attention" call. I also found that on my computer (2080 Ti GPU), neither "flash_attn_varlen_qkvpacked_func" nor "sageattn" is imported when ComfyUI starts up (flash attention and sage attention are both installed). So after the above work I chose "sdpa" mode in ComfyUI; the "No available kernel" error is now gone, but I instantly get an OOM error while my VRAM still has 11G free. After careful observation, I found VRAM usage hit 22G for a second when the workflow reaches the KSampler. Maybe the 2080 Ti simply can't run this.

Which file should be modified, and what exactly should the code look like?

kijai commented 48 minutes ago

RuntimeError: No available kernel. Aborting execution.

H100 80GB

NVIDIA-SMI 535.104.12 Driver Version: 535.104.12 CUDA Version: 12.2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

(screenshot: shot_241026_131319)

I haven't been able to try on an H100 yet, but this could be because I attempted to add the new sdpa cudnn kernel that's available in torch 2.5.0, which is said to be twice as fast. It may need newer drivers; 535 is very old by now.
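
If that backend is the culprit, one hedged workaround sketch (assuming torch 2.5, where torch.backends.cuda.enable_cudnn_sdp exists; treat that as an assumption to verify) is to switch only the cuDNN backend off:

```python
import torch

# Assumption: enable_cudnn_sdp is available (torch 2.5+); guard just in case.
if hasattr(torch.backends.cuda, "enable_cudnn_sdp"):
    torch.backends.cuda.enable_cudnn_sdp(False)  # flash/mem-efficient/math stay on
```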

tdrminglin commented 14 minutes ago

The "scaled_dot_product_attention" calls are:

  1. around line 90 in utils.py
  2. at line 190 in asymm_models_joint.py
  3. at line 406 in attention.py

Each is changed by adding "with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):" above the call, roughly:

    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True):
        with torch.autocast("cuda", enabled=True):
            with sdpa_kernel(backends):  # (backends)
                out = F.scaled_dot_product_attention(

As far as I can remember, those are the lines I changed.
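
For reference, the "with sdpa_kernel(backends):" line above uses the newer torch.nn.attention API (torch >= 2.3). A self-contained sketch with an illustrative wrapper name (not the repo's actual code):

```python
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Let the dispatcher choose among all three standard backends; MATH is the
# universal fallback that avoids "No available kernel" on older GPUs.
backends = [
    SDPBackend.FLASH_ATTENTION,
    SDPBackend.EFFICIENT_ATTENTION,
    SDPBackend.MATH,
]

def patched_attention(q, k, v):  # illustrative wrapper, not the repo's code
    with sdpa_kernel(backends):
        return F.scaled_dot_product_attention(q, k, v)
```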