lshqqytiger / ZLUDA

CUDA on AMD GPUs
Apache License 2.0

is Xformers with ZLUDA possible? #23

Open unclemusclez opened 2 weeks ago

unclemusclez commented 2 weeks ago

I compiled ZLUDA (Finished `release` profile [optimized] target(s) in 5m 40s). I downloaded NCCL from NVIDIA and placed it inside the ZLUDA directory: P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64

with pytorch-build.bat:

@echo off

set TORCH_CUDA_ARCH_LIST="6.1+PTX"
set CUDAARCHS="61"
set CMAKE_CUDA_ARCHITECTURES="61"
set USE_SYSTEM_NCCL=1
set NCCL_ROOT_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64"
set NCCL_INCLUDE_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64\include"
set NCCL_LIB_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64\lib"
set USE_EXPERIMENTAL_CUDNN_V8_API=1
@echo environment set

cargo clean
cargo xtask --release

@pause
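
One thing worth checking in the script above: in cmd.exe, `set NAME="value"` stores the quotation marks as part of the value, so anything reading `NCCL_ROOT_DIR` may see a path wrapped in literal quotes. Whether this particular build tolerates that is not confirmed in this thread. A minimal Python sketch (the helper names `strip_wrapping_quotes` and `report_build_env` are hypothetical) that shows how these variables look to a child process:

```python
import os

# Variable names taken from the batch script above.
BUILD_VARS = [
    "TORCH_CUDA_ARCH_LIST",
    "CUDAARCHS",
    "CMAKE_CUDA_ARCHITECTURES",
    "NCCL_ROOT_DIR",
    "NCCL_INCLUDE_DIR",
    "NCCL_LIB_DIR",
]


def strip_wrapping_quotes(value: str) -> str:
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def report_build_env(environ=None) -> dict:
    """Map each build variable that is set to (raw value, value without wrapping quotes)."""
    environ = os.environ if environ is None else environ
    report = {}
    for name in BUILD_VARS:
        raw = environ.get(name)
        if raw is not None:
            report[name] = (raw, strip_wrapping_quotes(raw))
    return report


if __name__ == "__main__":
    for name, (raw, cleaned) in report_build_env().items():
        flag = "  <-- literal quotes in value" if raw != cleaned else ""
        print(f"{name}={raw}{flag}")
```

In a batch file, quoting the whole assignment instead (`set "NCCL_ROOT_DIR=P:\some\path"`) sets the value without the quotes.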

Is it possible with this configuration to set `torch.backends.cudnn.enabled = True`?

Below is the error I get with `torch.backends.cudnn.enabled = True`. Perhaps it is unrelated, but I am just trying to get xformers to work.
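
Before toggling the flag, it may help to see what the installed PyTorch build itself reports about cuDNN. The sketch below only uses `torch.cuda.is_available()`, `torch.backends.cudnn.is_available()`, `torch.backends.cudnn.enabled` and `torch.backends.cudnn.version()`. The helper name `probe_cudnn` is hypothetical, and whether ZLUDA provides a working cuDNN replacement is not established in this thread.

```python
def probe_cudnn() -> dict:
    """Collect what the installed PyTorch build reports about CUDA and cuDNN."""
    info = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        # Nothing more to report without PyTorch.
        return info

    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    info["cudnn_available"] = torch.backends.cudnn.is_available()
    # This flag only permits PyTorch to use cuDNN; it does not supply the library.
    info["cudnn_enabled_flag"] = torch.backends.cudnn.enabled

    info["cudnn_version"] = None
    if info["cudnn_available"]:
        try:
            info["cudnn_version"] = torch.backends.cudnn.version()
        except RuntimeError as err:
            # Initialising cuDNN can fail, e.g. on a version mismatch.
            info["cudnn_error"] = str(err)
    return info


if __name__ == "__main__":
    for key, value in probe_cudnn().items():
        print(f"{key}: {value}")
```

Note that the traceback that follows ends inside an xformers attention operator rather than in a cuDNN call, so the flag may well be unrelated.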


got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
model_type FLOW
Using xformers attention in VAE
Using xformers attention in VAE
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
clip missing: ['text_projection.weight']
Requested to load SD3ClipModel
Loading 1 new model
Requested to load SD3
Loading 1 new model
  0%|          | 0/28 [00:02<?, ?it/s]
!!! Exception during processing!!! CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "P:\ComfyUI-ZLUDA\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-0246\utils.py", line 381, in new_func
    res_value = old_func(*final_args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\nodes.py", line 1371, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\nodes.py", line 1341, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 22, in informative_sample
    raise e
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-Impact-Pack\modules\impact\sample_error_enhancer.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)  # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\sampling.py", line 313, in motion_sample
    return orig_comfy_sample(model, noise, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\sample.py", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 794, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 696, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 683, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 662, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 567, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\k_diffusion\sampling.py", line 189, in sample_heun
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 291, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 649, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 652, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 277, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\samplers.py", line 226, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\model_base.py", line 113, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 961, in forward
    return super().forward(x, timesteps, context=context, y=y)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 946, in forward
    x = self.forward_core_with_concat(x, c, context)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 909, in forward_core_with_concat
    context, x = block(
                 ^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 635, in forward
    return block_mixing(
           ^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 589, in block_mixing
    return _block_mixing(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 602, in _block_mixing
    attn = optimized_attention(
           ^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\mmdit.py", line 293, in optimized_attention
    return attention.optimized_attention(qkv[0], qkv[1], qkv[2], num_heads)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\attention.py", line 380, in attention_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=mask)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 268, in memory_efficient_attention
    return _memory_efficient_attention(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 387, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\__init__.py", line 407, in _memory_efficient_attention_forward
    out, *_ = op.apply(inp, needs_gradient=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\cutlass.py", line 202, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\cutlass.py", line 266, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
                                           ^^^^^^^^^^^^^
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

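
The last frames of the traceback are in `xformers\ops\fmha\cutlass.py`, so the failure is raised by the xformers CUTLASS attention operator rather than by ComfyUI's sampler code. One possible workaround is to catch the error and route the call to a different attention implementation. The sketch below shows only the control flow, using stand-in functions. In real use `primary` might wrap `xformers.ops.memory_efficient_attention` and `fallback` might wrap `torch.nn.functional.scaled_dot_product_attention`, with the caller handling the different tensor layouts. Whether the GPU context stays usable after this error under ZLUDA is an assumption that has not been verified here. The name `call_with_fallback` is hypothetical.

```python
from typing import Any, Callable


def call_with_fallback(primary: Callable[..., Any],
                       fallback: Callable[..., Any],
                       *args: Any, **kwargs: Any) -> Any:
    """Try `primary`; if it raises RuntimeError, call `fallback` with the same arguments."""
    try:
        return primary(*args, **kwargs)
    except RuntimeError as err:
        print(f"primary attention kernel failed ({err}); using fallback")
        return fallback(*args, **kwargs)


# Stand-ins so the sketch runs without a GPU (hypothetical, for illustration only).
def broken_kernel(q, k, v):
    raise RuntimeError("CUDA error: named symbol not found")


def reference_kernel(q, k, v):
    return ("fallback", q, k, v)


if __name__ == "__main__":
    print(call_with_fallback(broken_kernel, reference_kernel, "q", "k", "v"))
```

A simpler route, if the goal is only to get sampling to run, is to start ComfyUI without xformers. Upstream ComfyUI has a `--disable-xformers` launch option, assuming the fork in use still carries it.
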
lshqqytiger commented 1 week ago

Do you just need comfyui to work? If so, try WSL with ROCm. It supports Flash Attention 2. https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-24-10-21-01-WSL-2.html

unclemusclez commented 1 week ago

> Do you just need comfyui to work? If so, try WSL with ROCm. It supports Flash Attention 2. https://www.amd.com/en/resources/support-articles/release-notes/RN-RAD-WIN-24-10-21-01-WSL-2.html

I'm trying it now... When did this come out?

lshqqytiger commented 1 week ago

Very recently. Are you on gfx1100? (RX 7900 XT(X), GRE, etc)

unclemusclez commented 1 week ago

> Very recently. Are you on gfx1100? (RX 7900 XT(X), GRE, etc)

Yes, 7900 XT.

unclemusclez commented 1 week ago

So I've been testing the ROCm driver for WSL.

There are still use cases for ZLUDA with PyTorch, particularly https://github.com/hpcaitech/Open-Sora, which seems to need CUDA.

I find ROCm is about 2-3x faster than ZLUDA with PyTorch.
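
For anyone who wants to reproduce that comparison, here is a minimal timing sketch. The helper name `time_workload` is hypothetical, and the 2-3x figure above is an informal observation, not a result of this code. The optional `sync` argument is there because GPU work is queued asynchronously. With PyTorch you would pass `torch.cuda.synchronize` so the clock stops only after the kernels finish.

```python
import time
from statistics import median
from typing import Callable, Optional


def time_workload(workload: Callable[[], object],
                  repeats: int = 5,
                  warmup: int = 1,
                  sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock seconds of `workload` over `repeats` timed runs."""
    # Warm-up runs absorb one-time costs such as kernel compilation and caching.
    for _ in range(warmup):
        workload()
    if sync is not None:
        sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        samples.append(time.perf_counter() - start)
    return median(samples)


if __name__ == "__main__":
    # CPU-only stand-in workload so the sketch runs anywhere.
    seconds = time_workload(lambda: sum(range(100_000)))
    print(f"median: {seconds:.6f} s")
```

Running the same workload once under ZLUDA and once under ROCm in WSL, and comparing the two medians, gives a like-for-like number.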