lshqqytiger / ZLUDA

Apache License 2.0

is Xformers with ZLUDA possible? #23

Open unclemusclez opened 2 weeks ago

unclemusclez commented 2 weeks ago

I compiled ZLUDA (Finished `release` profile [optimized] target(s) in 5m 40s). I downloaded NCCL from NVIDIA and placed it inside the ZLUDA directory: P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64

with pytorch-build.bat:

@echo off

set CUDAARCHS="61"
set NCCL_ROOT_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64"
set NCCL_INCLUDE_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64\include"
set NCCL_LIB_DIR="P:\gitrepos\ZLUDA\nccl_2.21.5-1+cuda11.0_x86_64\lib"
@echo environment set

cargo clean
cargo xtask --release


Is it possible with this configuration to set torch.backends.cudnn.enabled = True?
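For reference, a minimal sketch of how the flag in question can be inspected from Python. The helper name `cudnn_report` is made up for illustration. The flag is an ordinary attribute, so assigning `True` to it will succeed on its own; whether cuDNN calls then actually work under ZLUDA is not something this snippet can show.

```python
def cudnn_report():
    """Return cuDNN-related flags from the installed torch, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return {
        # True if torch can see a CUDA (or CUDA-compatible) device
        "cuda_available": torch.cuda.is_available(),
        # The global switch asked about above; plain attribute, can be reassigned
        "cudnn_enabled": torch.backends.cudnn.enabled,
        # True only if this torch build was compiled with cuDNN support
        "cudnn_available": torch.backends.cudnn.is_available(),
    }


if __name__ == "__main__":
    print(cudnn_report())
```

If `cudnn_available` is False, setting `torch.backends.cudnn.enabled = True` is unlikely to change which kernels are used.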

This is the error I get with torch.backends.cudnn.enabled = True. Perhaps it is unrelated, but I am just trying to get xformers to work.

got prompt
[rgthree] Using rgthree's optimized recursive execution.
[rgthree] First run patching recursive_output_delete_if_changed and recursive_will_execute.
[rgthree] Note: If execution seems broken due to forward ComfyUI changes, you can disable the optimization from rgthree settings in ComfyUI.
model_type FLOW
Using xformers attention in VAE
Using xformers attention in VAE
no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
clip missing: ['text_projection.weight']
Requested to load SD3ClipModel
Loading 1 new model
Requested to load SD3
Loading 1 new model
  0%|          | 0/28 [00:02<?, ?it/s]
!!! Exception during processing!!! CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "P:\ComfyUI-ZLUDA\", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "P:\ComfyUI-ZLUDA\", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-0246\", line 381, in new_func
    res_value = old_func(*final_args, **kwargs)
  File "P:\ComfyUI-ZLUDA\", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "P:\ComfyUI-ZLUDA\", line 1371, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
  File "P:\ComfyUI-ZLUDA\", line 1341, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-Impact-Pack\modules\impact\", line 22, in informative_sample
    raise e
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-Impact-Pack\modules\impact\", line 9, in informative_sample
    return original_sample(*args, **kwargs)  # This code helps interpret error messages that occur within exceptions but does not have any impact on other operations.
  File "P:\ComfyUI-ZLUDA\custom_nodes\ComfyUI-AnimateDiff-Evolved\animatediff\", line 313, in motion_sample
    return orig_comfy_sample(model, noise, *args, **kwargs)
  File "P:\ComfyUI-ZLUDA\comfy\", line 43, in sample
    samples = sampler.sample(noise, positive, negative, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "P:\ComfyUI-ZLUDA\comfy\", line 794, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "P:\ComfyUI-ZLUDA\comfy\", line 696, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "P:\ComfyUI-ZLUDA\comfy\", line 683, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
  File "P:\ComfyUI-ZLUDA\comfy\", line 662, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
  File "P:\ComfyUI-ZLUDA\comfy\", line 567, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\utils\", line 115, in decorate_context
    return func(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\comfy\k_diffusion\", line 189, in sample_heun
    denoised = model(x, sigma_hat * s_in, **extra_args)
  File "P:\ComfyUI-ZLUDA\comfy\", line 291, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
  File "P:\ComfyUI-ZLUDA\comfy\", line 649, in __call__
    return self.predict_noise(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\comfy\", line 652, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
  File "P:\ComfyUI-ZLUDA\comfy\", line 277, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
  File "P:\ComfyUI-ZLUDA\comfy\", line 226, in calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
  File "P:\ComfyUI-ZLUDA\comfy\", line 113, in apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 961, in forward
    return super().forward(x, timesteps, context=context, y=y)
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 946, in forward
    x = self.forward_core_with_concat(x, c, context)
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 909, in forward_core_with_concat
    context, x = block(
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\nn\modules\", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 635, in forward
    return block_mixing(
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 589, in block_mixing
    return _block_mixing(*args, **kwargs)
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 602, in _block_mixing
    attn = optimized_attention(
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\diffusionmodules\", line 293, in optimized_attention
    return attention.optimized_attention(qkv[0], qkv[1], qkv[2], num_heads)
  File "P:\ComfyUI-ZLUDA\comfy\ldm\modules\", line 380, in attention_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=mask)
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\", line 268, in memory_efficient_attention
    return _memory_efficient_attention(
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\", line 387, in _memory_efficient_attention
    return _memory_efficient_attention_forward(
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\", line 407, in _memory_efficient_attention_forward
    out, *_ = op.apply(inp, needs_gradient=False)
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\", line 202, in apply
    return cls.apply_bmhk(inp, needs_gradient=needs_gradient)
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\xformers\ops\fmha\", line 266, in apply_bmhk
    out, lse, rng_seed, rng_offset, _, _ = cls.OPERATOR(
  File "P:\ComfyUI-ZLUDA\.venv\Lib\site-packages\torch\", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: CUDA error: named symbol not found
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

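The traceback ends inside `xformers.ops.memory_efficient_attention`, so one possible workaround is to catch the `RuntimeError` and retry with a different attention implementation. The sketch below only illustrates that pattern: `attention_with_fallback`, `broken_kernel`, and `plain_attention` are hypothetical names, not part of ComfyUI, xformers, or ZLUDA, and it assumes the failed call leaves the device in a usable state, which may not hold for every CUDA error.

```python
def attention_with_fallback(primary, fallback, q, k, v, **kwargs):
    """Call `primary`; if it raises RuntimeError, call `fallback` with the same arguments.

    Both callables are hypothetical stand-ins supplied by the caller.
    """
    try:
        return primary(q, k, v, **kwargs)
    except RuntimeError as err:
        # PyTorch reports CUDA failures such as "named symbol not found" as RuntimeError.
        print(f"primary attention failed: {err}; using fallback")
        return fallback(q, k, v, **kwargs)


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a GPU, torch, or xformers.
    def broken_kernel(q, k, v, **kwargs):
        raise RuntimeError("CUDA error: named symbol not found")

    def plain_attention(q, k, v, **kwargs):
        return "fallback result"

    print(attention_with_fallback(broken_kernel, plain_attention, 1, 2, 3))
```

In a real setup, `primary` would wrap the xformers call and `fallback` something like `torch.nn.functional.scaled_dot_product_attention`. Note that xformers takes tensors as [batch, seq, heads, dim] while the PyTorch function expects [batch, heads, seq, dim], so the fallback would need to transpose its inputs and output.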
lshqqytiger commented 1 week ago

Do you just need ComfyUI to work? If so, try WSL with ROCm. It supports Flash Attention 2.

unclemusclez commented 1 week ago

> Do you just need ComfyUI to work? If so, try WSL with ROCm. It supports Flash Attention 2.

I'm trying it now. When did this come out?

lshqqytiger commented 1 week ago

Very recently. Are you on gfx1100? (RX 7900 XT(X), GRE, etc)

unclemusclez commented 1 week ago

> Very recently. Are you on gfx1100? (RX 7900 XT(X), GRE, etc)

Yes, 7900 XT.

unclemusclez commented 1 week ago

So I've been testing the ROCm driver for WSL.

There are still use cases for ZLUDA with PyTorch, particularly pertaining to [...], which seems to need CUDA.

I find ROCm is about 2-3x faster than ZLUDA with PyTorch.