[Bug]: Unable to load models with ZLUDA (Win10 RX580)

knrh8r commented 5 months ago

Checklist

[X] The issue exists after disabling all extensions
[X] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

Setup: Windows 10, AMD RX 580, ZLUDA. I installed the latest version of webui with no extensions. Webui launches fine, but image generation does not work because it is unable to load any checkpoints.

Steps to reproduce the problem

Here's what I did:
• Remove any other Python versions, install v3.10.11
• Install AMD HIP SDK 5.7.1
• Download and unzip ZLUDA files
• Add HIP and ZLUDA folders to PATH
• Replace the ROCm\5.7\bin\rocblas\library folder with the one from ROCmLibs archive
• Git clone webui-amdgpu repo
• Add --use-zluda --medvram-sdxl --update-check --skip-ort to webui-user.bat
• Put SD-v1.5 safetensor in models folder
• Run webui-user.bat
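
Two of the steps above (the Python version and the PATH entries) are easy to get subtly wrong, so a quick check before launching can help. The following is a minimal sketch, not part of webui: the helper names `dirs_on_path_containing` and `check_setup` are hypothetical, and the file names `hipInfo.exe` and `zluda.exe` are assumptions about what the HIP SDK and ZLUDA folders contain, so substitute the names actually present on your system.

```python
import os
import sys
from pathlib import Path


def dirs_on_path_containing(filename, path_value=None):
    """Return every PATH directory that contains `filename`."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip().strip('"')  # tolerate quoted or padded entries
        if entry and (Path(entry) / filename).is_file():
            hits.append(entry)
    return hits


def check_setup(required_files, path_value=None, version=None):
    """Collect human-readable problems; an empty list means the checks passed."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    if version != (3, 10):
        problems.append(f"Python {version[0]}.{version[1]} is active, expected 3.10")
    for name in required_files:
        if not dirs_on_path_containing(name, path_value):
            problems.append(f"{name} was not found in any PATH directory")
    return problems


if __name__ == "__main__":
    # Assumed file names; replace them with files from your HIP and ZLUDA folders.
    for problem in check_setup(["hipInfo.exe", "zluda.exe"]):
        print("WARNING:", problem)
```

An empty result only means the files are reachable through PATH and Python 3.10 is active; it says nothing about whether the GPU itself works with ZLUDA.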

What should have happened?

I didn't have any issues with DirectML, so I expected the new version with ZLUDA support to work on my GPU.

What browsers do you use to access the UI ?

Google Chrome

Sysinfo

sysinfo-2024-06-24-17-20.json

Console logs

Creating venv in directory C:\Users\k\stable-diffusion-webui-amdgpu\venv using python "C:\Users\k\AppData\Local\Programs\Python\Python310\python.exe"
venv "C:\Users\k\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.9.3-amd-26-g50d3cf78
Commit hash: 50d3cf7852cfe07bd562440246202d8925be98a4
Installing torch and torchvision
Looking in indexes: https://download.pytorch.org/whl/cu118
(downloads listing here)
[notice] A new release of pip is available: 23.0.1 -> 24.1
[notice] To update, run: C:\Users\k\stable-diffusion-webui-amdgpu\venv\Scripts\python.exe -m pip install --upgrade pip
Using ZLUDA in C:\Users\k\stable-diffusion-webui-amdgpu\.zluda
Installing clip
Installing open_clip
Cloning assets into C:\Users\k\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-webui-assets...
Cloning into '/home/k/stable-diffusion-webui-amdgpu/repositories/stable-diffusion-webui-assets'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 20 (delta 0), reused 20 (delta 0), pack-reused 0
Receiving objects: 100% (20/20), 132.70 KiB | 2.04 MiB/s, done.
Cloning Stable Diffusion into C:\Users\k\stable-diffusion-webui-amdgpu\repositories\stable-diffusion-stability-ai...
Cloning into '/home/k/stable-diffusion-webui-amdgpu/repositories/stable-diffusion-stability-ai'...
remote: Enumerating objects: 580, done.
remote: Counting objects: 100% (571/571), done.
remote: Compressing objects: 100% (304/304), done.
Receiving objects:  90% (522/580), 69.47 MiB | 7.70 MiB/s
remote: Total 580 (delta 278), reused 448 (delta 249), pack-reused 9
Receiving objects: 100% (580/580), 73.44 MiB | 7.31 MiB/s, done.
Resolving deltas: 100% (278/278), done.
Cloning Stable Diffusion XL into C:\Users\k\stable-diffusion-webui-amdgpu\repositories\generative-models...
Cloning into '/home/k/stable-diffusion-webui-amdgpu/repositories/generative-models'...
remote: Enumerating objects: 941, done.
remote: Total 941 (delta 0), reused 0 (delta 0), pack-reused 941
Receiving objects: 100% (941/941), 43.85 MiB | 7.28 MiB/s, done.
Resolving deltas: 100% (491/491), done.
Cloning K-diffusion into C:\Users\k\stable-diffusion-webui-amdgpu\repositories\k-diffusion...
Cloning into '/home/k/stable-diffusion-webui-amdgpu/repositories/k-diffusion'...
remote: Enumerating objects: 1345, done.
remote: Counting objects: 100% (743/743), done.
remote: Compressing objects: 100% (94/94), done.
remote: Total 1345 (delta 697), reused 655 (delta 649), pack-reused 602
Receiving objects: 100% (1345/1345), 236.07 KiB | 787.00 KiB/s, done.
Resolving deltas: 100% (945/945), done.
Cloning BLIP into C:\Users\k\stable-diffusion-webui-amdgpu\repositories\BLIP...
Cloning into '/home/k/stable-diffusion-webui-amdgpu/repositories/BLIP'...
remote: Enumerating objects: 277, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 277 (delta 137), reused 136 (delta 135), pack-reused 112
Receiving objects: 100% (277/277), 7.03 MiB | 5.65 MiB/s, done.
Resolving deltas: 100% (152/152), done.
Installing requirements
Skipping onnxruntime installation.
You are up to date with the most recent release.
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --medvram-sdxl --update-check --skip-ort
ZLUDA device failed to pass basic operation test: index=None, device_name=Radeon RX 580 Series [ZLUDA]
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\transformers\transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.
deprecate("Transformer2DModelOutput", "1.0.0", deprecation_message)
Calculating sha256 for C:\Users\k\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned.safetensors: Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 748.2s (prepare environment: 754.6s, initialize shared: 4.4s, other imports: 2.3s, load scripts: 1.1s, create ui: 0.7s, gradio launch: 0.4s).
1a189f0be69d6106a48548e7626207dddd7042a418dbf372cefd05e0cdba61b6
Loading weights [1a189f0be6] from C:\Users\k\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned.safetensors

Creating model from config: C:\Users\k\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Applying attention optimization: InvokeAI... done.
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
    File "C:\Users\k\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
    File "C:\Users\k\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
    File "C:\Users\k\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 621, in get_sd_model
    load_model()
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 783, in load_model
    sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 659, in get_empty_cond
    return sd_model.cond_stage_model([""])
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_hijack_clip.py", line 234, in forward
    z = self.process_tokens(tokens, multipliers)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_hijack_clip.py", line 276, in process_tokens
    z = self.encode_with_transformers(tokens)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_hijack_clip.py", line 331, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 822, in forward
    return self.text_model(
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 730, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 227, in forward
    inputs_embeds = self.token_embedding(input_ids)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\modules\sd_hijack.py", line 348, in forward
    inputs_embeds = self.wrapped(input_ids)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\modules\sparse.py", line 163, in forward
    return F.embedding(
    File "C:\Users\k\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\nn\functional.py", line 2264, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Stable diffusion model failed to load
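
The log above contains two separate failures: the ZLUDA basic operation test reports a CUDA out-of-memory error at startup, and the model load later ends in a device-mismatch RuntimeError. The later error may be a downstream effect of the first one, though the log alone does not prove that. The sketch below pulls those lines out of a long console log for triage. The function name `summarize_log` is hypothetical, and the patterns are copied from this particular log, so other webui versions may word these messages differently.

```python
import re

# Patterns copied from the console output in this report.
ZLUDA_TEST = re.compile(r"^ZLUDA device failed to pass basic operation test: (.*)$", re.M)
CUDA_ERROR = re.compile(r"^CUDA error: (.*)$", re.M)
# An unindented "SomethingError: message" line, as printed at the end of a traceback.
EXCEPTION = re.compile(r"^(\w+(?:\.\w+)*(?:Error|Exception)): (.*)$", re.M)


def summarize_log(text):
    """Pull the lines that matter for triage out of a webui console log."""
    zluda = ZLUDA_TEST.search(text)
    cuda = CUDA_ERROR.search(text)
    exceptions = EXCEPTION.findall(text)
    return {
        "zluda_test_failure": zluda.group(1) if zluda else None,
        "cuda_error": cuda.group(1) if cuda else None,
        # The last exception line is the one the traceback ends on.
        "final_exception": exceptions[-1] if exceptions else None,
    }
```

Passing the full console text to `summarize_log` returns a small dictionary; a non-empty `zluda_test_failure` together with a `final_exception` suggests looking at the startup failure first.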

Additional information

I have AMD Adrenalin driver version 24.3.1 installed (the latest available for me). I had been using WebUI v1.7.0 with DirectML for a while without any issues. I also tried installing WebUI v1.8.0, which did load a checkpoint and generate images; however, the resulting images were just noise (possibly related to ZLUDA issue #208?).

I can verify that ZLUDA is working outside of WebUI by launching Blender3D - I can select CUDA as the rendering backend and render images using GPU.

lshqqytiger commented 4 months ago

Polaris GPUs have a bug where the driver raises an out-of-memory error when handling system memory. This may come from the GPU driver itself or from ZLUDA. Try the PRO driver if possible.

knrh8r commented 4 months ago

As you suggested, I:
• Uninstalled HIP SDK
• Uninstalled current GPU drivers using Display Driver Uninstaller
• Did a clean install of the latest AMD PRO Edition drivers
• Reinstalled HIP SDK

Again, tested ZLUDA by launching Blender3D - works as expected without issues.

Unfortunately, the same issue remains with WebUI. I compared the console output with the console logs provided above; they are identical, so nothing has changed. Is there a possible workaround for this? Thanks for your time.

Freda-Chan commented 4 months ago

Downgrading torch and torchvision fixed the issue for me. I have the same GPU and I'm using the 24.3.1 Adrenalin driver. I manually entered the commands from this file: https://github.com/patientx/ComfyUI-Zluda/blob/master/fixforrx580.bat The same fix also works for SD.Next.

knrh8r commented 4 months ago

> Downgrading torch and torchvision fixed the issue for me. I have same gpu and I'm using 24.3.1 Adrenalin driver. I manually entered these commands: https://github.com/patientx/ComfyUI-Zluda/blob/master/fixforrx580.bat

Downgrading torch with the provided commands worked for me and eventually I was able to generate images using zluda. Thank you!
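
After running the downgrade commands, it can be worth confirming which torch and torchvision builds the webui venv actually ended up with. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and the script is assumed to be run with the venv's own interpreter (`venv\Scripts\python.exe`) so that it inspects the right environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "torchvision"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the ones pinned in the linked batch file; if they differ, the commands were probably run against a different Python environment than the one webui uses.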

lshqqytiger / stable-diffusion-webui-amdgpu