lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: --use-directml doesn't work anymore on AMD, without it it runs correctly on CPU, clean installation of directml stable diffusion #469

Open Ael07 opened 1 month ago

Ael07 commented 1 month ago


What happened?

I re-installed DirectML Stable Diffusion from scratch, and it works correctly on CPU, generating each image in 5 minutes(!). As soon as I add --use-directml, it can't load models anymore: the web UI loads correctly, but nothing runs.

Steps to reproduce the problem

1. Add --use-directml to webui-user.bat.
2. Run the webui-user.bat file on a clean, working installation of DirectML Stable Diffusion (it works perfectly without --use-directml).
3. The web UI loads, but models fail to load.

What should have happened?

It should have worked on the AMD GPU. It only works without --use-directml, and the whole reason for using this fork is to run on the AMD GPU rather than the CPU!

No idea what I'm doing wrong here. It used to work perfectly on my GPU, but an update a few months ago messed everything up.

What browsers do you use to access the UI?

Google Chrome

Sysinfo

sysinfo-2024-05-26-12-58.json

Console logs

C:\Users\~\stable-diffusion-webui-directml>git pull
Already up to date.
venv "C:\Users\~\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-13-g517aaaff
Commit hash: 517aaaff2bb1a512057d88b0284193b9f23c0b47
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --medvram --use-directml --no-half --precision full --opt-sub-quad-attention --opt-split-attention-v1 --theme dark --autolaunch --disable-safe-unpickle --disable-nan-check
ONNX: version=1.18.0 provider=DmlExecutionProvider, available=['DmlExecutionProvider', 'CPUExecutionProvider']
==============================================================================
You are running torch 2.0.0+cpu.
The program is tested to work with torch 2.1.2.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
Loading weights [c2c4cba68e] from C:\Users\~\stable-diffusion-webui-directml\models\Stable-diffusion\yassking_colab1111.ckpt
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 27.1s (prepare environment: 35.1s, initialize shared: 5.6s, other imports: 0.1s, list SD models: 0.2s, load scripts: 2.2s, create ui: 0.7s, gradio launch: 0.6s).
Creating model from config: C:\Users\~\stable-diffusion-webui-directml\configs\v1-inference.yaml
C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Applying attention optimization: sub-quadratic... done.
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "C:\Users\~\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\~\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\~\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\~\stable-diffusion-webui-directml\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "C:\Users\~\stable-diffusion-webui-directml\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "C:\Users\~\stable-diffusion-webui-directml\modules\sd_models.py", line 621, in get_sd_model
    load_model()
  File "C:\Users\~\stable-diffusion-webui-directml\modules\sd_models.py", line 783, in load_model
    sd_model.cond_stage_model_empty_prompt = get_empty_cond(sd_model)
  File "C:\Users\~\stable-diffusion-webui-directml\modules\sd_models.py", line 659, in get_empty_cond
    return sd_model.cond_stage_model([""])
  File "C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\~\stable-diffusion-webui-directml\modules\sd_hijack_clip.py", line 234, in forward
    z = self.process_tokens(tokens, multipliers)
  File "C:\Users\~\stable-diffusion-webui-directml\modules\sd_hijack_clip.py", line 276, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "C:\Users\~\stable-diffusion-webui-directml\modules\sd_hijack_clip.py", line 331, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 822, in forward
    return self.text_model(
  File "C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 730, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "C:\Users\~\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\~\stable-diffusion-webui-directml\modules\dml\hijack\transformers.py", line 39, in CLIPTextEmbeddings_forward
    embeddings = inputs_embeds + position_embeddings
RuntimeError: Unspecified error

Stable diffusion model failed to load
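The frame that fails is the DirectML hijack adding token and position embeddings (inputs_embeds + position_embeddings). A minimal, hypothetical standalone test of that same elementwise add on the DirectML device, assuming the venv has torch-directml installed (shapes are illustrative, matching CLIP's 77-token, 768-dim embeddings):

```python
# Hypothetical standalone check of the op that fails in the traceback above.
# Run with the webui venv's python.exe, where torch-directml should be installed.
import torch
import torch_directml

dml = torch_directml.device()  # default DirectML device
print("DML device:", torch_directml.device_name(0))

inputs_embeds = torch.randn(1, 77, 768).to(dml)        # token embeddings (illustrative)
position_embeddings = torch.randn(1, 77, 768).to(dml)  # position embeddings (illustrative)
print((inputs_embeds + position_embeddings).shape)     # expect torch.Size([1, 77, 768])
```

If this tiny script raises the same RuntimeError: Unspecified error, the failure is in the DirectML/driver stack rather than in the web UI code.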

Additional information

No response

CS1o commented 1 month ago

Hey, I tested the web UI in CPU mode first by using: --use-directml --use-cpu all --no-half --opt-sub-quad-attention (CPU mode takes precedence here, but --use-directml is still needed to get the right torch version). Then I switched to DirectML by removing --use-cpu all and used only these: --use-directml --no-half --opt-sub-quad-attention

Both work normally. Also:

- --precision full isn't needed for DirectML (it can be needed for CPU mode).
- --disable-nan-check won't fix anything and only hides errors (not recommended).
- --disable-safe-unpickle is unsafe; better to convert your old .ckpt models to .safetensors, either with the Checkpoint Merger tab in the web UI or with a short script like the sketch below.
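For converting outside the web UI, here is a minimal sketch under stated assumptions: the file names are placeholders, torch and safetensors come from the webui venv, and torch.load unpickles arbitrary code, so only run it on checkpoints you trust (that risk is exactly why --disable-safe-unpickle is a bad idea):

```python
# Minimal sketch: convert a trusted .ckpt checkpoint to .safetensors.
# WARNING: torch.load executes pickled code; never run this on untrusted files.
import torch
from safetensors.torch import save_file

ckpt = torch.load("yourmodel.ckpt", map_location="cpu")  # placeholder path
state_dict = ckpt.get("state_dict", ckpt)                # many ckpts nest weights under "state_dict"
tensors = {k: v.clone() for k, v in state_dict.items()
           if isinstance(v, torch.Tensor)}               # safetensors stores tensors only; clone() avoids shared-memory errors
save_file(tensors, "yourmodel.safetensors")              # placeholder path
```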

Check whether it works for you with the DirectML args above. And please test with a .safetensors model around 2 GB in size, like the DreamShaper v8 model.

Ael07 commented 1 month ago

Bro, your --use-cpu all brought another unfixable error; I had to reinstall everything again. It runs just fine on CPU with these arguments: --medvram --no-half --precision full --opt-sub-quad-attention --opt-split-attention-v1 --theme dark --autolaunch --disable-safe-unpickle --disable-nan-check --skip-torch-cuda-test

On GPU I still get the failed-to-load-model error above when I add --use-directml.

I'm running an AMD FirePro 7100 (8 GB), not a Radeon or something more ROCm/ZLUDA compatible (which I think you're running, hence no issues).

Again, a few months back it ran fine with --use-directml on that same GPU. I have no idea which update made it unfixable.

Bleach665 commented 1 month ago

Win10, Python 3.10.6, Radeon RX 580 2048SP. I have the same trouble. I installed stable-diffusion-webui-amdgpu for the first time, so I may be making mistakes that are obvious to others. I cloned the repo (git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu.git), edited webui.bat to add set COMMANDLINE_ARGS= --backend directml at the top, and ran webui.bat. And I received this error:

````
>webui-user.bat --use-directml
Creating venv in directory f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\venv using python "C:\Program Files\Python310\python.exe"
venv "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-23-gd23a1724
Commit hash: d23a1724921e58d6ba92c40236cbf8dd139c130b
Installing torch and torchvision
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu121
Collecting torch==2.3.0
  Downloading https://download.pytorch.org/whl/cu121/torch-2.3.0%2Bcu121-cp310-cp310-win_amd64.whl (2413.3 MB)
Collecting torchvision
  Downloading https://download.pytorch.org/whl/cu121/torchvision-0.18.0%2Bcu121-cp310-cp310-win_amd64.whl (5.7 MB)
Collecting networkx
  Downloading networkx-3.3-py3-none-any.whl (1.7 MB)
Collecting jinja2
  Downloading jinja2-3.1.4-py3-none-any.whl (133 kB)
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.12.0-py3-none-any.whl (37 kB)
Collecting sympy
  Downloading sympy-1.12.1-py3-none-any.whl (5.7 MB)
Collecting filelock
  Downloading filelock-3.14.0-py3-none-any.whl (12 kB)
Collecting fsspec
  Downloading fsspec-2024.5.0-py3-none-any.whl (316 kB)
Collecting mkl<=2021.4.0,>=2021.1.1
  Downloading https://download.pytorch.org/whl/mkl-2021.4.0-py2.py3-none-win_amd64.whl (228.5 MB)
Collecting numpy
  Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl (15.8 MB)
Collecting pillow!=8.3.*,>=5.3.0
  Downloading pillow-10.3.0-cp310-cp310-win_amd64.whl (2.5 MB)
Collecting intel-openmp==2021.*
  Downloading https://download.pytorch.org/whl/intel_openmp-2021.4.0-py2.py3-none-win_amd64.whl (3.5 MB)
Collecting tbb==2021.*
  Downloading tbb-2021.12.0-py3-none-win_amd64.whl (286 kB)
Collecting MarkupSafe>=2.0
  Downloading https://download.pytorch.org/whl/MarkupSafe-2.1.5-cp310-cp310-win_amd64.whl (17 kB)
Collecting mpmath<1.4.0,>=1.1.0
  Downloading https://download.pytorch.org/whl/mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: tbb, mpmath, intel-openmp, typing-extensions, sympy, pillow, numpy, networkx, mkl, MarkupSafe, fsspec, filelock, jinja2, torch, torchvision
Successfully installed MarkupSafe-2.1.5 filelock-3.14.0 fsspec-2024.5.0 intel-openmp-2021.4.0 jinja2-3.1.4 mkl-2021.4.0 mpmath-1.3.0 networkx-3.3 numpy-1.26.4 pillow-10.3.0 sympy-1.12.1 tbb-2021.12.0 torch-2.3.0+cu121 torchvision-0.18.0+cu121 typing-extensions-4.12.0

[notice] A new release of pip available: 22.2.1 -> 24.0
[notice] To update, run: f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\venv\Scripts\python.exe -m pip install --upgrade pip
Traceback (most recent call last):
  File "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\launch.py", line 39, in main
    prepare_environment()
  File "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 589, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check
Press any key to continue . . .
````

After I added --skip-torch-cuda-test to webui.bat, it cloned several repos and installed requirements, then broke with the error: launch.py: error: unrecognized arguments: --backend directml

After I removed --backend directml, it downloaded the model https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors and started the browser GUI. After I set up a prompt and clicked the Generate button, I received the error RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half. It was fixed after I added --no-half to COMMANDLINE_ARGS. Now it is: set COMMANDLINE_ARGS= --skip-torch-cuda-test --no-half
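For context, that error is a plain float32 vs float16 mismatch between activations and model weights. A tiny illustrative reproduction (not the web UI's code, just the same failure mode):

```python
# Illustrative only: reproduces "mat1 and mat2 must have the same dtype".
import torch

x = torch.randn(1, 4, dtype=torch.float32)  # float32 activations
w = torch.randn(4, 4, dtype=torch.float16)  # float16 ("half") weights
torch.mm(x, w)  # raises RuntimeError: mat1 and mat2 must have the same dtype
```

--no-half keeps the model weights in float32, so both operands match, at the cost of extra memory.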

At this point, stable-diffusion-webui-amdgpu works fine and can generate images, but using the CPU only: GPU utilization is around 0% and CPU around 100%.

Please suggest how to resolve this so the GPU is used for inference.

lshqqytiger commented 1 month ago

Please do not add --skip-torch-cuda-test. If your install has no problem, it should launch without --skip-torch-cuda-test unless you want to run on the CPU. If you get an error without --skip-torch-cuda-test, you have done something wrong. If your card is not NVIDIA, you need to add one of the --use-* arguments:

- --use-zluda: best for decent AMD cards (RX 6000 series or higher); works for older cards.
- --use-directml: legacy, but supports almost every card. Slower, memory-consuming, inefficient.
- --use-ipex: Intel IPEX.

(A quick way to check which torch build the venv actually got is sketched below.)
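Without a --use-* flag, the launcher installed the default CUDA wheel (torch-2.3.0+cu121 in the log above), which cannot use an AMD card; that appears to be what the "Torch is not able to use GPU" check catches. A minimal, hypothetical sanity check of which torch build a venv actually contains (run it with the venv's python.exe; not part of the web UI):

```python
# Hypothetical check: which torch build is in the venv, and can it see a CUDA GPU?
import torch

print(torch.__version__)          # "+cpu" means a CPU-only build, "+cu121" a CUDA build
print(torch.cuda.is_available())  # False on AMD cards without ZLUDA/ROCm builds
```

Note that the first console log in this thread reports torch 2.0.0+cpu, which matches CPU-only generation there.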

Bleach665 commented 1 month ago

@lshqqytiger, thanks for the quick answer. Stepping back, all I did was: clone the repo (git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu.git), edit webui.bat to add set COMMANDLINE_ARGS= --backend directml at the top, and run webui.bat. And I received this error:

Traceback (most recent call last):
  File "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\launch.py", line 39, in main
    prepare_environment()
  File "f:\StableDiffusion\automatic1111\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 589, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

Win10, Python 3.10.6, Radeon RX 580 2048SP. It looks like all the necessary drivers/frameworks are installed. (GPU-Z screenshot)

lshqqytiger commented 1 month ago

--backend directml was replaced with --use-directml.

Bleach665 commented 1 month ago

Just a report: stable-diffusion-webui-amdgpu works fine both with --use-directml and with CPU inference. Clean install; Win10, Python 3.10.6, Radeon RX 580 2048SP.

Ael07 commented 1 month ago

What are the full command args you are using? Also, are you installing anything else, like ROCm or ZLUDA? Your GPU is from 2018, mine from 2014; it is still a good GPU, and --use-directml was working for me too, six times faster than CPU, until an update this year messed everything up. It would be great to find out what causes the error above.

Bleach665 commented 1 month ago

@Ael07, set COMMANDLINE_ARGS= --use-directml is enough to start working with the GPU. Yes, I installed the HIP SDK, but the ROCm version in the latest HIP is 5.7.1 and AMD dropped Polaris support in 4.5, so it doesn't work with my RX 580 and I uninstalled it. With your GPU it may work well. As for additional software: the latest version of the Adrenalin driver, plus the Vulkan runtime and SDK (https://vulkan.lunarg.com/sdk/home).

Ael07 commented 1 month ago

Thanks for that. OK, if I use set COMMANDLINE_ARGS= --use-directml alone, it actually loads the model correctly, but then there's a runtime error when trying to generate the image; and of course it gives you a helpful hint: unspecified error!! :D

Error completing request
Arguments: ('task(7pd8z4j2zpkbqmi)', <gradio.routes.Request object at 0x000001E9672D7430>, 'house', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'DPM++ 2M', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
  File "C:\Users\y\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "C:\Users\y\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\txt2img.py", line 109, in txt2img
    processed = processing.process_images(p)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\processing.py", line 847, in process_images
    res = process_images_inner(p)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\processing.py", line 1075, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\processing.py", line 1393, in sample
    self.sampler = sd_samplers.create_sampler(self.sampler_name, self.sd_model)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\sd_samplers.py", line 41, in create_sampler
    sampler = config.constructor(model)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 31, in <lambda>
    sd_samplers_common.SamplerData(label, lambda model, funcname=funcname: KDiffusionSampler(funcname, model), aliases, options)
  File "C:\Users\y\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 72, in __init__
    self.model_wrap = self.model_wrap_cfg.inner_model
  File "C:\Users\y\stable-diffusion-webui-directml\modules\sd_samplers_kdiffusion.py", line 57, in inner_model
    self.model_wrap = denoiser(shared.sd_model, quantize=shared.opts.enable_quantization)
  File "C:\Users\y\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 135, in __init__
    super().__init__(model, model.alphas_cumprod, quantize=quantize)
  File "C:\Users\y\stable-diffusion-webui-directml\repositories\k-diffusion\k_diffusion\external.py", line 92, in __init__
    super().__init__(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5, quantize)
  File "C:\Users\y\stable-diffusion-webui-directml\venv\lib\site-packages\torch\_tensor.py", line 40, in wrapped
    return f(*args, **kwargs)
  File "C:\Users\y\stable-diffusion-webui-directml\venv\lib\site-packages\torch\_tensor.py", line 848, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Unspecified error
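The failing frame here is k-diffusion building its sigma schedule from alphas_cumprod; the 1 - alphas_cumprod step dispatches torch's __rsub__ on the DirectML tensor, which is what raises the error. A hypothetical standalone reproduction of just that computation (assumes torch-directml in the venv; the schedule values are illustrative):

```python
# Hypothetical repro of the k-diffusion sigma computation on the DirectML device.
import torch
import torch_directml

dml = torch_directml.device()
alphas_cumprod = torch.linspace(0.9991, 0.0047, 1000).to(dml)  # illustrative schedule
sigmas = ((1 - alphas_cumprod) / alphas_cumprod) ** 0.5        # the line that raised "Unspecified error"
print(sigmas[:3].cpu())
```

If this also fails, it points at the same broken elementwise path on DirectML as the model-load traceback earlier in the thread.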

KurantGold commented 4 weeks ago

I have an AMD RX500XT and the stable-diffusion-webui-directml folder in system32. I had the same problem and solved it like this:

- Delete the venv folder.
- Modify webui-user.bat to: set COMMANDLINE_ARGS= --use-directml --opt-sub-quad-attention --no-half --disable-nan-check --autolaunch (it has to look like the screenshot below).

(screenshot)

- Double-click webui-user.bat.

(screenshot)

Just double-click webui-user.bat and that's all.

Now you can use the GPU for Stable Diffusion. (screenshot)

(I have the AMD HIP SDK for Windows installed: https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html)

Ael07 commented 3 weeks ago

Looks like it is working for you. I managed to make it work before without the AMD HIP SDK. Right now I'm not sure what happened, but it doesn't work anymore with --use-directml; it works fine on CPU at 5 minutes per picture, which is so annoying. The question is: does anyone get the same error as the one I sent last, or am I the only one? lol... Can anybody replicate the error? Thx

I also noticed that --no-half gives me the first error (model failed to load); if I remove it, the model loads, but then I get the second error. What exactly is --no-half supposed to do? Thx

Ael07 commented 2 weeks ago

Does anybody have this issue, or can anyone replicate my error? Thanks