Doesn't work for RX 7800 at all

KirillKocheshkov commented 9 months ago

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Can you update this repo to support the RX 7800?

Proposed workflow

I followed the guide to the end and hit the same issue: it says that I don't have a GPU.

pw405 commented 9 months ago

It should work fine on a 7800, but the version published yesterday is broken for everybody. Give it a day or two; I'm sure he'll publish a fix.

lshqqytiger commented 9 months ago

More detail?

ReiskaC commented 9 months ago

I just installed webui-directml according to the guide and had several stumbling blocks:

- The install script failed with an obscure error about denied access to the onnx DLLs.
  - As a workaround, I found that uninstalling my legacy Nvidia drivers with DDU stopped CUDA from being detected at the beginning of the script, and it finally went through.
  - I also needed to run the pip upgrade command as administrator to get the install script to finish from webui-user.bat.
- I had to add COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half to be able to process anything in the webUI.
- All generation is 100% CPU-bound, with my RX 7900 XTX clocking a whopping 1% load.

So it seems that the Radeon GPU acceleration is disabled.

lshqqytiger commented 9 months ago

You should add --use-directml instead of --skip-torch-cuda-test to make AMD GPUs work.

ReiskaC commented 9 months ago

With --use-directml (and also with --use-directml --precision full --no-half) I get the same failure when running webui-user.bat:

venv "F:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.7.0
Commit hash: 9e8f9bcf14f68099bb3562488361bd1a8393b2a5
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
F:\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead.
  rank_zero_deprecation(
Installing onnxruntime-directml
Traceback (most recent call last):
  File "F:\stable-diffusion-webui-directml\launch.py", line 48, in <module>
    main()
  File "F:\stable-diffusion-webui-directml\launch.py", line 39, in main
    prepare_environment()
  File "F:\stable-diffusion-webui-directml\modules\launch_utils.py", line 628, in prepare_environment
    run_pip("install onnxruntime-directml", "onnxruntime-directml")
  File "F:\stable-diffusion-webui-directml\modules\launch_utils.py", line 150, in run_pip
    return run(
  File "F:\stable-diffusion-webui-directml\modules\launch_utils.py", line 122, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't install onnxruntime-directml.
Command: "F:\stable-diffusion-webui-directml\venv\Scripts\python.exe" -m pip install onnxruntime-directml --prefer-binary
Error code: 1
stdout: Collecting onnxruntime-directml
  Using cached onnxruntime_directml-1.17.0-cp310-cp310-win_amd64.whl.metadata (4.3 kB)
Requirement already satisfied: coloredlogs in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from onnxruntime-directml) (15.0.1)
Requirement already satisfied: flatbuffers in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from onnxruntime-directml) (23.5.26)
Requirement already satisfied: numpy>=1.21.6 in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from onnxruntime-directml) (1.23.5)
Requirement already satisfied: packaging in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from onnxruntime-directml) (23.2)
Requirement already satisfied: protobuf in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from onnxruntime-directml) (3.20.3)
Requirement already satisfied: sympy in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from onnxruntime-directml) (1.12)
Requirement already satisfied: humanfriendly>=9.1 in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from coloredlogs->onnxruntime-directml) (10.0)
Requirement already satisfied: mpmath>=0.19 in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from sympy->onnxruntime-directml) (1.3.0)
Requirement already satisfied: pyreadline3 in f:\stable-diffusion-webui-directml\venv\lib\site-packages (from humanfriendly>=9.1->coloredlogs->onnxruntime-directml) (3.4.1)
Using cached onnxruntime_directml-1.17.0-cp310-cp310-win_amd64.whl (15.4 MB)
Installing collected packages: onnxruntime-directml

stderr: ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'F:\stable-diffusion-webui-directml\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_shared.dll'
Check the permissions.

ReiskaC commented 9 months ago

With "set COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half" it starts up at least:

venv "F:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.7.0
Commit hash: 9e8f9bcf14f68099bb3562488361bd1a8393b2a5
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
F:\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --skip-torch-cuda-test --precision full --no-half
Style database not found: F:\stable-diffusion-webui-directml\styles.csv
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
F:\stable-diffusion-webui-directml\venv\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
ONNX: selected=CUDAExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
Loading weights [6ce0161689] from F:\stable-diffusion-webui-directml\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Running on local URL: http://127.0.0.1:7860
Creating model from config: F:\stable-diffusion-webui-directml\configs\v1-inference.yaml

To create a public link, set share=True in launch().
Startup time: 8.8s (prepare environment: 4.3s, initialize shared: 7.4s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.3s).
Applying attention optimization: InvokeAI... done.
Model loaded in 3.0s (load weights from disk: 0.5s, create model: 0.2s, apply weights to model: 2.2s).
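
The startup log above contains the line `ONNX: selected=CUDAExecutionProvider, available=[...]`, and the available list has no DirectML entry. As a rough sketch of how to spot this in a saved log, the Python below parses that line and checks for `DmlExecutionProvider`, the name ONNX Runtime uses for its DirectML provider. The helper names `parse_onnx_providers` and `directml_available` are made up for illustration and are not part of the webui.

```python
import ast
import re

# Name ONNX Runtime uses for its DirectML execution provider.
DML_PROVIDER = "DmlExecutionProvider"

# Matches the "ONNX: selected=..., available=[...]" line from the log above.
_ONNX_LINE = re.compile(r"ONNX: selected=(\w+), available=(\[[^\]]*\])")


def parse_onnx_providers(log_text):
    """Return (selected, available) from a startup log, or None if the line is absent."""
    match = _ONNX_LINE.search(log_text)
    if match is None:
        return None
    selected = match.group(1)
    # The list is printed as a Python literal, so literal_eval can read it safely.
    available = ast.literal_eval(match.group(2))
    return selected, available


def directml_available(log_text):
    """True only if the log lists the DirectML provider as available."""
    parsed = parse_onnx_providers(log_text)
    return parsed is not None and DML_PROVIDER in parsed[1]


# The exact line from the log above.
log = ("ONNX: selected=CUDAExecutionProvider, "
       "available=['AzureExecutionProvider', 'CPUExecutionProvider']")
print(parse_onnx_providers(log))
print(directml_available(log))
```

For the log in this comment the check comes back negative, which is consistent with generation running on the CPU.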

CS1o commented 9 months ago

To set up the DirectML webui properly (and without ONNX), follow these steps:

Open a Command Prompt, run pip cache purge, then close it. This clears pip's cache of packages downloaded during earlier failed installs (such as the CUDA packages for NVIDIA).

Then delete the venv folder inside the stable-diffusion-webui folder.

Then edit webui-user.bat so that it looks like this:

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--use-directml --update-all-extensions --medvram --opt-sub-quad-attention --opt-split-attention --no-half --upcast-sampling --update-check
call webui.bat

Then save and relaunch the webui-user.bat
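
For anyone who keeps several variants of this file, here is a minimal Python sketch that renders the same webui-user.bat contents from a list of flags. The flags are copied from the comment above; the function name `render_webui_user_bat` is hypothetical and nothing in the webui provides it.

```python
# Flags exactly as listed in the comment above.
FLAGS = [
    "--use-directml",
    "--update-all-extensions",
    "--medvram",
    "--opt-sub-quad-attention",
    "--opt-split-attention",
    "--no-half",
    "--upcast-sampling",
    "--update-check",
]


def render_webui_user_bat(flags):
    """Return the text of a webui-user.bat that passes `flags` to webui.bat."""
    lines = [
        "@echo off",
        "",
        "set PYTHON=",
        "set GIT=",
        "set VENV_DIR=",
        "set COMMANDLINE_ARGS=" + " ".join(flags),
        "call webui.bat",
    ]
    return "\n".join(lines) + "\n"


bat_text = render_webui_user_bat(FLAGS)
print(bat_text)
```

The sketch only builds the text. Writing it to disk is left out on purpose, so nothing gets overwritten by accident.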

NikAWing commented 9 months ago

I did exactly this. Then I had to install more packages (which was not that easy for someone not used to Python and co.), and then it finally started, but it uses my CPU instead of my video card.

There are way too many different how-tos for automatic1111 and AMD GPUs, and most of them did not work for me (clean setup). I got one to run (I think the how-to was on the AMD website/blog), but it worked exactly once, until I restarted A1111.

With this one I am at least able to generate stuff, but not with my GPU :o

Do I need the --medvram option if the card has 20GB VRAM?

edit: maybe it works but very slowly? I see about 1.6-1.8 it/s, but the one experimental installation I mentioned above was around 24 it/s :o I have now tried sd.next and also get 1.6-1.8 it/s.

pw405 commented 9 months ago

> To set up the DirectML webui properly (and without ONNX), follow these steps:
>
> Open a Command Prompt, run pip cache purge, then close it. This clears pip's cache of packages downloaded during earlier failed installs (such as the CUDA packages for NVIDIA).
>
> Then delete the venv folder inside the stable-diffusion-webui folder.
>
> Then edit webui-user.bat so that it looks like this:
>
> @echo off
>
> set PYTHON=
> set GIT=
> set VENV_DIR=
> set COMMANDLINE_ARGS=--use-directml --update-all-extensions --medvram --opt-sub-quad-attention --opt-split-attention --no-half --upcast-sampling --update-check
> call webui.bat
>
> Then save and relaunch the webui-user.bat

OMG THANK YOU THIS ACTUALLY WORKED!!!!!!! THANK YOU!!

@lshqqytiger - I can likely help with technical documentation & how-to guides in the future if you'd like assistance. I'm not a coding pro (I'm good at SQL, for what that's worth) but I make a TON of technical documentation at work for oil field workers. Oil field workers generally are not good with computers. At all. Often, they don't know there is a right & left click.

Just wanted to offer that in case you need help. You're clearly talented on the technical side, but it seems the last few updates have resulted in tons of confusion & failed launches for users.

lshqqytiger commented 9 months ago

I'm not that good at documentation. I have invited you as a collaborator so you can edit the wiki.

ReiskaC commented 9 months ago

I did a full clean of the environment, including uninstalling all Python versions, reinstalling everything, and incorporating @CS1o's guidance, and got the environment running with GPU acceleration (Radeon 7900 XTX, 24GB).

I've done fresh git pulls daily and I'm now bumping into out-of-memory errors when trying to use Hires. fix with 512x768 images:

RuntimeError: Could not allocate tensor with 402653184 bytes. There is not enough GPU video memory available!

--- Snip ---
Traceback (most recent call last):
  File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\modules\txt2img.py", line 55, in txt2img
    processed = processing.process_images(p)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 736, in process_images
    res = process_images_inner(p)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 962, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 1251, in sample
    return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 1343, in sample_hr_pass
    decoded_samples = decode_latent_batch(self.sd_model, samples, target_device=devices.cpu, check_for_nans=True)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 597, in decode_latent_batch
    sample = decode_first_stage(model, batch[i:i + 1])[0]
  File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 76, in decode_first_stage
    return samples_to_images_tensor(x, approx_index, model)
  File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
    x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
  File "F:\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "F:\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "F:\stable-diffusion-webui-directml\modules\lowvram.py", line 71, in first_stage_model_decode_wrap
    return first_stage_model_decode(z)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 138, in forward
    h = self.norm2(h)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\extensions-builtin\Lora\networks.py", line 516, in network_GroupNorm_forward
    return originals.GroupNorm_forward(self, input)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\normalization.py", line 273, in forward
    return F.group_norm(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\functional.py", line 2530, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
--- Snip ---

Isn't 24GB VRAM enough for upscaling? And can't the model be extended to regular memory? Got another 64GB of regular RAM to use as well.
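
To put the failing allocation in perspective, here is a small Python sketch that pulls the byte count out of the error message above and converts it. The helper name `failed_allocation_bytes` is made up for illustration, and treating the card's nominal 24GB as 24 GiB is an assumption.

```python
import re

MIB = 1024 ** 2
GIB = 1024 ** 3

# Matches the byte count in the DirectML out-of-memory message quoted above.
_ALLOC_ERROR = re.compile(r"Could not allocate tensor with (\d+) bytes")


def failed_allocation_bytes(message):
    """Return the size of the failed allocation in bytes, or None if not found."""
    match = _ALLOC_ERROR.search(message)
    return int(match.group(1)) if match else None


message = ("RuntimeError: Could not allocate tensor with 402653184 bytes. "
           "There is not enough GPU video memory available!")
size = failed_allocation_bytes(message)
total_vram = 24 * GIB  # assumption: nominal 24GB treated as 24 GiB

print(f"{size / MIB:.0f} MiB")                 # 384 MiB
print(f"{size / total_vram:.2%} of the card")  # 1.56% of the card
```

The tensor that failed is only 384 MiB, so the request itself is small. That suggests most of the VRAM was already occupied when the decode step ran, although the traceback alone cannot confirm this.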

lshqqytiger commented 9 months ago

I don't recommend Hires. fix with DirectML. Please use img2img instead. You can get a larger image using Ultimate Upscale and UltraSharp.

ReiskaC commented 9 months ago

Sadly, trying to upscale in img2img gives the same OOM errors:

--- Snip ---
Traceback (most recent call last):
  File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\modules\img2img.py", line 238, in img2img
    processed = process_images(p)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 736, in process_images
    res = process_images_inner(p)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 969, in process_images_inner
    x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
  File "F:\stable-diffusion-webui-directml\modules\processing.py", line 597, in decode_latent_batch
    sample = decode_first_stage(model, batch[i:i + 1])[0]
  File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 76, in decode_first_stage
    return samples_to_images_tensor(x, approx_index, model)
  File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
    x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
  File "F:\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: self(*args, **kwargs))
  File "F:\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 28, in __call__
    return self.__orig_func(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\diffusion\ddpm.py", line 826, in decode_first_stage
    return self.first_stage_model.decode(z)
  File "F:\stable-diffusion-webui-directml\modules\lowvram.py", line 71, in first_stage_model_decode_wrap
    return first_stage_model_decode(z)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\models\autoencoder.py", line 90, in decode
    dec = self.decoder(z)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 637, in forward
    h = self.up[i_level].block[i_block](h, temb)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\repositories\stable-diffusion-stability-ai\ldm\modules\diffusionmodules\model.py", line 133, in forward
    h = self.conv1(h)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\extensions-builtin\Lora\networks.py", line 501, in network_Conv2d_forward
    return originals.Conv2d_forward(self, input)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
  File "F:\stable-diffusion-webui-directml\modules\dml\amp\autocast_mode.py", line 43, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: forward(op, args, kwargs))
  File "F:\stable-diffusion-webui-directml\modules\dml\amp\autocast_mode.py", line 15, in forward
    return op(*args, **kwargs)
RuntimeError: Could not allocate tensor with 402653184 bytes. There is not enough GPU video memory available!
--- Snip ---

CS1o commented 9 months ago

Hey, 12-24 GB of VRAM is enough for Hires. fix. What matters is using the right settings and knowing the limits. This only works with the COMMANDLINE_ARGS settings I posted above.

**Important for AMD users: every time you get an out-of-GPU-memory (VRAM) error, you need to fully restart the webui so the stuck VRAM gets cleared. If you don't restart, you will likely run into that error again and again, no matter which settings you try.**

There are 2 ways to get Hires Fix to work:

First Method (recommended by me)

For this you need an additional extension called Tiled Diffusion & Tiled VAE, also known as MultiDiffusion (the name got changed): https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111
After installing it, restart the webui; you will then have two new tabs at the bottom of txt2img.

The following Hires. fix settings work on cards from the 6700 XT (12 GB) up to the 7900 XTX (24 GB) with my settings. Don't enable Tiled Diffusion; only use Tiled VAE. [screenshot] [screenshot]

The important parts are the resolution and the Hires steps:
- Always use 10-15 Hires steps so you don't run out of VRAM.
- With this method you will be able to upscale 512x768 by 2 to get 1024x1536, or 960x540 by 2 to get 1920x1080 (Full HD).
- Using the wrong Encoder/Decoder resolution, or an uncommon image resolution, will result in gray squares in the image.
- For 1.5-based models, set Encoder: 1024, Decoder: 128.
- For SDXL, set "Upscale by" to 1.5, Resolution to 768x1024, and Encoder: 1280, Decoder: 128.
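
The resolution arithmetic above can be sketched in a few lines of Python. The function `upscaled_size` and the `TILED_VAE_PRESETS` table are hypothetical names for illustration. The preset values are the ones stated in this comment. Plain multiplication is an assumption, since the webui may round dimensions differently.

```python
# Tiled VAE tile sizes as recommended in the comment above (not webui defaults).
TILED_VAE_PRESETS = {
    "sd15": {"encoder": 1024, "decoder": 128},
    "sdxl": {"encoder": 1280, "decoder": 128},
}


def upscaled_size(width, height, factor):
    """Return (width, height) after applying an 'Upscale by' factor.

    Assumption: plain multiplication rounded to the nearest integer.
    """
    return round(width * factor), round(height * factor)


print(upscaled_size(512, 768, 2))     # (1024, 1536)
print(upscaled_size(960, 540, 2))     # (1920, 1080)
print(upscaled_size(768, 1024, 1.5))  # (1152, 1536)
print(TILED_VAE_PRESETS["sdxl"])
```

The third call shows the SDXL case from the comment: 768x1024 upscaled by 1.5 gives 1152x1536.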

After that you can load the Image into img2img and use the sd upscale script or the ultimate upscale extension to get a 4k image.

Second Method:

First of all, note that using --no-half in webui-user.bat increases VRAM usage a lot. But AMD GPUs need --no-half for inpainting in img2img to work, and for extensions like ADetailer or Tiled Diffusion & VAE. So if you don't plan to inpaint, remove --no-half, or simply copy webui-user.bat and make one for inpainting only.

If you run it without --no-half, it should work with these settings for sure. [screenshot]

mongolsteppe commented 9 months ago

I have an RX 7800 XT and it works with these parameters: COMMANDLINE_ARGS=--use-directml --opt-sub-quad-attention --no-half --disable-nan-check --autolaunch

Speed is around 3-4 it/s, not the ~20 it/s another commenter mentioned. Couldn't make ONNX work.

KirillKocheshkov commented 9 months ago

Now I often get an error: "RuntimeError: Could not allocate tensor with 2684354560 bytes. There is not enough GPU video memory available!". I reinstalled it; that didn't help. Whenever I try to generate an image, it eats all my VRAM and never clears it. The moment I press Generate, it locks all 16 GB of my VRAM, and it stays that way even after generation is done. The only argument I have is --use-directml. Without it, it simply doesn't work and just prints an error: "RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check"

pw405 commented 9 months ago

> Now I often get an error: "RuntimeError: Could not allocate tensor with 2684354560 bytes. There is not enough GPU video memory available!". I reinstalled it; that didn't help. Whenever I try to generate an image, it eats all my VRAM and never clears it. The moment I press Generate, it locks all 16 GB of my VRAM, and it stays that way even after generation is done. The only argument I have is --use-directml. Without it, it simply doesn't work and just prints an error: "RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check"

I have a 7900XTX and still use the --medvram command line. I would suggest doing that. Here's my command line if you want to copy:

set COMMANDLINE_ARGS= --use-directml --medvram --no-half --skip-torch-cuda-test

Also suggest turning on Scaled Dot Product in settings -> Optimizations!!!!

KirillKocheshkov commented 9 months ago

Now, for some reason, it doesn't generate and shows this error every time I try to generate a 1024x512 image. It started after the latest update. log.txt

Patrick84 commented 8 months ago

> To set up the DirectML webui properly (and without ONNX), follow these steps:
>
> Open a Command Prompt, run pip cache purge, then close it. This clears pip's cache of packages downloaded during earlier failed installs (such as the CUDA packages for NVIDIA).
>
> Then delete the venv folder inside the stable-diffusion-webui folder.
>
> Then edit webui-user.bat so that it looks like this:
>
> @echo off
>
> set PYTHON=
> set GIT=
> set VENV_DIR=
> set COMMANDLINE_ARGS=--use-directml --update-all-extensions --medvram --opt-sub-quad-attention --opt-split-attention --no-half --upcast-sampling --update-check
> call webui.bat
>
> Then save and relaunch the webui-user.bat

Thank you so much! This did it finally for me 🙏

lshqqytiger / stable-diffusion-webui-amdgpu