KirillKocheshkov opened 9 months ago
Should work fine on a 7800, but the version published yesterday is broken for everybody. Give it a day or two; I'm sure he'll publish a fix.
More detail?
I just installed webui-directml according to the guide and had several stumbling blocks:
So it seems that the Radeon GPU acceleration is disabled.
You should add --use-directml instead of --skip-torch-cuda-test to make AMD GPUs work.
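If you want to make that change without retyping the whole line, here is a minimal sketch that swaps the flag in the text of a `set COMMANDLINE_ARGS=...` line. The helper name `use_directml_args` is hypothetical; it only rewrites the string and keeps your other flags in order.

```python
def use_directml_args(args_line: str) -> str:
    """Swap --skip-torch-cuda-test for --use-directml in a 'set COMMANDLINE_ARGS=...' line.

    Hypothetical helper: it only rewrites the text of the line, it does not touch any files.
    """
    prefix, sep, args = args_line.partition("=")
    # Drop the bypass flag; on its own it only skips the "Torch can use a CUDA GPU" check.
    flags = [f for f in args.split() if f != "--skip-torch-cuda-test"]
    # Add the DirectML flag once, at the front, keeping the other flags in their order.
    if "--use-directml" not in flags:
        flags.insert(0, "--use-directml")
    return prefix + sep + " ".join(flags)


print(use_directml_args("set COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half"))
# set COMMANDLINE_ARGS=--use-directml --precision full --no-half
```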
With --use-directml (and also with --use-directml --precision-full --no-half) I get the same failure when running webui-user.bat:
venv "F:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.7.0
Commit hash: 9e8f9bcf14f68099bb3562488361bd1a8393b2a5
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
F:\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only
has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities
instead.
rank_zero_deprecation(
Installing onnxruntime-directml
Traceback (most recent call last):
File "F:\stable-diffusion-webui-directml\launch.py", line 48, in
stderr: ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'F:\stable-diffusion-webui-directml\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_shared.dll' Check the permissions.
With "set COMMANDLINE_ARGS=--skip-torch-cuda-test --precision full --no-half" it at least starts up:
venv "F:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.7.0
Commit hash: 9e8f9bcf14f68099bb3562488361bd1a8393b2a5
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
F:\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only
has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities
instead.
rank_zero_deprecation(
Launching Web UI with arguments: --skip-torch-cuda-test --precision full --no-half
Style database not found: F:\stable-diffusion-webui-directml\styles.csv
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
F:\stable-diffusion-webui-directml\venv\lib\site-packages\diffusers\utils\outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
ONNX: selected=CUDAExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
Loading weights [6ce0161689] from F:\stable-diffusion-webui-directml\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Running on local URL: http://127.0.0.1:7860
Creating model from config: F:\stable-diffusion-webui-directml\configs\v1-inference.yaml
To create a public link, set share=True in launch().
Startup time: 8.8s (prepare environment: 4.3s, initialize shared: 7.4s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.3s).
Applying attention optimization: InvokeAI... done.
Model loaded in 3.0s (load weights from disk: 0.5s, create model: 0.2s, apply weights to model: 2.2s).
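The ONNX line in this log (`selected=CUDAExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']`) shows that no DirectML execution provider was available, which fits the failed "Installing onnxruntime-directml" step in the first log. One way to check from inside the webui's venv is `onnxruntime.get_available_providers()`; the DirectML provider is named `DmlExecutionProvider` and comes with the onnxruntime-directml package. The `pick_provider` helper below is hypothetical and only illustrates a DirectML-first preference order, not how the webui itself selects a provider.

```python
def pick_provider(available):
    """Return the preferred ONNX Runtime execution provider from a list of names.

    Hypothetical helper illustrating a DirectML-first preference order.
    """
    for name in ("DmlExecutionProvider", "CPUExecutionProvider"):
        if name in available:
            return name
    raise RuntimeError(f"no usable provider in {available!r}")


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this environment")
    else:
        providers = ort.get_available_providers()
        print("available:", providers)
        print("would use:", pick_provider(providers))
```

With the provider list from the log above, this falls back to `CPUExecutionProvider`, i.e. no GPU.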
To setup the Directml webui properly (and without onnx) do the following steps:
Open a command prompt (cmd) and type
pip cache purge
then press Enter and close the command prompt.
This clears pip's download cache, so nothing left over from your earlier failed installs (like CUDA packages for NVIDIA) gets reused.
Then delete the venv folder inside the stable-diffusion-webui folder.
Then edit webui-user.bat so it looks like this:
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--use-directml --update-all-extensions --medvram --opt-sub-quad-attention --opt-split-attention --no-half --upcast-sampling --update-check
call webui.bat
Then save the file and relaunch webui-user.bat.
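The same steps can be scripted. This is a minimal sketch, not something shipped with the webui: `reset_webui` is a hypothetical helper, it deletes the venv folder, and it should be run with your system Python (not the venv's python.exe, since that venv is what gets deleted). The .bat contents are copied from the steps above.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Same contents as the webui-user.bat shown above.
WEBUI_USER_BAT = (
    "@echo off\n"
    "set PYTHON=\n"
    "set GIT=\n"
    "set VENV_DIR=\n"
    "set COMMANDLINE_ARGS=--use-directml --update-all-extensions --medvram "
    "--opt-sub-quad-attention --opt-split-attention --no-half --upcast-sampling --update-check\n"
    "call webui.bat\n"
)


def reset_webui(webui_dir: Path, purge_cache: bool = True) -> None:
    """Hypothetical helper: purge pip's cache, delete venv, rewrite webui-user.bat."""
    if purge_cache:
        # Same as typing "pip cache purge" in cmd; check=False so a pip error
        # (for example when the cache is disabled) does not stop the script.
        subprocess.run([sys.executable, "-m", "pip", "cache", "purge"], check=False)
    venv = webui_dir / "venv"
    if venv.is_dir():
        shutil.rmtree(venv)  # webui.bat creates a fresh venv on the next launch
    # newline="\r\n" writes Windows line endings, as .bat files normally use.
    with open(webui_dir / "webui-user.bat", "w", newline="\r\n") as f:
        f.write(WEBUI_USER_BAT)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python reset_webui.py F:\stable-diffusion-webui-directml
    reset_webui(Path(sys.argv[1]))
```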
I did exactly this. Then I had to install more packages (not easy for someone who isn't used to Python and its tooling), and it finally started, but it uses my CPU instead of my graphics card.
There are way too many different how-tos for automatic1111 and AMD GPUs, and most of them did not work for me (clean setup). I got one to run (I think the how-to was on the AMD website/blog), but it worked exactly once, until I restarted A1111.
With this one I am at least able to generate images, but not with my GPU :o
Do I need the --medvram option if the card has 20GB VRAM?
Edit: maybe it works, but very slowly? I see about 1.6-1.8 it/s, whereas the one experimental installation I mentioned above reached around 24 it/s :o I'm now trying SD.Next and also get 1.6-1.8 it/s.
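When speeds look CPU-like, one thing worth checking from inside the webui's venv (venv\Scripts\python.exe) is whether torch-directml can see the card at all. This sketch assumes the venv contains the torch-directml package and uses its `device_count()` and `device_name()` functions; `describe_directml` is a hypothetical helper.

```python
def describe_directml() -> str:
    """Report which DirectML devices torch-directml can see, if any."""
    try:
        import torch_directml  # assumed to be installed in the webui venv
    except ImportError:
        return "torch-directml is not installed in this environment"
    count = torch_directml.device_count()
    if count == 0:
        return "torch-directml found no DirectML device"
    names = [torch_directml.device_name(i) for i in range(count)]
    return "DirectML devices: " + ", ".join(names)


if __name__ == "__main__":
    print(describe_directml())
```

If your Radeon card is not listed, the webui cannot be using it either, whatever flags are set.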
OMG THANK YOU THIS ACTUALLY WORKED!!!!!!! THANK YOU!!
@lshqqytiger - I can likely help with technical documentation & how-to guides in the future if you'd like assistance. I'm not a coding pro (I'm good at SQL, for what that's worth) but I make a TON of technical documentation at work for oil field workers. Oil field workers generally are not good with computers. At all. Often, they don't know there is a right & left click.
Just wanted to offer that in case you need help. You're clearly talented on the technical side, but it seems the last few updates have caused tons of confusion and failed launches for users.
I'm not that good at documentation. I've invited you as a collaborator, so you can edit the wiki.
I did a full clean of the environment, including uninstalling all Python versions and reinstalling everything, incorporated @CS1o's guidance, and got it running with GPU acceleration (Radeon 7900 XTX, 24GB).
I've done a fresh git pull daily, and I'm now running into out-of-memory errors when trying to use Hires. fix with 512x768 images:
RuntimeError: Could not allocate tensor with 402653184 bytes. There is not enough GPU video memory available!
--- Snip ---
Traceback (most recent call last):
File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "F:\stable-diffusion-webui-directml\modules\txt2img.py", line 55, in txt2img
processed = processing.process_images(p)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 736, in process_images
res = process_images_inner(p)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 962, in process_images_inner
samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 1251, in sample
return self.sample_hr_pass(samples, decoded_samples, seeds, subseeds, subseed_strength, prompts)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 1343, in sample_hr_pass
decoded_samples = decode_latent_batch(self.sd_model, samples, target_device=devices.cpu, check_for_nans=True)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 597, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "F:\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in
Isn't 24GB of VRAM enough for upscaling? And can't the model spill over into regular memory? I have another 64GB of system RAM available as well.
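For scale, the tensor in this error is small next to 24GB: 402653184 bytes is exactly 384 MiB. A failure at that size suggests most of the VRAM was already occupied when the allocation was attempted, rather than the single tensor being too big for the card. The hypothetical helper below just does the conversion; the element count assumes fp32 tensors (4 bytes each, as with --no-half).

```python
def describe_allocation(num_bytes: int, bytes_per_element: int = 4):
    """Convert a failed allocation size to MiB and to an element count.

    bytes_per_element=4 assumes fp32 tensors (e.g. with --no-half); use 2 for fp16.
    """
    mib = num_bytes / 2**20
    elements = num_bytes // bytes_per_element
    return mib, elements


print(describe_allocation(402653184))   # (384.0, 100663296)
print(describe_allocation(2684354560))  # (2560.0, 671088640)
```

The second call is the 2684354560-byte error reported further down in this thread, which works out to 2.5 GiB.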
I don't recommend Hires. fix with DirectML. Please use img2img instead. You can get larger images using Ultimate Upscale and UltraSharp.
Sadly, trying to upscale in img2img gives the same OOM errors:
--- snip ---
Traceback (most recent call last):
File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "F:\stable-diffusion-webui-directml\modules\img2img.py", line 238, in img2img
processed = process_images(p)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 736, in process_images
res = process_images_inner(p)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 969, in process_images_inner
x_samples_ddim = decode_latent_batch(p.sd_model, samples_ddim, target_device=devices.cpu, check_for_nans=True)
File "F:\stable-diffusion-webui-directml\modules\processing.py", line 597, in decode_latent_batch
sample = decode_first_stage(model, batch[i:i + 1])[0]
File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 76, in decode_first_stage
return samples_to_images_tensor(x, approx_index, model)
File "F:\stable-diffusion-webui-directml\modules\sd_samplers_common.py", line 58, in samples_to_images_tensor
x_sample = model.decode_first_stage(sample.to(model.first_stage_model.dtype))
File "F:\stable-diffusion-webui-directml\modules\sd_hijack_utils.py", line 17, in
--- Snip ---
Hey, 12-24GB of VRAM is enough for Hires Fix. What matters is using the right settings and knowing the limits. This only works with the COMMANDLINE_ARGS I posted above.
**Important for AMD users: every time you get an out-of-GPU-memory (VRAM) error, you need to fully restart the WebUI so the stuck VRAM gets cleared. If you don't restart, you will likely run into that error again and again, no matter which settings you try.**
There are 2 ways to get Hires Fix to work:
For this you need an additional extension called Tiled Diffusion & Tiled VAE (formerly Multidiffusion; the name was changed): https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111 After installing it, restart the WebUI; you will then have two new tabs at the bottom of txt2img.
The following Hires Fix settings work on cards from the 6700 XT (12GB) up to the 7900 XTX (24GB) with my settings: don't enable Tiled Diffusion, only use Tiled VAE.
The important parts are the resolution and the Hires steps. Always use 10-15 Hires steps so you don't run out of VRAM. With this method you'll be able to upscale 512x768 by 2 to get 1024x1536, or 960x540 by 2 to get 1920x1080 (Full HD).
Using the wrong encoder/decoder tile size, or an uncommon image resolution, will result in gray squares in the image. So for SD 1.5-based models, set Encoder: 1024, Decoder: 128. For SDXL, set "Upscale by" to 1.5, the resolution to 768x1024, and Encoder: 1280, Decoder: 128.
After that, you can load the image into img2img and use the SD upscale script or the Ultimate Upscale extension to get a 4K image.
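To keep the numbers straight, here are the Tiled VAE settings from this comment written out, with the resulting Hires resolution (each side times the "Upscale by" factor). The dict layout and the `hires_target` name are hypothetical, and 768x1024 is read as the SDXL base resolution before upscaling.

```python
# Tiled VAE Hires settings from the comment above; the dict layout is hypothetical.
TILED_VAE_HIRES = {
    "sd15": {"base": (512, 768), "upscale_by": 2.0, "encoder_tile": 1024, "decoder_tile": 128},
    "sdxl": {"base": (768, 1024), "upscale_by": 1.5, "encoder_tile": 1280, "decoder_tile": 128},
}
HIRES_STEPS = range(10, 16)  # "always use 10-15 Hires steps"


def hires_target(width: int, height: int, upscale_by: float):
    """Final resolution after Hires fix: each side multiplied by the upscale factor."""
    return int(width * upscale_by), int(height * upscale_by)


for family, s in TILED_VAE_HIRES.items():
    w, h = hires_target(*s["base"], s["upscale_by"])
    print(f"{family}: {s['base'][0]}x{s['base'][1]} -> {w}x{h}, "
          f"encoder {s['encoder_tile']}, decoder {s['decoder_tile']}")
```

For SDXL this gives 1152x1536, a similar pixel count to the 1024x1536 result for SD 1.5.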
First of all, note that using --no-half in webui-user.bat increases VRAM usage a lot. But AMD GPUs need --no-half for inpainting in img2img to work, and for extensions like ADetailer or Tiled Diffusion & VAE. So if you don't plan to inpaint, remove --no-half, or simply copy webui-user.bat and make a separate one for inpainting only.
So if you run it without --no-half, it should definitely work with these settings.
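A rough worked example of why --no-half costs so much: it keeps the model in 32-bit floats (4 bytes per value instead of 2 for fp16), so the weights alone take twice the memory. The sketch assumes the roughly 860M-parameter UNet of SD 1.x; the text encoder, the VAE and the activations during sampling all come on top of this.

```python
def weight_memory_gib(num_params: float, half: bool) -> float:
    """Memory for model weights alone: 2 bytes per value in fp16, 4 in fp32 (--no-half)."""
    bytes_per_param = 2 if half else 4
    return num_params * bytes_per_param / 2**30


UNET_PARAMS = 860e6  # approximate SD 1.x UNet size; an assumption for this example
print(f"fp16: {weight_memory_gib(UNET_PARAMS, half=True):.2f} GiB")          # fp16: 1.60 GiB
print(f"fp32 (--no-half): {weight_memory_gib(UNET_PARAMS, half=False):.2f} GiB")  # fp32 (--no-half): 3.20 GiB
```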
I have an RX 7800 XT and it works with these parameters: COMMANDLINE_ARGS=--use-directml --opt-sub-quad-attention --no-half --disable-nan-check --autolaunch
Speed is around 3-4 it/s, not the ~20 it/s another commenter mentioned. I couldn't get ONNX to work.
Now I often get an error: "RuntimeError: Could not allocate tensor with 2684354560 bytes. There is not enough GPU video memory available!". I reinstalled it; that didn't help. Whenever I try to generate an image, it eats all my VRAM and never frees it. The moment I press Generate, all 16GB of my VRAM are locked, and it stays that way even after generation is done. The only argument I have is --use-directml. Without it, it simply doesn't work and just prints an error: "RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check"
I have a 7900 XTX and still use the --medvram command-line option. I would suggest doing the same. Here's my command line if you want to copy it:
set COMMANDLINE_ARGS= --use-directml --medvram --no-half --skip-torch-cuda-test
I also suggest turning on Scaled Dot Product in Settings -> Optimizations!
Now, for some reason, it doesn't generate and shows this error every time I try to generate a 1024x512 image. It started after the latest update. log.txt
Thank you so much! This did it finally for me 🙏
Is there an existing issue for this?
What would your feature do?
Can you update this repository to support the RX 7800?
Proposed workflow
I just followed the guide to the end and hit the same issue: it says I don't have a GPU.