lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0
1.77k stars 183 forks

[Bug]: ONNX failed to initialize #519

Open dnkru opened 1 month ago

dnkru commented 1 month ago

Checklist

What happened?

I ran the program and saw that there were errors in the logs.

Steps to reproduce the problem

  1. paste this line in cmd: git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml && cd stable-diffusion-webui-directml && git submodule init && git submodule update
  2. Click webui-user.bat

What should have happened?

WebUI should have run with no errors reported.

What browsers do you use to access the UI?

Microsoft Edge

Sysinfo

sysinfo-2024-08-11-20-00.json

Console logs

venv "C:\Users\dnkru\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
ROCm Toolkit 5.7 was found.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-2-g395ce8dc
Commit hash: 395ce8dc2cb01282d48074a89a5e6cb3da4b59ab
Using ZLUDA in C:\Users\dnkru\stable-diffusion-webui-directml\.zluda
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\Users\dnkru\stable-diffusion-webui-directml\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX failed to initialize: Failed to import diffusers.pipelines.auto_pipeline because of the following error (look up to see its traceback):
Failed to import diffusers.pipelines.aura_flow.pipeline_aura_flow because of the following error (look up to see its traceback):
cannot import name 'UMT5EncoderModel' from 'transformers' (C:\Users\dnkru\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\__init__.py)
Loading weights [6ce0161689] from C:\Users\dnkru\stable-diffusion-webui-directml\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors
Creating model from config: C:\Users\dnkru\stable-diffusion-webui-directml\configs\v1-inference.yaml
C:\Users\dnkru\stable-diffusion-webui-directml\venv\lib\site-packages\huggingface_hub\file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 18.8s (prepare environment: 23.8s, initialize shared: 3.9s, load scripts: 0.9s, create ui: 0.9s, gradio launch: 0.4s).
Applying attention optimization: Doggettx... done.
Model loaded in 10.5s (load weights from disk: 0.6s, create model: 1.2s, apply weights to model: 7.0s, apply half(): 0.1s, load textual inversion embeddings: 0.5s, calculate empty prompt: 0.9s).

Additional information

CS1o commented 1 month ago

Hey, these warnings can be ignored. But I see you have a 7700S; this card needs a specific HIP file to work with ZLUDA. Follow my Automatic1111 with ZLUDA install guide from here to get everything working: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides

xhy2008 commented 1 month ago

I ran into this problem today, and it's worse: the WebUI seems to be stuck, and I can't do anything.

venv "E:\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-2-g395ce8dc
Commit hash: 395ce8dc2cb01282d48074a89a5e6cb3da4b59ab
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --lowvram --use-directml --autolaunch
ONNX failed to initialize: Failed to import diffusers.pipelines.auto_pipeline because of the following error (look up to see its traceback):
Failed to import diffusers.pipelines.aura_flow.pipeline_aura_flow because of the following error (look up to see its traceback):
cannot import name 'UMT5EncoderModel' from 'transformers' (E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\__init__.py)
Loading weights [6e430eb514] from E:\stable-diffusion-webui-amdgpu\models\Stable-diffusion\anything-4.5.safetensors
Creating model from config: E:\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
E:\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Startup time: 99.3s (initial startup: 0.4s, prepare environment: 151.4s, initialize shared: 13.0s, other imports: 0.2s, setup gfpgan: 0.1s, list SD models: 0.3s, load scripts: 2.7s, initialize extra networks: 1.0s, scripts before_ui_callback: 0.1s, create ui: 3.5s, gradio launch: 1.5s).

CS1o commented 1 month ago

I see no issue here. The "no module 'xformers'" warning can be ignored, as xformers is only for Nvidia GPUs. All other warnings can be ignored too.

I still recommend that you install the ZLUDA version from my guide if you want good performance and don't want to run out of VRAM as quickly.

neko47834 commented 1 month ago

I see no issue here. The "no module 'xformers'" warning can be ignored, as xformers is only for Nvidia GPUs. All other warnings can be ignored too.

I still recommend that you install the ZLUDA version from my guide if you want good performance and don't want to run out of VRAM as quickly.

Sorry, since you know a lot about the subject... Can ZLUDA be installed on an RX 570? I looked at your tutorial and saw the RX 580 listed... I also saw that it works on the RX 540.

I have exactly the same problem as this user after updating the pip dependency; even though I uninstall and reinstall it, I can't get it to work anymore.

CS1o commented 1 month ago

Yes, it will work on an RX 570, as it has the same architecture as the RX 580.

xhy2008 commented 1 month ago

But I am using a laptop with a built-in GPU and 4 GB of VRAM. Can ZLUDA be used on AMD GFX GPUs?

neko47834 commented 1 month ago

Yes, it will work on an RX 570, as it has the same architecture as the RX 580.

Well, I followed your tutorial to the letter, but there was no success... I get an error when starting SD (screenshot attached).

I did the same steps several times...

CS1o commented 1 month ago

Well, I followed your tutorial to the letter, but there was no success... I get an error when starting SD. I did the same steps several times...

Have you done this step too? It's needed for the RX 580, 570, and 540:

Additional step for RX 580 users: go into the stable-diffusion-webui-amdgpu folder and click in the Explorer address bar (not the search bar). There, type cmd and hit Enter. Then run these three commands one by one:

venv\scripts\activate
pip uninstall torch torchvision torchaudio -y
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118

CS1o commented 1 month ago

But I am using a laptop with a built-in GPU and 4 GB of VRAM. Can ZLUDA be used on AMD GFX GPUs?

ZLUDA only supports the following APUs: AMD 680M and AMD 780M.

With any other iGPU you have to use DirectML.

neko47834 commented 1 month ago

Well, I followed your tutorial to the letter, but there was no success... I get an error when starting SD. I did the same steps several times...

Have you done this step too? It's needed for the RX 580, 570, and 540:

Additional step for RX 580 users: go into the stable-diffusion-webui-amdgpu folder and click in the Explorer address bar (not the search bar). There, type cmd and hit Enter. Then run these three commands one by one:

venv\scripts\activate
pip uninstall torch torchvision torchaudio -y
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118

Yes, I did it, but without any result. In fact, other users are reporting the same thing with this series of cards... which according to them is a bug, but I don't know how to fix it.

CS1o commented 1 month ago

Yes, I did it, but without any result. In fact, other users are reporting the same thing with this series of cards... which according to them is a bug, but I don't know how to fix it.

The screenshot from earlier shows that the ZLUDA device failed the basic test. Make sure you replaced the ROCm library files with the right files, and that you have HIP SDK 5.7 installed.

But what you can try is to use one of the two versions below, from: Source

Your gpu can use this: https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/download/v0.5.7/rocm.gfx803.optic.test.version.7z

or this: https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/download/v0.5.7/rocblas.for.gfx803.override.with.vega10.7z

In both cases you need to replace the ROCm library files again; it is best to do this with a clean copy of the library folder. You also need to place the rocblas.dll from the zip inside the C:\Program Files\AMD\ROCm\5.7\bin\rocblas folder.
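As commands, the swap described above might look roughly like this (a sketch only; the `extracted` folder name and the `library` subfolder are assumptions about the zip layout and the HIP SDK 5.7 install, so adjust the paths to what you actually see):

```shell
:: Sketch: back up the original library folder, then drop in the replacement files.
:: Run from an elevated Command Prompt, with the downloaded 7z extracted to .\extracted
xcopy /E /I "C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library" "%USERPROFILE%\rocblas-library-backup"
copy /Y extracted\rocblas.dll "C:\Program Files\AMD\ROCm\5.7\bin\rocblas\rocblas.dll"
xcopy /E /Y extracted\library "C:\Program Files\AMD\ROCm\5.7\bin\rocblas\library"
```

Keeping the backup copy makes it easy to revert if the replacement files don't work.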

A PC restart is always required after changing library files.

Let me know if it worked with these files.

neko47834 commented 1 month ago

Yes, I did it, but without any result. In fact, other users are reporting the same thing with this series of cards... which according to them is a bug, but I don't know how to fix it.

The screenshot from earlier shows that the ZLUDA device failed the basic test. Make sure you replaced the ROCm library files with the right files, and that you have HIP SDK 5.7 installed.

But what you can try is to use one of the two versions below, from: Source

Your gpu can use this: https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/download/v0.5.7/rocm.gfx803.optic.test.version.7z

or this: https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/releases/download/v0.5.7/rocblas.for.gfx803.override.with.vega10.7z

In both cases you need to replace the ROCm library files again; it is best to do this with a clean copy of the library folder. You also need to place the rocblas.dll from the zip inside the C:\Program Files\AMD\ROCm\5.7\bin\rocblas folder.

A PC restart is always required after changing library files.

Let me know if it worked with these files.

Thank you very much for your great help. Following your guide and doing a complete cleanup of Python and Git, I managed to get ZLUDA working, but I don't know what happened to the dependencies that no longer work well... With ZLUDA I have problems in A1111 and SD.Next: the system RAM (16 GB) fills up with each image, it is almost impossible to use Hires fix due to the same problem, and from time to time it simply stops working...

neko47834 commented 1 month ago

Aside from how unstable SD.Next and A1111 are with ZLUDA, I have not been able to get DirectML to work in any version. As this user describes, ONNX does not work at all; I have already tried many versions and cannot get them to work on clean installations.

CS1o commented 1 month ago

Thank you very much for your great help. Following your guide and doing a complete cleanup of Python and Git, I managed to get ZLUDA working, but I don't know what happened to the dependencies that no longer work well... With ZLUDA I have problems in A1111 and SD.Next: the system RAM (16 GB) fills up with each image, it is almost impossible to use Hires fix due to the same problem, and from time to time it simply stops working...

Hey, np. The RAM issue can happen when you only have 16 GB of RAM and you're using SDXL or Pony models; they need more RAM and VRAM than SD 1.5-based models. What you need to do in this case is increase your Windows pagefile (virtual RAM). Here is a guide on how to do that: https://www.tomshardware.com/news/how-to-manage-virtual-memory-pagefile-windows-10,36929.html Enable it only for C:, set it to a custom size (16000 min, 24000 max), then reboot.

An RX 570 has only 4 GB of VRAM, so you need to do the following: add --medvram to the COMMANDLINE_ARGS= line in webui-user.bat. Also install the Tiled Diffusion & Tiled VAE (MultiDiffusion) extension and enable only Tiled VAE when using Hires fix. When using Hires fix, always set the Hires steps to 10, or it will run out of VRAM. And don't try to upscale SDXL or Pony models with Hires fix on 4 GB of VRAM; you can upscale in img2img with the SD upscale script without running out of VRAM.
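For reference, the relevant line in webui-user.bat would then look something like this (a sketch; only --medvram comes from the advice above, and the rest of your webui-user.bat stays as it is):

```shell
:: webui-user.bat (sketch) -- add --medvram for 4 GB cards
set COMMANDLINE_ARGS=--medvram
```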

xhy2008 commented 1 month ago

I think it's not only caused by ONNX. I passed --skip-ort on the command line; however, without "ONNX failed to initialize", the WebUI still doesn't work. The token counters on the edit boxes changed to -/- when I entered something. Maybe CLIP is not correctly loaded, and now I have no idea how to solve it.

Kargim commented 1 month ago
remote: Counting objects: 100% (1345/1345), done.
remote: Compressing objects: 100% (443/443), done.
remote: Total 1345 (delta 947), reused 1249 (delta 895), pack-reused 0 (from 0)
Receiving objects: 100% (1345/1345), 232.84 KiB | 709.00 KiB/s, done.
Resolving deltas: 100% (947/947), done.
Cloning BLIP into D:\SD\SD_A1111_Zluda_New\repositories\BLIP...
Cloning into 'D:\SD\SD_A1111_Zluda_New\repositories\BLIP'...
remote: Enumerating objects: 277, done.
remote: Counting objects: 100% (183/183), done.
remote: Compressing objects: 100% (46/46), done.
remote: Total 277 (delta 145), reused 137 (delta 137), pack-reused 94 (from 1)
Receiving objects: 100% (277/277), 7.04 MiB | 4.31 MiB/s, done.
Resolving deltas: 100% (152/152), done.
Installing requirements
Installing onnxruntime-gpu
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
D:\SD\SD_A1111_Zluda_New\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --ckpt-dir 'D:\SD\_Models\Stable-diffusion' --vae-dir 'D:\SD\_Models\VAE' --lora-dir 'D:\SD\_Models\Lora' --esrgan-models-path 'D:\SD\_Models\ESRGAN'
ONNX failed to initialize: Failed to import diffusers.pipelines.auto_pipeline because of the following error (look up to see its traceback):
Failed to import diffusers.pipelines.aura_flow.pipeline_aura_flow because of the following error (look up to see its traceback):
cannot import name 'UMT5EncoderModel' from 'transformers' (D:\SD\SD_A1111_Zluda_New\venv\lib\site-packages\transformers\__init__.py)
Calculating sha256 for D:\SD\_Models\Stable-diffusion\absolutereality_v181.safetensors: Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 193.0s (prepare environment: 199.4s, initialize shared: 4.6s, list SD models: 0.2s, load scripts: 0.8s, create ui: 0.5s, gradio launch: 0.3s).

Confirming the problem. I get this error on a clean installation and at every startup; only the --use-zluda parameter and custom paths to the model folders were used during installation. There is also a problem during image generation: in the console the progress abruptly jumps to 95% and then freezes, and the image preview does not work at all. There were no such problems before. CWindowssystem32cmd.exe.txt

CS1o commented 1 month ago

I think it's not only caused by ONNX. I passed --skip-ort on the command line; however, without "ONNX failed to initialize", the WebUI still doesn't work. The token counters on the edit boxes changed to -/- when I entered something. Maybe CLIP is not correctly loaded, and now I have no idea how to solve it.

Make sure you have whitelisted the WebUI in any browser ad blocker. Also make sure you used: --use-directml --lowvram --opt-sub-quad-attention --opt-split-attention --no-half-vae --upcast-sampling. Don't skip the ONNX installation this time. Then delete the venv folder and relaunch webui-user.bat.
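Put together in webui-user.bat, that suggestion would look roughly like this (a sketch of the flags listed above; adjust to your own setup):

```shell
:: webui-user.bat (sketch) -- DirectML flags for low-VRAM AMD cards
set COMMANDLINE_ARGS=--use-directml --lowvram --opt-sub-quad-attention --opt-split-attention --no-half-vae --upcast-sampling
```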

CS1o commented 1 month ago

Confirming the problem. I get this error on a clean installation and at every startup; only the --use-zluda parameter and custom paths to the model folders were used during installation. There is also a problem during image generation: in the console the progress abruptly jumps to 95% and then freezes, and the image preview does not work at all. There were no such problems before. CWindowssystem32cmd.exe.txt

If you're on ZLUDA, use --skip-ort in the launch args too, then delete the venv folder again. If your GPU has only 8 GB of VRAM, also add --medvram. Please provide more info about when it freezes, like the model used, the resolution, and the other txt2img settings you enabled. Mostly it's not a ZLUDA thing; freezes can be caused by too high a resolution or by using Hires fix with 0 Hires steps instead of 10.

Kargim commented 1 month ago

I recorded the whole process on video, from clean install to image generation. With --skip-ort the "ONNX failed to initialize" error disappeared, but the problem with the preview window remains. I also demonstrate problem-free generation on the old version with only --use-zluda in the parameters. The file had to be uploaded to Google Drive (it is larger than 10 MB): https://drive.google.com/file/d/1bbhTsbfKOrKDu9ynLRxSVmYyjN8oK5Zd/view?usp=sharing

Kargim commented 4 weeks ago

Conducted a series of experiments.

  1. I had commit “371f53e...0bde866”. It has no ONNX errors and the preview window works without problems; the “--skip-ort” parameter is not required.
  2. I tried installing commit “61aa844...67fdead”, which comes right before the upgrade to 1.10. It has the ONNX error, but that is solved by applying “--skip-ort”; the preview window works without problems.
  3. If I update to commit “67fdead...235a1ff” (version 1.10) or higher, the preview window breaks immediately =(

Radeon RX 5500 XT 8 GB, Windows 10, Python 3.10.11, HIP SDK 5.7.1 + ROCmLibs for old cards

neko47834 commented 4 weeks ago

Conducted a series of experiments.

  1. I had commit “371f53e...0bde866”. It has no ONNX errors and the preview window works without problems; the “--skip-ort” parameter is not required.
  2. I tried installing commit “61aa844...67fdead”, which comes right before the upgrade to 1.10. It has the ONNX error, but that is solved by applying “--skip-ort”; the preview window works without problems.
  3. If I update to commit “67fdead...235a1ff” (version 1.10) or higher, the preview window breaks immediately =(

Radeon RX 5500 XT 8 GB, Windows 10, Python 3.10.11, HIP SDK 5.7.1 + ROCmLibs for old cards

Excuse me, how do I install or revert to the version you say doesn't have ONNX errors (0bde866)? I use DirectML and I get all kinds of errors, especially with ONNX. I tried ZLUDA, but it's very unstable...

Kargim commented 4 weeks ago

Excuse me, how do I install or revert to the version you say doesn't have ONNX errors (0bde866)? I use DirectML and I get all kinds of errors, especially with ONNX. I tried ZLUDA, but it's very unstable...

I use ZLUDA; I don't know how it is on DirectML. You can use the command “git checkout 67fdead” to change the build version, where “67fdead” is the commit hash. You can see the commits at https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/activity; there we are interested in the end of the line “61aa844...67fdead”. It is better to do experiments in another copy of the program.

Morpheus-79 commented 2 weeks ago

'ONNX failed to initialize' is caused by the installed 'transformers' version (probably 4.30.2 or earlier) missing the 'UMT5EncoderModel' import ('\venv\Lib\site-packages\transformers\__init__.py'). In fact, it is missing all of the UMT5 models. Newer 'transformers' versions (4.45.0.dev0) include those models:

https://github.com/huggingface/transformers/blob/main/src/transformers/__init__.py

EDIT: Latest 'transformers' gives me an:

OSError: None is not a local folder and is not a valid model identifier

... error. But with 'diffusers==0.29.2' in combination with 'transformers==4.30.2' everything works fine.
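A quick way to confirm this diagnosis is to check whether the installed transformers actually exposes the class diffusers is trying to import. A minimal sketch (the `has_umt5` helper is illustrative, not part of either library):

```python
from importlib import import_module

def has_umt5(pkg: str = "transformers") -> bool:
    """Return True if `pkg` is importable and exposes UMT5EncoderModel."""
    try:
        mod = import_module(pkg)
    except ImportError:
        return False
    return hasattr(mod, "UMT5EncoderModel")

# If this prints False, pin compatible versions, e.g.:
#   pip install diffusers==0.29.2 transformers==4.30.2
print(has_umt5())
```

Running this inside the WebUI's venv (after `venv\Scripts\activate`) checks the same environment the WebUI actually uses.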