lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0
7.81k stars · 750 forks

[Feature Request]: ZLUDA Support? #441

Open RandomLegend opened 7 months ago

RandomLegend commented 7 months ago

Is there an existing issue for this?

What would your feature do?

Heyho,

currently I use an RTX 3070 and I just ordered an RX 7900 XT. I know ROCm is a thing, but AFAIK it's not nearly as performant as CUDA?

So I found out about ZLUDA and that people got it working on A1111.

Did anyone try this on Forge? I mean, technically it should work just the same way as it does on A1111, right?

Proposed workflow

Not applicable

Additional information

No response

yacinesh commented 7 months ago

I was about to ask about the same thing; hope that @lllyasviel will look into it

joshaiken commented 6 months ago

I'm playing with ZLUDA today, and will update my comment if/when I learn other relevant details.

Running Win11 x64 + 7900XTX w/ Radeon "Game" driver. (*1)

> currently I use an RTX 3070 and I just ordered an RX 7900 XT. I know ROCm is a thing, but AFAIK it's not nearly as performant as CUDA?

Pretty much. From the benchmarks I've collected/seen/observed, the performance hierarchy is roughly...

  1. Linux-CUDA
  2. Linux-ROCm ~= Win-CUDA (situational, so I'm calling it a tie)
  3. Win-ZLUDA
  4. Linux-DirectML
  5. Win-DirectML
  6. CPU
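That ordering can be captured in a tiny lookup - a sketch only, using made-up backend labels (not real webui launch flags) and this thread's anecdotal ranking:

```python
# Anecdotal performance ranking reported in this thread (index 0 = fastest).
# Labels are illustrative, not actual launch options.
RANKING = [
    "linux-cuda",
    "linux-rocm",    # roughly tied with win-cuda, situationally
    "win-cuda",
    "win-zluda",
    "linux-directml",
    "win-directml",
    "cpu",
]

def best_backend(available):
    """Pick the fastest backend, per the ranking above, from those available."""
    for backend in RANKING:
        if backend in available:
            return backend
    return "cpu"  # everything ultimately falls back to CPU
```

So an AMD card on Windows today would pick "win-zluda" over "win-directml", matching the thread's advice.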

> So I found out about ZLUDA and that people got it working on A1111.

Yep. Can confirm a wild speedup in A1111 on Windows from DirectML to ZLUDA - ballpark is around 30x faster (3,000%!). There are more variables than I care to isolate, but as a quick check over 30 runs with batch size 4 and SD 1.5, the 7900 XTX went from an average of 2 s/it to 15 it/s.
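Note the units flip between the two measurements (s/it before, it/s after), so the 30x figure is easy to double-check:

```python
def speedup(sec_per_it_before, it_per_sec_after):
    """Speedup factor when the 'before' rate is given in seconds per iteration
    and the 'after' rate in iterations per second."""
    it_per_sec_before = 1.0 / sec_per_it_before
    return it_per_sec_after / it_per_sec_before

# 2 s/it (DirectML) -> 15 it/s (ZLUDA), as reported above:
print(speedup(2.0, 15.0))  # -> 30.0
```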

Besides performance, ROCm (and maybe DirectML?) either doesn't do inpainting, or does it horribly. I watched so very many hours of inpainting tutorials to discover this last year. ZLUDA enables reliable inpainting!

ZLUDA also makes deterministic generations match - so if you're working with something generated on NVIDIA hardware, it will actually look the same (or as close to "the same" as it can be, given the countless other variables that affect the output).

> Did anyone try this on Forge? I mean, technically it should work just the same way as it does on A1111, right?

Wouldn't know where to start - but I'm happy to try. If we work backwards from the lshqqytiger A1111 DirectML fork's scripts associated with the --use-zluda parameter, and replace the cublas64_11.dll and cusparse64_11.dll files with the ZLUDA versions, we should be able to get most of the way to a solution.
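The DLL swap could be scripted along these lines. This is a sketch only: the ZLUDA-side source filenames and the exact mapping are assumptions pieced together from comments in this thread, not verified against a specific ZLUDA release.

```python
import shutil
from pathlib import Path

# Assumed mapping: ZLUDA release binary -> filename torch's CUDA runtime expects.
DLL_MAP = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}

def deploy_zluda_dlls(zluda_dir, torch_lib):
    """Copy each ZLUDA DLL over torch's bundled CUDA DLL, renaming on the way.
    Returns the list of target filenames actually replaced."""
    replaced = []
    for src_name, dst_name in DLL_MAP.items():
        src = Path(zluda_dir) / src_name
        if src.exists():
            shutil.copy2(src, Path(torch_lib) / dst_name)
            replaced.append(dst_name)
    return replaced
```

Here torch_lib would be venv\Lib\site-packages\torch\lib in a typical webui install.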

mongolsteppe commented 6 months ago

Could you share what COMMANDLINE_ARGS you have set up for option 3) Win-ZLUDA? I changed from --use-directml to --use-zluda, but I get a 'RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check' (I don't get that with DirectML). Thanks!

Bocchi-Chan2023 commented 6 months ago

https://wikiwiki.jp/sd_toshiaki/%E3%82%B3%E3%83%A1%E3%83%B3%E3%83%88/Nvidia%E4%BB%A5%E5%A4%96%E3%81%AE%E3%82%B0%E3%83%A9%E3%83%9C%E3%81%AB%E9%96%A2%E3%81%97%E3%81%A6

I found this statement on this page helpful.

"I managed to run Forge with ZLUDA v3.5+7900XTX as follows: I used AnimagineXLV3 with Batch100, and it executed without any errors until the end, so I think it's relatively stable. I'll skip the details about setting the paths and environment variables since they're the same as SD.NEXT.

I ran webui.bat from Forge to start it, but it immediately shut down after starting. I reinstalled torch and torchvision:

.\venv\Scripts\activate
pip uninstall torch torchvision -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118

Then, I replaced cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll in venv\Lib\site-packages\torch\lib with the ones from ZLUDA.

In modules\initialize.py, under import torch, I added the following lines:

torch.backends.cudnn.enabled = False
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_math_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(False)

That's how I did it."
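The four lines that guide patches into modules\initialize.py can also be wrapped in a defensive helper - a sketch assuming a torch 2.x build where these setters exist (the function name is made up; the attribute calls are real torch APIs):

```python
def apply_zluda_sdp_workaround():
    """Disable cuDNN and the flash/mem-efficient SDP kernels, leaving only the
    math fallback, as described in the quoted guide. Returns False when torch
    isn't importable, True after patching."""
    try:
        import torch
    except ImportError:
        return False  # nothing to patch in this environment
    torch.backends.cudnn.enabled = False
    torch.backends.cuda.enable_flash_sdp(False)
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(False)
    return True
```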

yacinesh commented 6 months ago

@Bocchi-Chan2023 I tried your instructions but I got this error (see screenshot)

Bocchi-Chan2023 commented 6 months ago

> @Bocchi-Chan2023 I tried your instructions but I got this error (see screenshot)

.\venv\Scripts\activate
pip uninstall torch torchvision -y
pip install torch==2.2.0 torchvision --index-url https://download.pytorch.org/whl/cu118

yacinesh commented 6 months ago

@Bocchi-Chan2023 yes, I already did that. And the cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files - I copied them from the SD.Next folder; is that normal?

RandomLegend commented 6 months ago

All these guides are for windows.

PATHS and libraries work a little bit different in Linux and i'd love to see someone making ZLUDA + Forge work on Linux.

Bocchi-Chan2023 commented 6 months ago

> @Bocchi-Chan2023 yes, I already did that. And the cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files - I copied them from the SD.Next folder; is that normal?

I think it's still possible, but my recommendation would be to rename and deploy the binaries downloaded from the latest ZLUDA release :)

yacinesh commented 6 months ago

> @Bocchi-Chan2023 yes, I already did that. And the cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files - I copied them from the SD.Next folder; is that normal?

> I think it's still possible, but my recommendation would be to rename and deploy the binaries downloaded from the latest ZLUDA release :)

Am I correct here? (see screenshot)

Zaakh commented 6 months ago

Maybe @lshqqytiger could help out?

Bocchi-Chan2023 commented 6 months ago

> @Bocchi-Chan2023 yes, I already did that. And the cublas64_11.dll, cusparse64_11.dll, and nvrtc64_112_0.dll files - I copied them from the SD.Next folder; is that normal?

> I think it's still possible, but my recommendation would be to rename and deploy the binaries downloaded from the latest ZLUDA release :)

> Am I correct here? (see screenshot)

yes

Grey3016 commented 6 months ago

Just to ask all of you: did you all get it working? Because I did, but it needed a couple more steps when installing and to get running - I don't want to fill this thread unless it's needed.

RandomLegend commented 6 months ago

@Grey3016 I did not, but again, I am on Linux and the guides I found were for Windows.

I am not unsatisfied with the ROCm performance, but I have no idea what gains I might be missing out on with ZLUDA.

brknsoul commented 5 months ago

> @Grey3016 I did not, but again, I am on Linux and the guides I found were for Windows.

> I am not unsatisfied with the ROCm performance, but I have no idea what gains I might be missing out on with ZLUDA.

You aren't. The only reason we're using ZLUDA on Windows is that we don't have ROCm on Windows... yet.

beosliege commented 5 months ago

> Just to ask all of you: did you all get it working? Because I did, but it needed a couple more steps when installing and to get running - I don't want to fill this thread unless it's needed.

Would you be able to provide the extra steps you had to take? Thanks.

lshqqytiger commented 5 months ago

ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
Launch with --zluda (optional)
Requirements: Visual C++ Runtime, ROCm 5.7

yacinesh commented 5 months ago

> ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
> Launch with --zluda (optional)
> Requirements: Visual C++ Runtime, ROCm 5.7

I'm already trying to use your forked Forge but I'm getting a lot of errors; where can I report issues?

lshqqytiger commented 5 months ago

I enabled the issues feature

yacinesh commented 5 months ago

> I enabled the issues feature

I've managed to open it finally, but it failed to install insightface automatically. Should I install it manually or leave it?

lshqqytiger commented 5 months ago

Ignore it if there isn't any issue (e.g., a "module not found" error)

Bocchi-Chan2023 commented 5 months ago

> ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
> Launch with --zluda (optional)
> Requirements: Visual C++ Runtime, ROCm 5.7

I could not start it in my environment. The runtime and ROCm are already installed. These are the errors I got:

Failed to install ZLUDA: 'Namespace' object has no attribute 'use_zluda_dnn'

RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

File "C:\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
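That last assertion usually means the CPU-only torch wheel ended up installed, which is why reinstalling from the cu118 index URL is the standard fix. A quick diagnostic sketch - the helper name is made up, but torch.version.cuda and torch.cuda.is_available() are real torch attributes:

```python
def diagnose_torch_cuda():
    """Classify why torch may refuse to use the GPU, as a short message."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.version.cuda is None:
        # This is the state behind "Torch not compiled with CUDA enabled":
        # a CPU-only wheel; reinstall with --index-url .../whl/cu118.
        return "cpu-only torch wheel"
    if not torch.cuda.is_available():
        return "cuda-enabled wheel, but no usable GPU/driver"
    return "cuda available: " + torch.cuda.get_device_name(0)
```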

joshaiken commented 5 months ago

> Could you share what COMMANDLINE_ARGS you have set up for option 3) Win-ZLUDA? I changed from --use-directml to --use-zluda, but I get a 'RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check' (I don't get that with DirectML). Thanks!

./webui.bat --use-zluda --listen --no-half-vae

lshqqytiger commented 5 months ago

> ZLUDA fork: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
> Launch with --zluda (optional)
> Requirements: Visual C++ Runtime, ROCm 5.7

> I could not start it in my environment. The runtime and ROCm are already installed. These are the errors I got:

> Failed to install ZLUDA: 'Namespace' object has no attribute 'use_zluda_dnn'

> RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check

> File "C:\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled

Will fix