[Bug]: Memory issues and absurdly slow speed when upscaling an image on Windows 11 with an AMD Radeon RX 6750 XT

BlindTheBoundDemon commented 2 hours ago

Checklist

[ ] The issue exists after disabling all extensions
[X] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[ ] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

When I first installed the webui, I couldn't get it to launch at all. After browsing through other people's issues here, I eventually managed to launch it with the following arguments: COMMANDLINE_ARGS= --use-directml --no-half --precision full --no-half-vae --opt-sub-quad-attention --opt-split-attention-v1

The webui now launches properly and lets me generate images, but its speed is wildly inconsistent: a 360x360 image from a text prompt was generated in about 17 seconds, while a single 2x upscale of that image took about 4 minutes. On top of that, the webui uses an enormous amount of memory. Generating the first image uses about half of the dedicated GPU memory, while the upscale uses nearly 100% of it.
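For a sense of scale, the arithmetic below compares the two runs. It is an illustrative sketch, not a measurement: it assumes a Stable Diffusion 1.x model whose VAE downsamples each side by 8, and the latent-token and attention figures only apply if the upscale itself runs through the diffusion model (for example hires fix or img2img). For a pure upscaler in the Extras tab, only the pixel ratio is relevant. The function name `scale_report` is hypothetical.

```python
# Illustrative arithmetic only, not a measurement.
# Assumption: an SD 1.x model whose VAE downsamples each side by 8.

VAE_FACTOR = 8  # assumed latent downsampling factor per side


def scale_report(width: int, height: int, factor: int) -> dict:
    """Compare pixel, latent-token and naive attention sizes before/after an upscale."""
    new_w, new_h = width * factor, height * factor
    pixel_ratio = (new_w * new_h) / (width * height)
    # Tokens seen by self-attention at the full latent resolution.
    tokens = (width // VAE_FACTOR) * (height // VAE_FACTOR)
    new_tokens = (new_w // VAE_FACTOR) * (new_h // VAE_FACTOR)
    # A naive self-attention score matrix has tokens x tokens entries.
    attention_ratio = (new_tokens ** 2) / (tokens ** 2)
    return {
        "pixel_ratio": pixel_ratio,
        "tokens": (tokens, new_tokens),
        "attention_ratio": attention_ratio,
    }


if __name__ == "__main__":
    # The report's case: 360x360 upscaled 2x to 720x720.
    print(scale_report(360, 360, 2))
```

Under these assumptions a 2x upscale handles 4x the pixels, and a naive attention score matrix at the full latent resolution grows 16x (2025 to 8100 tokens). Memory-saving options such as --opt-sub-quad-attention are meant to avoid materializing that full matrix at once.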

I also want to note that while I was experimenting with the command-line arguments earlier, the webui either could not upscale at all because it kept running out of memory, or took an absurd 30 minutes to produce the upscaled image, which I don't believe should ever happen.

Steps to reproduce the problem

Install a clean version of the webui.
Set the command-line arguments in webui-user.bat to COMMANDLINE_ARGS= --use-directml --no-half --precision full --no-half-vae --opt-sub-quad-attention --opt-split-attention-v1.
Attempt to generate a 360x360 image from a text prompt, then attempt to upscale it to twice its size.

What should have happened?

The upscale shouldn't take this long, and it shouldn't consume almost 100% of the GPU's memory.

What browsers do you use to access the UI?

Google Chrome

Sysinfo

sysinfo-2024-10-17-12-18.json

Console logs

https://pastebin.com/raw/LvmNBEj6

Additional information

No response

lshqqytiger commented 2 hours ago

--use-directml: DirectML is slower than ZLUDA.
--no-half --precision full --no-half-vae: Full precision (32-bit) is slower than half precision (16-bit).

Your card is capable of running ZLUDA and half precision. You are slowing down generation by using arguments meant for older (pre-Navi) cards.
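The advice above can be turned into a quick check of a webui-user.bat file. This is a minimal sketch: the function names are hypothetical, the reasons are paraphrased from this comment rather than taken from the webui's documentation, and it only looks for the flags named here.

```python
# Sketch: flag launch arguments that this thread describes as slowing down
# newer (Navi) cards. Reasons are paraphrased from the maintainer's comment.

FULL_PRECISION = "full precision (32-bit) is slower than half precision (16-bit)"
SLOW_FLAGS = {
    "--use-directml": "DirectML is slower than ZLUDA",
    "--no-half": FULL_PRECISION,
    "--no-half-vae": FULL_PRECISION,
}


def parse_commandline_args(bat_text: str) -> list[str]:
    """Return the arguments from a '[set ]COMMANDLINE_ARGS=...' line, or []."""
    for line in bat_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("set "):
            stripped = stripped[4:].strip()
        key, sep, value = stripped.partition("=")
        if sep and key.strip().upper() == "COMMANDLINE_ARGS":
            return value.split()
    return []


def flag_slow_args(args: list[str]) -> dict[str, str]:
    """Map each slow flag found in args to the reason given in the thread."""
    found: dict[str, str] = {}
    for i, arg in enumerate(args):
        if arg == "--precision":
            # Only "--precision full" is called out; other values are ignored.
            if i + 1 < len(args) and args[i + 1] == "full":
                found["--precision full"] = FULL_PRECISION
        elif arg in SLOW_FLAGS:
            found[arg] = SLOW_FLAGS[arg]
    return found
```

Run against the arguments from the report, it flags --use-directml, --no-half, --precision full and --no-half-vae, and leaves the two attention options alone.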

BlindTheBoundDemon commented 2 hours ago

I did try ZLUDA, but when I did, the UI didn't even launch and the logs showed this:

```
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-11-gefddd05e
Commit hash: efddd05e11d9cc5339a41192457e6ff8ad06ae00
Traceback (most recent call last):
  File "C:\Users\48880\Desktop\SD AI\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "C:\Users\48880\Desktop\SD AI\stable-diffusion-webui-amdgpu\launch.py", line 39, in main
    prepare_environment()
  File "C:\Users\48880\Desktop\SD AI\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 545, in prepare_environment
    from modules import zluda_installer, rocm
  File "C:\Users\48880\Desktop\SD AI\stable-diffusion-webui-amdgpu\modules\zluda_installer.py", line 16, in <module>
    HIPSDK_TARGETS = ['rocblas.dll', 'rocsolver.dll', f'hiprtc{"".join([v.zfill(2) for v in rocm.version.split(".")])}.dll']
AttributeError: 'NoneType' object has no attribute 'split'
Press any key to continue . . .
```
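The traceback ends in rocm.version.split("."), so rocm.version was None, which presumably means the launcher did not detect any HIP SDK installation. The sketch below reproduces what that list comprehension computes and shows a more explicit failure for the None case. hipsdk_targets is a hypothetical name, not the webui's actual code.

```python
# Sketch of the failing line in modules/zluda_installer.py, rewritten to fail
# with a readable message. rocm_version stands in for rocm.version from the
# traceback; None is assumed to mean no HIP SDK was found.
from typing import Optional


def hipsdk_targets(rocm_version: Optional[str]) -> list[str]:
    """Return the HIP SDK DLL names that the installer looks for."""
    if rocm_version is None:
        raise RuntimeError(
            "HIP SDK (ROCm) not found; install the HIP SDK before using --use-zluda"
        )
    # "5.7" -> "0507", "6.1" -> "0601": each component is zero-padded to 2 digits.
    suffix = "".join(part.zfill(2) for part in rocm_version.split("."))
    return ["rocblas.dll", "rocsolver.dll", f"hiprtc{suffix}.dll"]
```

For example, hipsdk_targets("5.7") yields hiprtc0507.dll as the third entry, and hipsdk_targets("6.1") yields hiprtc0601.dll.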

lshqqytiger commented 2 hours ago

Did you install the HIP SDK?

BlindTheBoundDemon commented 2 hours ago

I have no idea what that is

lshqqytiger commented 1 hour ago

Follow this guide (except for the Install SD.Next section). The HIP SDK is the official compute library for AMD GPUs. DirectML is Microsoft's machine learning library; it is included in Windows by default, but it is not as performant as CUDA and ROCm (HIP SDK).

BlindTheBoundDemon commented 1 hour ago

I tried, and now I get this at the end of the terminal output:

```
ONNX failed to initialize: module 'optimum.onnxruntime.modeling_diffusion' has no attribute '_ORTDiffusionModelPart'
Compiling in progress. Please wait...

rocBLAS error: Cannot read C:\Program Files\AMD\ROCm\6.1\bin\/rocblas/library/TensileLibrary.dat: No such file or directory for GPU arch : gfx1031

rocBLAS error: Could not initialize Tensile host:
regex_error(error_backref): The expression contained an invalid back reference.
Press any key to continue . . .
```
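The first rocBLAS error says there is no Tensile library for gfx1031 (the RX 6750 XT's architecture, as the log reports) in the HIP SDK 6.1 install. A rough way to see whether any kernel files for an architecture are present is to scan the rocblas\library folder. This sketch assumes the per-architecture files carry the gfx name in their file names; has_arch_kernels is a hypothetical helper and is not part of ROCm.

```python
# Sketch: check whether a rocBLAS library folder appears to contain files
# for a given GPU architecture. Assumption: per-architecture files include
# the arch name (e.g. "gfx1031") somewhere in their file name.
from pathlib import Path


def has_arch_kernels(library_dir: str, arch: str) -> bool:
    """Return True if any file in library_dir mentions arch in its name."""
    folder = Path(library_dir)
    if not folder.is_dir():
        return False
    return any(arch in entry.name for entry in folder.iterdir() if entry.is_file())
```

With the setup in the log, has_arch_kernels(r"C:\Program Files\AMD\ROCm\6.1\bin\rocblas\library", "gfx1031") would be expected to return False until matching library files are added.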

lshqqytiger commented 1 hour ago

Did you follow the "Replace HIP SDK library files for unsupported GPU architectures" step?

BlindTheBoundDemon commented 1 hour ago

yes

lshqqytiger commented 56 minutes ago

The rocmlibs you downloaded were built for HIP SDK 5.7, but you have HIP SDK 6.1, so they are incompatible. Search for a 6.1 build (on GitHub? Reddit? I've seen one in the SD.Next Discord) or just downgrade the HIP SDK to 5.7.
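The mismatch is between the HIP SDK version the rocmlibs were built for and the installed one. A minimal sketch of that comparison, assuming major.minor is what has to match and that the installed version can be read from the ROCm folder name (as in C:\Program Files\AMD\ROCm\6.1\bin from the earlier log); all function names are hypothetical.

```python
# Sketch: compare the HIP SDK version rocmlibs were built for with the
# installed one. Assumption: major.minor must match (5.7 vs 6.1 does not).
from pathlib import PureWindowsPath


def major_minor(version: str) -> tuple[int, int]:
    """Parse '5.7' or '5.7.1' into (5, 7)."""
    parts = version.strip().split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def version_from_rocm_path(path: str) -> str:
    r"""Return the folder after 'ROCm', e.g. C:\Program Files\AMD\ROCm\6.1\bin -> '6.1'."""
    parts = PureWindowsPath(path).parts
    lowered = [part.lower() for part in parts]
    # Raises ValueError if the path has no ROCm component.
    return parts[lowered.index("rocm") + 1]


def rocmlibs_compatible(installed: str, built_for: str) -> bool:
    """True when the installed HIP SDK matches the rocmlibs build on major.minor."""
    return major_minor(installed) == major_minor(built_for)
```

PureWindowsPath is used so the Windows path parses the same way on any OS.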

BlindTheBoundDemon commented 52 minutes ago

I see. Is there a particular sequence for downgrading, or do I just launch the 5.7 installer and it will downgrade the HIP SDK on its own?

lshqqytiger commented 48 minutes ago

I haven't downgraded the HIP SDK myself, but I'd recommend uninstalling the existing one and then running the installer. Since you replaced (and added) some files, wipe the path/to/AMD/ROCm folder after uninstalling to prevent conflicts. (Don't forget to patch the files with the custom rocmlibs after installing 5.7.) You may have to modify environment variable HIP_PATH if the installer cannot handle it properly due to previous installation.

BlindTheBoundDemon commented 44 minutes ago

Wait, how do I uninstall the 6.1 one?

lshqqytiger commented 37 minutes ago

Through Control Panel or Settings, like any other program. It is split into several components; uninstall all of them.

BlindTheBoundDemon commented 34 minutes ago

> You may have to modify environment variable HIP_PATH if the installer cannot handle it properly due to previous installation.

Also, where do I modify this?

lshqqytiger commented 30 minutes ago

I'm not sure whether it will be a problem. Check the environment variable if the webui fails to find ROCm.
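Checking the variable can be scripted. This sketch assumes HIP_PATH should point at an existing HIP SDK folder that contains a bin subfolder (for example C:\Program Files\AMD\ROCm\5.7\ after the downgrade); check_hip_path is a hypothetical helper.

```python
# Sketch: sanity-check HIP_PATH. Assumption: it should name an existing HIP SDK
# folder containing "bin", e.g. C:\Program Files\AMD\ROCm\5.7\ (illustrative).
import os
from pathlib import Path
from typing import Mapping, Optional


def check_hip_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe what is wrong with HIP_PATH, or report that it looks usable."""
    env = os.environ if env is None else env
    value = env.get("HIP_PATH")
    if not value:
        return "HIP_PATH is not set"
    folder = Path(value)
    if not folder.is_dir():
        return f"HIP_PATH points to a missing folder: {value}"
    if not (folder / "bin").is_dir():
        return f"HIP_PATH has no bin folder: {value}"
    return f"HIP_PATH looks OK: {value}"


if __name__ == "__main__":
    print(check_hip_path())
```

If the value is wrong, it can be edited in Windows under System Properties > Environment Variables; a terminal opened before the change will not see the new value, so reopen it before running webui-user.bat again.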

BlindTheBoundDemon commented 29 minutes ago

I tried launching webui-user.bat and got this:

```
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-11-gefddd05e
Commit hash: efddd05e11d9cc5339a41192457e6ff8ad06ae00
Failed to load ZLUDA: Could not find module 'C:\Users\48880\stable-diffusion-webui-amdgpu\.zluda\nvcuda.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Using CPU-only torch
C:\Users\48880\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\Users\48880\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
ONNX failed to initialize: module 'optimum.onnxruntime.modeling_diffusion' has no attribute '_ORTDiffusionModelPart'
Downloading: "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors" to C:\Users\48880\stable-diffusion-webui-amdgpu\models\Stable-diffusion\v1-5-pruned-emaonly.safetensors

loading stable diffusion model: FileNotFoundError
Traceback (most recent call last):
  File "C:\Users\48880\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\48880\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\48880\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 693, in get_sd_model
    load_model()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 788, in load_model
    checkpoint_info = checkpoint_info or select_checkpoint()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 234, in select_checkpoint
    raise FileNotFoundError(error_message)
FileNotFoundError: No checkpoints found. When searching for checkpoints, looked at:
 - file C:\Users\48880\stable-diffusion-webui-amdgpu\model.ckpt
 - directory C:\Users\48880\stable-diffusion-webui-amdgpu\models\Stable-diffusionCan't run without a checkpoint. Find and place a .ckpt or .safetensors file into any of those locations.

Stable diffusion model failed to load
Applying attention optimization: InvokeAI... done.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 26.9s (prepare environment: 36.5s, initialize shared: 1.5s, other imports: 0.2s, list SD models: 0.5s, load scripts: 0.8s, create ui: 0.3s, gradio launch: 0.2s).
loading stable diffusion model: FileNotFoundError
Traceback (most recent call last):
  File "C:\Users\48880\AppData\Local\Programs\Python\Python310\lib\threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "C:\Users\48880\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\ui.py", line 1740, in <lambda>
    visible=shared.sd_model and shared.sd_model.cond_stage_key == "edit"
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\shared_items.py", line 190, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 693, in get_sd_model
    load_model()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 788, in load_model
    checkpoint_info = checkpoint_info or select_checkpoint()
  File "C:\Users\48880\stable-diffusion-webui-amdgpu\modules\sd_models.py", line 234, in select_checkpoint
    raise FileNotFoundError(error_message)
FileNotFoundError: No checkpoints found. When searching for checkpoints, looked at:
 - file C:\Users\48880\stable-diffusion-webui-amdgpu\model.ckpt
 - directory C:\Users\48880\stable-diffusion-webui-amdgpu\models\Stable-diffusionCan't run without a checkpoint. Find and place a .ckpt or .safetensors file into any of those locations.

Stable diffusion model failed to load
```
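Two separate problems show up in this log: ZLUDA could not load .zluda\nvcuda.dll (so the webui fell back to CPU-only torch), and no checkpoint was found in models\Stable-diffusion after the download attempt. The sketch below checks both locations named in the log. A present nvcuda.dll does not rule out the first error, because the message also mentions missing dependencies, and the recursive search of subfolders is an assumption about how the webui scans for models. Both function names are hypothetical.

```python
# Sketch: check the two locations named in the log. webui_dir is the folder
# containing webui-user.bat; the recursive search is an assumption.
from pathlib import Path

CHECKPOINT_SUFFIXES = {".ckpt", ".safetensors"}


def find_checkpoints(webui_dir: str) -> list[Path]:
    """List model.ckpt in the webui root plus checkpoints under models/Stable-diffusion."""
    root = Path(webui_dir)
    found = [root / "model.ckpt"] if (root / "model.ckpt").is_file() else []
    models = root / "models" / "Stable-diffusion"
    if models.is_dir():
        found += sorted(
            p for p in models.rglob("*")
            if p.is_file() and p.suffix.lower() in CHECKPOINT_SUFFIXES
        )
    return found


def zluda_dll_present(webui_dir: str) -> bool:
    """True if .zluda/nvcuda.dll exists; its dependencies may still be missing."""
    return (Path(webui_dir) / ".zluda" / "nvcuda.dll").is_file()
```

An empty list from find_checkpoints matches the "No checkpoints found" error above; placing a .ckpt or .safetensors file in models\Stable-diffusion is what the error message itself asks for.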

lshqqytiger / stable-diffusion-webui-amdgpu