lshqqytiger / stable-diffusion-webui-amdgpu-forge

Forge for stable-diffusion-webui-amdgpu (formerly stable-diffusion-webui-directml)
GNU Affero General Public License v3.0

19.09.2024 AssertionError: Torch not compiled with CUDA enabled. An attempt to use ZLUDA for Stable Diffusion on Windows. #34

Open Harbitos opened 2 days ago

Harbitos commented 2 days ago

Trying to solve the bug (open here: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge/issues/33), I tried installing HIP SDK 5 with the rocBLAS libraries, following the instructions here: https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU. In doing so I ran into another error I had never seen before. I had tried this install earlier, but there was an error with rocblas, so I used --directml and everything worked for me.

All the system specs, and the problem that started it all, are described here: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge/issues/33. Since --directml stopped helping with that error, I decided to try --use-zluda.

And a question for everyone: has anyone ever run Stable Diffusion Forge with --use-zluda on Windows?

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 412c2d800dcae4cee8a7466a1e9128cfbbc5bf26
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
Traceback (most recent call last):
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\launch.py", line 58, in <module>
    main()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\launch.py", line 46, in main
    prepare_environment()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\launch_utils.py", line 608, in prepare_environment
    from modules import devices
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\devices.py", line 3, in <module>
    from backend import memory_management
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\backend\memory_management.py", line 133, in <module>
    total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\backend\memory_management.py", line 100, in get_torch_device
    return torch.device(torch.cuda.current_device())
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\__init__.py", line 778, in current_device
    _lazy_init()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\__init__.py", line 284, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Press any key to continue . . .
lshqqytiger commented 2 days ago

You should wipe venv before switching the computing backend; please try again after deleting the venv folder. Also, why https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU? Your card, the RX 580, is gfx803, not gfx1103. Maybe this one is what you want.
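For anyone following along, the reset described above can be sketched as a small script. Everything here is illustrative, not project tooling: the `reset_backend` helper and `DRY_RUN` flag are made up for this sketch, and it assumes the repo's usual `webui.bat` launcher on Windows.

```python
# Illustrative sketch (not part of the webui): wipe the venv built for the old
# backend, then relaunch so the launcher rebuilds it for the new one.
import shutil
import subprocess
from pathlib import Path

DRY_RUN = True  # flip to False only after reviewing the printed actions


def reset_backend(webui_dir: Path, launch_args=("--use-zluda",)) -> list:
    """Return the list of actions taken (or, in dry-run mode, planned)."""
    actions = []
    venv = webui_dir / "venv"
    actions.append(f"remove {venv}")
    if not DRY_RUN and venv.is_dir():
        shutil.rmtree(venv)  # old venv was built for DirectML
    cmd = [str(webui_dir / "webui.bat"), *launch_args]
    actions.append("run " + " ".join(cmd))
    if not DRY_RUN:
        subprocess.run(cmd, check=False)  # launcher recreates venv on start
    return actions
```

With `DRY_RUN = True` it only reports what it would do, which is the safe default when paths might be wrong.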

Harbitos commented 1 day ago

> You should wipe venv before switching computing backend. Please try again after wiping venv folder. In addition, why https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU? Your card, RX 580, is gfx803, not gfx1103. Maybe this one is what you want.

After all this, SD started, but as in the previous problems, it either doesn't generate or throws an error after generation. I installed ROCmLibs-Fallback into C:\Program Files\AMD\ROCm\5.7\bin\rocblas without renaming the "libraries" folder.
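One thing worth double-checking about that install: HIP's rocBLAS looks for a folder named `library` (singular) under `bin\rocblas`, while the comment above mentions a "libraries" folder. A short illustrative helper (not part of the webui; the expected layout is an assumption based on the HIP SDK convention) can verify it:

```python
from pathlib import Path


def check_rocblas_layout(rocm_bin: Path) -> list:
    """Return a list of problems with a rocBLAS library install; empty if OK.

    Expects the HIP SDK layout: <rocm_bin>/rocblas/library/*.dat etc.
    """
    problems = []
    rocblas_dir = rocm_bin / "rocblas"
    lib_dir = rocblas_dir / "library"
    if not rocblas_dir.is_dir():
        problems.append(f"missing {rocblas_dir}")
    elif not lib_dir.is_dir():
        # Replacement archives ship a "library" folder that must keep that name.
        problems.append(f"missing {lib_dir}")
    elif not any(lib_dir.iterdir()):
        problems.append(f"{lib_dir} is empty")
    return problems


# Path from this thread:
# check_rocblas_layout(Path(r"C:\Program Files\AMD\ROCm\5.7\bin"))
```

An empty result means the folder structure at least matches what HIP expects; it does not prove the libraries are the right ones for gfx803.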

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
ROCm Toolkit 5.7 was found.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 412c2d800dcae4cee8a7466a1e9128cfbbc5bf26
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.3.0+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
Launching Web UI with arguments: --theme dark --zluda --skip-version-check
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=0, device_name=AMD Radeon RX 580 2048SP [ZLUDA]
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ControlNet preprocessor location: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
2024-09-19 18:43:42,770 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 20.2s (prepare environment: 6.8s, import torch: 6.8s, initialize shared: 2.1s, load scripts: 2.6s, create ui: 2.9s, gradio launch: 1.6s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Loading Model: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 197, 'ignore': 0}
C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Press any key to continue . . .
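The MemMon crash in the log above happens because the monitor thread calls torch.cuda.reset_peak_memory_stats() on a backend that rejects it, and the uncaught RuntimeError kills the thread. A hedged sketch of the defensive pattern (illustrative only; this is not the actual modules/memmon.py code):

```python
def safe_reset_peak_stats(reset_fn) -> bool:
    """Call a peak-memory-stats reset function, swallowing the RuntimeError
    that some backends raise, so a monitoring thread survives.
    Returns True if the reset succeeded."""
    try:
        reset_fn()
        return True
    except RuntimeError:
        # e.g. "invalid argument to reset_peak_memory_stats" under ZLUDA
        return False


# Usage idea: safe_reset_peak_stats(torch.cuda.reset_peak_memory_stats)
```

This only keeps the monitor alive; it does not fix the underlying backend problem.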
TheFerumn commented 1 day ago

It's almost certainly something with your PC only, since I already tested SD Forge for AMD on 2 different PCs with both DirectML and ZLUDA. Hard to tell what exactly without errors. Maybe using --loglevel debug will show something more? It's always worth trying. I don't know which flag actually shows you more, so just test it yourself. [screenshot: Zrzut ekranu 2024-09-19 180033]

Harbitos commented 1 day ago

> Its 100% something with your PC only since i already tested SD Forge for AMD on 2 different PCs with DirectML and ZLUDA as well. Hard to tell what exactly without errors. Maybe using --loglevel debug will show something more ? Its always worth trying. I don't know which command can actually show you more so just test it yourself. [screenshot: Zrzut ekranu 2024-09-19 180033]

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
ROCm Toolkit 5.7 was found.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 412c2d800dcae4cee8a7466a1e9128cfbbc5bf26
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
2024-09-19 20:23:10 DEBUG [root] Installing put extensions here.txt
2024-09-19 20:23:10 DEBUG [root] Installing extra-options-section
2024-09-19 20:23:10 DEBUG [root] Installing forge_legacy_preprocessors
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_inpaint
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_marigold
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_normalbae
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_recolor
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_reference
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_revision
2024-09-19 20:23:10 DEBUG [root] Installing forge_preprocessor_tile
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_animagine_xl_31
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_birefnet
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_example
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_florence_2
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_geowizard
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_iclight
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_idm_vton
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_illusion_diffusion
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_photo_maker_v2
2024-09-19 20:23:10 DEBUG [root] Installing forge_space_sapiens_normal
2024-09-19 20:23:10 DEBUG [root] Installing mobile
2024-09-19 20:23:10 DEBUG [root] Installing prompt-bracket-checker
2024-09-19 20:23:10 DEBUG [root] Installing ScuNET
2024-09-19 20:23:10 DEBUG [root] Installing sd_forge_controlllite
2024-09-19 20:23:10 DEBUG [root] Installing sd_forge_controlnet
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_dynamic_thresholding
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_fooocus_inpaint
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_freeu
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_ipadapter
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_kohya_hrfix
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_latent_modifier
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_lora
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_multidiffusion
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_neveroom
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_perturbed_attention
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_sag
2024-09-19 20:23:11 DEBUG [root] Installing sd_forge_stylealign
2024-09-19 20:23:11 DEBUG [root] Installing soft-inpainting
2024-09-19 20:23:11 DEBUG [root] Installing SwinIR
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.3.0+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
Launching Web UI with arguments: --theme dark --zluda --skip-version-check --loglevel debug
CUDA Using Stream: False
2024-09-19 20:23:21 DEBUG [httpx] load_ssl_context verify=True cert=None trust_env=True http2=False
2024-09-19 20:23:21 DEBUG [httpx] load_verify_locations cafile='C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\certifi\cacert.pem'
2024-09-19 20:23:21 DEBUG [httpx] load_ssl_context verify=True cert=None trust_env=True http2=False
2024-09-19 20:23:21 DEBUG [httpx] load_verify_locations cafile='C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\certifi\cacert.pem'
2024-09-19 20:23:21 DEBUG [httpx] load_ssl_context verify=True cert=None trust_env=True http2=False
2024-09-19 20:23:21 DEBUG [httpx] load_verify_locations cafile='C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\certifi\cacert.pem'
2024-09-19 20:23:21 DEBUG [bitsandbytes.cextension] Loading bitsandbytes native library from: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.dll
Using pytorch cross attention
Using pytorch attention for VAE
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing BlpImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing BmpImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing BufrStubImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing CurImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing DcxImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing DdsImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing EpsImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing FitsImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing FitsStubImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing FliImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing FpxImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Image: failed to import FpxImagePlugin: No module named 'olefile'
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing FtexImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing GbrImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing GifImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing GribStubImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing Hdf5StubImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing IcnsImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing IcoImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing ImImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing ImtImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing IptcImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing JpegImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing Jpeg2KImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing McIdasImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing MicImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Image: failed to import MicImagePlugin: No module named 'olefile'
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing MpegImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing MpoImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing MspImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PalmImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PcdImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PcxImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PdfImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PixarImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PngImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PpmImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing PsdImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing QoiImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing SgiImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing SpiderImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing SunImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing TgaImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing TiffImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing WebPImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing WmfImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing XbmImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing XpmImagePlugin
2024-09-19 20:23:30 DEBUG [PIL.Image] Importing XVThumbImagePlugin
2024-09-19 20:23:35 DEBUG [matplotlib] matplotlib data path: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\matplotlib\mpl-data
2024-09-19 20:23:35 DEBUG [matplotlib] CONFIGDIR=C:\Users\user\.matplotlib
2024-09-19 20:23:35 DEBUG [matplotlib] interactive is False
2024-09-19 20:23:35 DEBUG [matplotlib] platform is win32
2024-09-19 20:23:35 DEBUG [matplotlib] CACHEDIR=C:\Users\user\.matplotlib
2024-09-19 20:23:35 DEBUG [matplotlib.font_manager] Using fontManager instance from C:\Users\user\.matplotlib\fontlist-v390.json
2024-09-19 20:23:37 DEBUG [git.cmd] Popen(['git', 'version'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=None)
2024-09-19 20:23:37 DEBUG [git.cmd] Popen(['git', 'version'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=None)
ONNX: version=1.19.2 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=0, device_name=AMD Radeon RX 580 2048SP [ZLUDA]
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ControlNet preprocessor location: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
2024-09-19 20:23:46 DEBUG [asyncio] Using selector: SelectSelector
2024-09-19 20:23:48,037 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
2024-09-19 20:23:50 DEBUG [asyncio] Using selector: SelectSelector
Running on local URL:  http://127.0.0.1:7860
2024-09-19 20:23:50 DEBUG [httpx] load_ssl_context verify=None cert=None trust_env=True http2=False
2024-09-19 20:23:50 INFO [httpx] HTTP Request: GET http://127.0.0.1:7860/startup-events "HTTP/1.1 200 OK"
2024-09-19 20:23:50 DEBUG [httpx] load_ssl_context verify=False cert=None trust_env=True http2=False
2024-09-19 20:23:51 INFO [httpx] HTTP Request: HEAD http://127.0.0.1:7860/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
Startup time: 42.0s (prepare environment: 17.7s, import torch: 18.3s, initialize shared: 5.2s, other imports: 0.1s, load scripts: 3.9s, create ui: 3.2s, gradio launch: 1.6s).
2024-09-19 20:23:55 DEBUG [matplotlib.pyplot] Loaded backend tkagg version 8.6.
2024-09-19 20:23:55 DEBUG [matplotlib.pyplot] Loaded backend agg version v2.2.
2024-09-19 20:23:55 DEBUG [matplotlib.pyplot] Loaded backend tkagg version 8.6.
2024-09-19 20:23:55 DEBUG [matplotlib.pyplot] Loaded backend agg version v2.2.
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
2024-09-19 20:23:55 DEBUG [git.cmd] Popen(['git', 'remote', 'get-url', '--all', 'origin'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=None)
2024-09-19 20:23:55 DEBUG [git.cmd] Popen(['git', 'cat-file', '--batch-check'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=)
2024-09-19 20:23:55 DEBUG [git.cmd] Popen(['git', 'cat-file', '--batch'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=)
2024-09-19 20:23:55 DEBUG [git.cmd] Popen(['git', 'remote', 'get-url', '--all', 'origin'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=None)
2024-09-19 20:23:56 DEBUG [git.cmd] Popen(['git', 'cat-file', '--batch-check'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=)
2024-09-19 20:23:56 DEBUG [git.cmd] Popen(['git', 'cat-file', '--batch'], cwd=C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge, universal_newlines=False, shell=None, istream=)
2024-09-19 20:24:10 INFO [modules.shared_state] Starting job task(rt6xopp5wid20e9)
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Loading Model: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 197, 'ignore': 0}
C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Press any key to continue . . .

TheFerumn commented 1 day ago

All I can say is that this error is new to me, but I googled a little and it shouldn't be critical. I just wonder why it isn't showing for me? Which version of Transformers do you have?

C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884

Other than that, this might be an issue too, but I can't really tell what its source is:

ZLUDA device failed to pass basic operation test: index=0, device_name=AMD Radeon RX 580 2048SP [ZLUDA]
CUDA error: out of memory
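To answer the version question without launching the webui, the installed version can be read from the venv with the standard library. A minimal sketch (the `pkg_version` helper is illustrative; it works for any installed distribution, not just transformers):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def pkg_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# With the webui's venv activated, for example:
# print(pkg_version("transformers"))
```

`pip show transformers` inside the activated venv reports the same information.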
Harbitos commented 1 day ago

> All i can say this error is new to me but i googled a little and it shouldn't be critical. Just wonder why its not showing to me ? Which version of Transformers you have ?
>
> C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
>
> Other than that this might be an issue too but i can't really tell whats the source of it:
>
> ZLUDA device failed to pass basic operation test: index=0, device_name=AMD Radeon RX 580 2048SP [ZLUDA]
> CUDA error: out of memory

Maybe it: [screenshot: Screenshot_3]

TheFerumn commented 1 day ago

Transformers should be fine. I'm afraid I can't help you. Maybe lshqqytiger knows some methods to find the source of your problem.

My last suggestion is to never use Cyrillic letters, or any letters other than English, in your folder names on Windows :D Of course it's probably not relevant, since it worked for you before. I just had a lot of problems in the past from using non-standard letters in my Windows username :D

EDIT: Never mind, it's just "Desktop" translated into your language, but Python sees it as the /Desktop/ folder anyway.

lshqqytiger commented 1 day ago

gfx803 has a bug where the driver raises an out-of-memory error even when there is enough memory left. The bug may be in the driver or in the HIP SDK; the exact cause is unknown. You can try upgrading or downgrading the graphics driver, but I'm not sure whether that will work.

lshqqytiger commented 1 day ago

Could you try downgrading torch?

.\venv\Scripts\activate
pip install torch==2.2.1 torchvision==0.17.1 --index-url https://download.pytorch.org/whl/cu118

https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu/issues/479
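After reinstalling, it's worth confirming the downgrade took effect by checking the `pytorch version:` line the webui prints at startup (`2.3.0+cu118` before, `2.2.1+cu118` expected after). The `+cu118` local build tag has to be stripped before a numeric compare; a small illustrative parser (not project code):

```python
def base_version(v: str) -> tuple:
    """Parse a version like '2.2.1+cu118' into (2, 2, 1), dropping the
    local build tag after '+' so versions compare numerically."""
    return tuple(int(part) for part in v.split("+")[0].split("."))


# e.g. base_version("2.2.1+cu118") -> (2, 2, 1)
```

Inside the activated venv, `python -c "import torch; print(torch.__version__)"` prints the string to feed in.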

Harbitos commented 1 day ago

> Transformers should be fine. I am afraid i can't help you. Maybe lshqqytiger knows some methods to find the source of your problem.
>
> My last suggestion is to never use Cyrillic letters or any letters other than english for your folders when using Windows :D OFC its probably not relevant since it worked for you before. I just had a lot of problems already in the past when using non standard letters in my windows username :D
>
> EDIT: Nevermind its just "desktop" translated into your language but pythoon sees it as /desktop/ folder anyway

I had these errors before, but they don't appear anymore. (The errors are in the screenshots and in my next comment.)

[screenshots: error (1), translated; error (2), translated]

Harbitos commented 1 day ago

> Could you try downgrading torch?
>
> .\venv\Scripts\activate
> pip install torch==2.2.1 torchvision==0.17.1 --index-url https://download.pytorch.org/whl/cu118
>
> lshqqytiger/stable-diffusion-webui-amdgpu#479

I tried it; it's all the same. When I first started SD after downgrading torch, I got a very long error, and on the second launch everything was as before.

In case I typed something wrong, here is what I did: I opened cmd in the SD folder and ran:

venv\Scripts\activate
pip uninstall torch torchvision torchaudio -y
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu118

AMD driver: 24.3.1. (I had installed the latest version and SD used to work for me, but after SD stopped working, neither the latest version nor 24.3.1 helped.)

I have a second system with Ubuntu, and I tried to install SD Forge there (I'll use it there anyway), but I'm getting installation errors and am waiting for someone to help me or to figure it out myself. Plan B is to reinstall Windows, but that's the last thing I want to do, because I have a lot of programs and files there. I want to use SD Forge on Windows as well, because everything used to work for me.

The first launch:

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
ROCm Toolkit 5.7 was found.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 412c2d800dcae4cee8a7466a1e9128cfbbc5bf26
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
You are using PyTorch below version 2.3. Some optimizations will be disabled.
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.2.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
Launching Web UI with arguments: --theme dark --zluda --skip-version-check
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ControlNet preprocessor location: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
2024-09-20 11:26:45,767 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 237.9s (prepare environment: 9.3s, import torch: 8.6s, initialize shared: 216.0s, load scripts: 3.1s, create ui: 2.9s, gradio launch: 1.5s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 197, 'ignore': 0}
C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Exception Code: 0xC0000005
0x00007FFEA4572CF4, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x62CF4 byte(s)
0x00007FFEA4571FFC, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x61FFC byte(s)
0x00007FFEA4574497, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x64497 byte(s)
0x00007FFEA457413E, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x6413E byte(s)
0x00007FFEA4573970, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x63970 byte(s)
0x00007FFEA4570E17, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x60E17 byte(s)
0x00007FFEA4570F39, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x60F39 byte(s)
0x00007FFEA4557068, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x47068 byte(s)
0x00007FFEA4518C7C, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x8C7C byte(s)
0x00007FFEA4529183, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\c10.dll(0x00007FFEA4510000) + 0x19183 byte(s)
0x00007FFD0C338B1F, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x6878B1F byte(s)
0x00007FFD0C33ACB6, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x687ACB6 byte(s)
0x00007FFD0C839C8C, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x6D79C8C byte(s)
0x00007FFD0D3272A5, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x78672A5 byte(s)
0x00007FFD0D2E3D1E, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x7823D1E byte(s)
0x00007FFD0CE5642C, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x739642C byte(s)
0x00007FFD0CEB8621, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x73F8621 byte(s)
0x00007FFD0CF862EE, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x74C62EE byte(s)
0x00007FFD0D1E1CC9, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x7721CC9 byte(s)
0x00007FFD0D1DFD5E, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x771FD5E byte(s)
0x00007FFD0CE5642C, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x739642C byte(s)
0x00007FFD0CEF825B, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_cpu.dll(0x00007FFD05AC0000) + 0x743825B byte(s)
0x00007FFE567193AA, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_python.dll(0x00007FFE564B0000) + 0x2693AA byte(s)
0x00007FFE5676A85E, C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\lib\torch_python.dll(0x00007FFE564B0000) + 0x2BA85E byte(s)
0x00007FFEA69149A6, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x649A6 byte(s)
0x00007FFEA68E094E, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x3094E byte(s)
0x00007FFEA68E0A4B, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A4B byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6924F15, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x74F15 byte(s)
0x00007FFEA68E0B8C, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30B8C byte(s)
0x00007FFEA68E0A07, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A07 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6924F15, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x74F15 byte(s)
0x00007FFEA68E0B8C, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30B8C byte(s)
0x00007FFEA68E0A07, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A07 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E3532, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x33532 byte(s)
0x00007FFEA68E3964, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x33964 byte(s)
0x00007FFEA68E0A83, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A83 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA692D712, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D712 byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E3532, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x33532 byte(s)
0x00007FFEA68E3964, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x33964 byte(s)
0x00007FFEA68E6C3F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x36C3F byte(s)
0x00007FFEA692E02B, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7E02B byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E0B8C, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30B8C byte(s)
0x00007FFEA68E0A07, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A07 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E3532, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x33532 byte(s)
0x00007FFEA68E3964, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x33964 byte(s)
0x00007FFEA68E0A83, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A83 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6924F15, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x74F15 byte(s)
0x00007FFEA6928C0B, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x78C0B byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA69277C4, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x777C4 byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA69277C4, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x777C4 byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E0B8C, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30B8C byte(s)
0x00007FFEA68E0A07, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A07 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA692D712, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D712 byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E0B30, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30B30 byte(s)
0x00007FFEA68E0A07, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A07 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA69277C4, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x777C4 byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA6928C0B, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x78C0B byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA68E0B30, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30B30 byte(s)
0x00007FFEA68E0A07, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x30A07 byte(s)
0x00007FFEA692D24F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x7D24F byte(s)
0x00007FFEA69280ED, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x780ED byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA6928C0B, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x78C0B byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA69277C4, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x777C4 byte(s)
0x00007FFEA6926267, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x76267 byte(s)
0x00007FFEA69277C4, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x777C4 byte(s)
0x00007FFEA68D9301, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x29301 byte(s)
0x00007FFEA68D7DB2, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x27DB2 byte(s)
0x00007FFEA68D6BFE, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x26BFE byte(s)
0x00007FFEA68D6B7E, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x26B7E byte(s)
0x00007FFEA69C424C, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x11424C byte(s)
0x00007FFEA69C4101, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x114101 byte(s)
0x00007FFEA69C3F04, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x113F04 byte(s)
0x00007FFEA69C3B7F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x113B7F byte(s)
0x00007FFEA69C398F, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x11398F byte(s)
0x00007FFEA69C2F98, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x112F98 byte(s)
0x00007FFEA69C2E29, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x112E29 byte(s)
0x00007FFEA69C27ED, C:\Users\user\AppData\Local\Programs\Python\Python310\python310.dll(0x00007FFEA68B0000) + 0x1127ED byte(s)
0x00007FF7EAF61230, C:\Users\user\AppData\Local\Programs\Python\Python310\python.exe(0x00007FF7EAF60000) + 0x1230 byte(s)
0x00007FFECF387344, C:\Windows\System32\KERNEL32.DLL(0x00007FFECF370000) + 0x17344 byte(s)
0x00007FFED04626B1, C:\Windows\SYSTEM32\ntdll.dll(0x00007FFED0410000) + 0x526B1 byte(s)
Press any key to continue . . .

The second launch:

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
ROCm Toolkit 5.7 was found.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: 412c2d800dcae4cee8a7466a1e9128cfbbc5bf26
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
You are using PyTorch below version 2.3. Some optimizations will be disabled.
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.2.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
Launching Web UI with arguments: --theme dark --zluda
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CUDAExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
==============================================================================
You are running torch 2.2.1+cu118.
The program is tested to work with torch 2.3.1.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
ControlNet preprocessor location: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
2024-09-20 11:29:07,281 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.8s (prepare environment: 7.4s, import torch: 6.5s, initialize shared: 1.4s, load scripts: 2.7s, create ui: 3.0s, gradio launch: 1.5s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 197, 'ignore': 0}
C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Press any key to continue . . .
TheFerumn commented 22 hours ago

AMD driver 24.3.1 (I had installed the latest version and SD used to work for me, but after SD stopped working, neither the latest version nor 24.3.1 helped)

Have you tried the AMD PRO driver? I feel it has to be something with your driver, since it can't load any model for you no matter which backend you use; the same happens on DirectML and ZLUDA, and the Torch version doesn't matter either. It really feels like SD just can't access your memory for some reason. How about adding a lot of swap memory from your SSD drive and launching SD purely on the CPU? Does the same thing happen?
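For the CPU-only test, the launch flags could go into webui-user.bat roughly like this (a sketch only; `--use-cpu all` and `--skip-torch-cuda-test` are upstream stable-diffusion-webui flags, and I have not verified them against this fork):

```bat
rem Sketch: force all modules onto the CPU for testing.
rem --use-cpu all / --skip-torch-cuda-test come from upstream webui;
rem behavior in stable-diffusion-webui-amdgpu-forge is an assumption.
set COMMANDLINE_ARGS=--use-cpu all --skip-torch-cuda-test --theme dark
```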

Harbitos commented 19 hours ago

AMD driver 24.3.1 (I had installed the latest version and SD used to work for me, but after SD stopped working, neither the latest version nor 24.3.1 helped)

Have you tried the AMD PRO driver? I feel it has to be something with your driver, since it can't load any model for you no matter which backend you use; the same happens on DirectML and ZLUDA, and the Torch version doesn't matter either. It really feels like SD just can't access your memory for some reason. How about adding a lot of swap memory from your SSD drive and launching SD purely on the CPU? Does the same thing happen?

Hooray! At least now it shows an error! I remembered that I recently formatted one SSD and could not delete some files on it, so I removed the page file for that disk, but I probably also removed the page file for the system disk somehow. Nothing bad had happened to my PC in all that time, and the screen had not gone out. In short, I turned the page file for the system disk back on, and my SD-forge apparently worked! With the AMD PRO version installed, SD-forge at startup only gave me an error that it could not find the entry point in the hip... procedure. That's why I am installing AMD Adrenalin back.

I had a bigger error before, but I fixed it by adding the line set CUDA_LAUNCH_BLOCKING=1 to webui-user.bat. It remains for me to solve this error: (screenshot)
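For reference, the line sits in webui-user.bat next to the launch arguments; this is a sketch based on the stock webui-user.bat template (the COMMANDLINE_ARGS values are taken from the logs in this thread, the rest is the standard template):

```bat
@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--theme dark --use-zluda
set CUDA_LAUNCH_BLOCKING=1

call webui.bat
```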

This did not help me, although it installed successfully: pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

Running pip install torch==2.2.1 torchvision==0.17.1 --index-url https://download.pytorch.org/whl/cu118 led to the console endlessly printing "Compiling in progress. Please wait..." whenever I generate.

At the time, not many processes were running on the PC; %temp% and the recycle bin had been cleared. Restarting the PC did not help.

Screenshot_3

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: aa15a03a082d3dbf77a99ce231276188a969d007
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
Launching Web UI with arguments: --theme dark --use-zluda --skip-version-check
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.3.0+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=0, device_name=AMD Radeon RX 580 2048SP [ZLUDA]
CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

ControlNet preprocessor location: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
2024-09-20 18:18:43,411 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 21.1s (prepare environment: 1.2s, import torch: 10.9s, initialize shared: 1.4s, load scripts: 2.6s, create ui: 3.1s, gradio launch: 1.7s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
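The MemMon thread dies here because torch.cuda.reset_peak_memory_stats() rejects the ZLUDA device. An illustrative sketch of how the monitor loop could degrade gracefully instead of crashing (the memmon.py location and the failing torch call are from the traceback above; the guard itself and the stand-in function below are assumptions, not code from this repository):

```python
def reset_peak_memory_stats():
    # Stand-in for torch.cuda.reset_peak_memory_stats() failing on a
    # ZLUDA device, as seen in the traceback above.
    raise RuntimeError("invalid argument to reset_peak_memory_stats")


def monitor_step():
    """One MemMon iteration that disables GPU stats instead of dying."""
    try:
        reset_peak_memory_stats()
        return True
    except RuntimeError as err:
        # Log once and report failure rather than killing the thread.
        print(f"MemMon: GPU memory stats unavailable ({err})")
        return False
```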
Loading Model: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 197, 'ignore': 0}
C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
K-Model Created: {'storage_dtype': torch.float16, 'computation_dtype': torch.float16}
Model loaded in 1.3s (unload existing model: 0.3s, forge model load: 1.1s).
[Unload] Trying to free 1329.14 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 7347.70 MB, Model Require: 234.72 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 6088.98 MB, All loaded to GPU.
Moving model(s) has taken 0.19 seconds
Traceback (most recent call last):
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\txt2img.py", line 123, in txt2img_function
    processed = processing.process_images(p)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 818, in process_images
    res = process_images_inner(p)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 1023, in process_images_inner
    p.setup_conds()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 1619, in setup_conds
    super().setup_conds()
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 501, in setup_conds
    self.uc = self.get_conds_with_caching(prompt_parser.get_learned_conditioning, negative_prompts, total_steps, [self.cached_uc], self.extra_network_data)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\processing.py", line 472, in get_conds_with_caching
    cache[1] = function(shared.sd_model, required_prompts, steps, hires_steps, shared.opts.use_old_scheduling)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\modules\prompt_parser.py", line 189, in get_learned_conditioning
    conds = model.get_learned_conditioning(texts)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\backend\diffusion_engine\sd15.py", line 63, in get_learned_conditioning
    cond = self.text_processing_engine(prompt)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\backend\text_processing\classic_engine.py", line 268, in __call__
    z = self.process_tokens(tokens, multipliers)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\backend\text_processing\classic_engine.py", line 301, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\backend\text_processing\classic_engine.py", line 126, in encode_with_transformers
    self.text_encoder.transformer.text_model.embeddings.position_embedding = self.text_encoder.transformer.text_model.embeddings.position_embedding.to(dtype=torch.float32)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1173, in to
    return self._apply(convert)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 804, in _apply
    param_applied = fn(param)
  File "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\torch\nn\modules\module.py", line 1159, in convert
    return t.to(
RuntimeError: CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

CUDA error: out of memory
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
lshqqytiger commented 6 hours ago

Running pip install torch==2.2.1 torchvision==0.17.1 --index-url https://download.pytorch.org/whl/cu118 led to the console endlessly printing "Compiling in progress. Please wait..." whenever I generate.

The first launch takes time (15-20 minutes). Be patient.

Harbitos commented 6 hours ago

Running pip install torch==2.2.1 torchvision==0.17.1 --index-url https://download.pytorch.org/whl/cu118 led to the console endlessly printing "Compiling in progress. Please wait..." whenever I generate.

The first launch takes time (15-20 minutes). Be patient.

  1. And after that, will it generate each image quickly until I close SD?

  2. Does every AMD graphics card take 15-20 minutes to compile for ZLUDA?

  3. Is it normal that it warns about ZLUDA but then simply uses it below? Screenshot_1

  4. I also get an error after a long wait for generation: the first time there was an error, the second time it generated, and the third time there was an error again.

venv "C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-1.10.1
Commit hash: aa15a03a082d3dbf77a99ce231276188a969d007
Using ZLUDA in C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\.zluda
Launching Web UI with arguments: --theme dark --use-zluda --skip-version-check
You are using PyTorch below version 2.3. Some optimizations will be disabled.
Total VRAM 8192 MB, total RAM 16328 MB
pytorch version: 2.2.1+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 580 2048SP [ZLUDA] : native
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: False
Using pytorch cross attention
Using pytorch attention for VAE
ONNX: version=1.19.2 provider=CPUExecutionProvider, available=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
ControlNet preprocessor location: C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\models\ControlNetPreprocessor
2024-09-21 12:55:00,024 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 36.1s (prepare environment: 1.2s, import torch: 20.3s, initialize shared: 4.4s, other imports: 0.1s, load scripts: 4.9s, create ui: 3.2s, gradio launch: 1.6s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 87.50% GPU memory (7168.00 MB) to load weights, and use 12.50% GPU memory (1024.00 MB) to do matrix computation.
Loading Model: {'checkpoint_info': {'filename': 'C:\\Users\\user\\Desktop\\Stable Diffusion Forge\\stable-diffusion-webui-amdgpu-forge\\models\\Stable-diffusion\\epicrealism_pureEvolutionV3.safetensors', 'hash': '42c8440c'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 686, 'vae': 248, 'text_encoder': 197, 'ignore': 0}
C:\Users\user\Desktop\Stable Diffusion Forge\stable-diffusion-webui-amdgpu-forge\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
K-Model Created: {'storage_dtype': torch.float16, 'computation_dtype': torch.float16}
Model loaded in 6.2s (unload existing model: 0.3s, forge model load: 5.9s).
[Unload] Trying to free 1329.14 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 7347.59 MB, Model Require: 234.72 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 6088.87 MB, All loaded to GPU.
Moving model(s) has taken 0.88 seconds
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 7022.51 MB ... Done.
[Unload] Trying to free 3155.23 MB for cuda:0 with 0 models keep loaded ... Current free memory is 7022.21 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 7022.21 MB, Model Require: 1639.41 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 4358.80 MB, All loaded to GPU.
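Each `[Memory Management]` line above follows the same accounting: Remaining = Free − Model Require − Previously Loaded − Inference Require. A quick sanity check against the KModel line, with the values copied from the log:

```python
# Verify the memory-management arithmetic from the KModel log line above.
free_mb = 7022.21        # "Free GPU"
model_mb = 1639.41       # "Model Require"
prev_loaded_mb = 0.00    # "Previously Loaded"
inference_mb = 1024.00   # "Inference Require"

remaining_mb = free_mb - model_mb - prev_loaded_mb - inference_mb
print(f"Remaining: {remaining_mb:.2f} MB")  # 4358.80, so "All loaded to GPU"
```

A positive remainder is why the loader reports "All loaded to GPU" instead of falling back to partial/streamed loading.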
Moving model(s) has taken 2.96 seconds
Compiling in progress. Please wait...
Compiling in progress. Please wait...
 31%|█████████████████████████▊                                                        | 11/35 [00:21<00:44,  1.86s/it]
Compiling in progress. Please wait...
 34%|████████████████████████████▏                                                     | 12/35 [00:20<00:42,  1.84s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 35/35 [01:20<00:00,  2.31s/it]
[Unload] Trying to free 1568.67 MB for cuda:0 with 0 models keep loaded ... Current free memory is 5356.89 MB ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 3702.89 MB, Model Require: 159.56 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 2519.33 MB, All loaded to GPU.
Moving model(s) has taken 0.11 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 35/35 [01:21<00:00,  2.32s/it]
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 2417.15 MB ... Done.
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 2422.92 MB ... Done.
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 2422.62 MB ... Done.
100%|██████████████████████████████████████████████████████████████████████████████████| 35/35 [01:32<00:00,  2.65s/it]
[Unload] Trying to free 1361.25 MB for cuda:0 with 1 models keep loaded ... Current free memory is 2530.36 MB ... Done.
Exception Code: 0xC0000005
0x00007FFEAEB28D72, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x3A8D72 byte(s), hipMemUnmap() + 0x3CE32 byte(s)
0x00007FFEAEB3ED70, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x3BED70 byte(s), hipMemUnmap() + 0x52E30 byte(s)
0x00007FFEAEB3ADA4, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x3BADA4 byte(s), hipMemUnmap() + 0x4EE64 byte(s)
0x00007FFEAE89CAD8, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x11CAD8 byte(s), hipPeekAtLastError() + 0x8218 byte(s)
0x00007FFEAEB28D82, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x3A8D82 byte(s), hipMemUnmap() + 0x3CE42 byte(s)
0x00007FFEAEBA6C6C, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x426C6C byte(s), hipMemUnmap() + 0xBAD2C byte(s)
0x00007FFEAEBAA87C, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x42A87C byte(s), hipMemUnmap() + 0xBE93C byte(s)
0x00007FFEAEB40B2A, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x3C0B2A byte(s), hipMemUnmap() + 0x54BEA byte(s)
0x00007FFEAEB40C11, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x3C0C11 byte(s), hipMemUnmap() + 0x54CD1 byte(s)
0x00007FFEAEAED857, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x36D857 byte(s), hipMemUnmap() + 0x1917 byte(s)
0x00007FFEAEAEF7FF, C:\Windows\SYSTEM32\amdhip64.dll(0x00007FFEAE780000) + 0x36F7FF byte(s), hipMemUnmap() + 0x38BF byte(s)
0x00007FFED7017344, C:\Windows\System32\KERNEL32.DLL(0x00007FFED7000000) + 0x17344 byte(s), BaseThreadInitThunk() + 0x14 byte(s)
0x00007FFED83C26B1, C:\Windows\SYSTEM32\ntdll.dll(0x00007FFED8370000) + 0x526B1 byte(s), RtlUserThreadStart() + 0x21 byte(s)
Press any key to continue . . .
lshqqytiger commented 3 hours ago
  1. The compilation is a one-time thing. It does not recompile unless you upgrade/downgrade the driver, HIP SDK, or Torch.
  2. ZLUDA needs to compile NVIDIA PTX assembly into binaries (machine code) for AMD GPUs. It automatically caches the compiled binaries.
  3. There's no problem.
  4. I recommend reinstalling the HIP SDK if you overwrote its files with the wrong rocmlibs files. If the issue persists, check your VRAM usage; it could be a kind of out-of-memory error.
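Point 1 implies that ZLUDA's kernel cache is effectively keyed on the driver, HIP SDK, and Torch versions: as long as all three stay the same, cached binaries are reused; changing any one of them triggers a recompile. A hypothetical sketch of that keying (the function name and version strings are illustrative, not ZLUDA's actual implementation):

```python
import hashlib

def kernel_cache_key(driver: str, hip_sdk: str, torch_ver: str) -> str:
    """Illustrative cache key: any version change yields a new key,
    forcing the PTX -> AMD GPU binary compilation to run again."""
    blob = f"{driver}|{hip_sdk}|{torch_ver}".encode()
    return hashlib.sha256(blob).hexdigest()[:16]

old = kernel_cache_key("24.8.1", "5.7", "2.3.1")
same = kernel_cache_key("24.8.1", "5.7", "2.3.1")
new = kernel_cache_key("24.8.1", "6.1", "2.3.1")  # HIP SDK upgraded

assert old == same  # unchanged versions: cached binaries are reused
assert old != new   # any version bump: recompilation is triggered
```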