CANT GENERATE IMAGE return torch._C._cuda_memoryStats(device) RuntimeError: invalid argument to memory_allocated

kai1040112 commented 5 months ago

Checklist

[ ] The issue exists after disabling all extensions
[X] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[x] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

I am running Stable Diffusion on a laptop with an AMD Radeon RX 7700S, but it doesn't generate anything after I enter the prompts and click the Generate button.

Steps to reproduce the problem

download stable diffusion
webui-bat
enable onnx and olive

What should have happened?

Maybe Stable Diffusion couldn't use my GPU to generate the image because of some errors.

What browsers do you use to access the UI ?

Microsoft Edge

Sysinfo

sysinfo-2024-06-02-12-55.json

Console logs

venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb50e5cd3592b3ea831fe20b17588a2edb4
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.5s (prepare environment: 19.7s, initialize shared: 2.6s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.10s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Applying attention optimization: InvokeAI... done.
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████| 5/5 [00:07<00:00,  1.54s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(hy03hugzn8jrn39)', <gradio.routes.Request object at 0x000001FF8F29CCA0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
ONNX: Successfully exported converted model: submodel=text_encoder
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.12s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(9o5ycnkv8wdtd7b)', <gradio.routes.Request object at 0x000001FF8C1CD120>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated!

Additional information

Screenshot 2024-06-02 205311: the GPU usage is very low when I try to generate a picture (but it fails all the time).
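
A note on the conversion failure in the log: `mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)` means a 1x2560 input was fed into a layer whose weight expects 2816 input features. Why the converter produced 2560 features is not established in this thread; the log only shows a checkpoint with "XL" in its name being loaded through `StableDiffusionPipeline`, which may or may not be related. The sketch below only illustrates the shape rule itself; the function names are made up for illustration.

```python
def can_matmul(a_shape, b_shape):
    """A (m x k) matrix can multiply a (k2 x n) matrix only when k == k2."""
    return a_shape[-1] == b_shape[0]


def describe_mismatch(a_shape, b_shape):
    """Return a human-readable explanation of an inner-dimension mismatch."""
    a_cols, b_rows = a_shape[-1], b_shape[0]
    if a_cols == b_rows:
        return "shapes are compatible"
    return f"mat1 has {a_cols} columns but mat2 has {b_rows} rows (off by {abs(a_cols - b_rows)})"


if __name__ == "__main__":
    # Shapes taken from the error message in the log.
    print(describe_mismatch((1, 2560), (2816, 1280)))
```

The mismatch happens during model conversion, before any image generation starts, so it is a separate problem from the memory-stats errors later in the log.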

lshqqytiger commented 5 months ago

RX 7700S (gfx1102) is not officially supported by the AMD HIP SDK. However, you can use unofficially built BLAS libraries: https://github.com/Na3MnO4/ROCmLibs-Fallback
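
The `ZLUDA device failed to pass basic operation test` line in the log suggests the web UI runs a small GPU operation at startup. Its actual implementation is not shown in this thread, so the following is only a sketch of what such a check could look like, assuming PyTorch is installed in the venv. `basic_cuda_check` is a hypothetical name; the PyTorch calls it uses (`torch.cuda.is_available`, `torch.ones`) are standard.

```python
def basic_cuda_check():
    """Return a short status string describing whether a tiny GPU op works."""
    try:
        import torch  # third-party; assumed to be installed in the webui venv
    except ImportError:
        return "torch-missing"

    if not torch.cuda.is_available():
        return "no-device"

    try:
        # Tiny allocation plus arithmetic on the GPU. On an unsupported
        # device this is where an error such as "operation not supported"
        # would be expected to surface.
        x = torch.ones(2, 2, device="cuda")
        total = float((x + x).sum().item())
    except RuntimeError as exc:
        return f"failed: {exc}"

    # Four elements, each 1 + 1, should sum to 8.
    return "ok" if total == 8.0 else "wrong-result"


if __name__ == "__main__":
    print(basic_cuda_check())
```

Running this inside the same venv gives a quick way to see whether the problem is in the GPU stack itself rather than in the web UI.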

kai1040112 commented 5 months ago

I followed the steps Copilot gave me, but I still got the error (and still couldn't generate anything):

venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
ROCm Toolkit was found.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb50e5cd3592b3ea831fe20b17588a2edb4
Using ZLUDA in C:\sd\stable-diffusion-webui-amdgpu\.zluda
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 26.4s (prepare environment: 33.0s, initialize shared: 2.8s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00,  2.01s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Exception in thread MemMon:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
    torch.cuda.reset_peak_memory_stats()
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
    return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Applying attention optimization: InvokeAI... done.
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:09<00:00,  1.89s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
*** Error completing request
*** Arguments: ('task(570ykia0tb9ihw7)', <gradio.routes.Request object at 0x000001AF04E056C0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
    Traceback (most recent call last):
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
        processed = processing.process_images(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
        res = process_images_inner(p)
      File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
        result = shared.sd_model(**kwargs)
    TypeError: 'OnnxRawPipeline' object is not callable

---
Traceback (most recent call last):
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
    mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
    return self.read()
  File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
    torch_stats = torch.cuda.memory_stats(self.device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
    stats = memory_stats_as_nested_dict(device=device)
  File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
    return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated

I saw this video: https://www.youtube.com/watch?v=YazUwPNsdzE. It told me to add %hip_path%bin to PATH, but when I typed %hip_path%bin into Windows Explorer, it said Windows can't find it. So instead of %hip_path%bin, I added C:\Program Files\AMD\ROCm\5.7\bin to PATH. Is that why I get the error?
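
One way to answer the PATH question without guessing is to check what the environment actually contains. The sketch below assumes the HIP SDK installer defines an environment variable named `HIP_PATH` (which is what `%hip_path%` refers to); Explorer failing to resolve `%hip_path%bin` would be consistent with that variable being unset, though the thread does not confirm it. The helper names are hypothetical, and passing the environment in as a plain dict is only for easy testing.

```python
import os


def hip_bin_dir(env):
    """Return the bin directory derived from HIP_PATH, or None if it is unset."""
    hip_path = env.get("HIP_PATH")
    if not hip_path:
        return None
    return os.path.join(hip_path, "bin")


def is_on_path(directory, env):
    """Check whether `directory` is listed in PATH, ignoring case rules of the
    current OS and redundant or trailing separators."""
    def norm(p):
        return os.path.normcase(os.path.normpath(p))

    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    return norm(directory) in {norm(e) for e in entries}


if __name__ == "__main__":
    env = dict(os.environ)
    bin_dir = hip_bin_dir(env)
    if bin_dir is None:
        print("HIP_PATH is not set")
    else:
        print(bin_dir, "on PATH:", is_on_path(bin_dir, env))
```

If `HIP_PATH` is unset, adding the ROCm `bin` folder to PATH by hand puts the same directory on PATH but still leaves that variable undefined for anything that reads it directly.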

lshqqytiger commented 5 months ago

Make sure that environment variable ZLUDA is not set. Try again after removing .zluda folder.
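
Both conditions can be checked with a few lines of Python. This is a minimal sketch, assuming the `.zluda` folder sits directly inside the web UI install directory (as the `Using ZLUDA in ...` log line suggests); `zluda_leftovers` is a hypothetical helper name.

```python
import os


def zluda_leftovers(install_dir, env):
    """List ZLUDA-related leftovers: the ZLUDA environment variable and a
    .zluda folder inside the given install directory."""
    findings = []
    if "ZLUDA" in env:
        findings.append(f"environment variable ZLUDA is set to {env['ZLUDA']!r}")
    folder = os.path.join(install_dir, ".zluda")
    if os.path.isdir(folder):
        findings.append(f"folder exists: {folder}")
    return findings


if __name__ == "__main__":
    # Install directory taken from the paths in the console log.
    for item in zluda_leftovers(r"C:\sd\stable-diffusion-webui-amdgpu", dict(os.environ)):
        print(item)
```

An empty result means neither leftover was found. Note that removing a folder from PATH is not the same as deleting the `.zluda` folder itself.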

kai1040112 commented 5 months ago

I removed the zluda folder from path, but it didn't change anything.

Aelzaire commented 5 months ago

Happening to me as well with a 7900XT. States that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.

Tried with two different models as well, no change.

Final error is TypeError: 'OnnxRawPipeline' object is not callable

Edit: My bad, the error is different from OP's. But same outcome.
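
In both logs the `'OnnxRawPipeline' object is not callable` error appears right after an `ONNX: Failed to convert model` line, which suggests the web UI is left holding an unconverted placeholder that cannot be invoked. That reading is an inference from the logs, not something confirmed in this thread. The sketch below uses hypothetical classes (not the web UI's real ones) to show how such a `TypeError` arises and how a `callable()` check can surface the earlier failure.

```python
class RawPipeline:
    """Hypothetical stand-in for an unconverted pipeline: it only stores a
    model path and does not define __call__."""

    def __init__(self, model_path):
        self.model_path = model_path


class ConvertedPipeline(RawPipeline):
    """Hypothetical stand-in for a successfully converted pipeline."""

    def __call__(self, prompt):
        return f"image for {prompt!r}"


def run(pipeline, prompt):
    """Invoke the pipeline, failing with a clearer message if it is not callable."""
    if not callable(pipeline):
        raise TypeError(
            f"{type(pipeline).__name__!r} object is not callable; "
            "model conversion probably failed earlier"
        )
    return pipeline(prompt)


if __name__ == "__main__":
    print(run(ConvertedPipeline("model.safetensors"), "girl"))
```

So the final `TypeError` is a symptom; the conversion error printed before it is the line worth debugging.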

lshqqytiger commented 5 months ago

You need lots of memory to convert/optimize XL models. How much system memory do you have?

Aelzaire commented 5 months ago

32GB. Would I need more than this to convert? Thanks for the quick reply.

lshqqytiger commented 5 months ago

Please try again after closing unnecessary processes. If it still runs out of memory, you may need more.
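
For scale, the failed allocation in the error message (1342177280 bytes) is only 1.25 GiB, so on a 32 GB machine the failure implies that most memory was already in use at that point. The arithmetic can be sketched as follows; the ceiling-division idiom is the same one that appears in the `call_queue.py` line of the traceback, and the function names are made up for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 ** 3


def bytes_to_mib_ceil(n):
    """Round bytes up to whole MiB using negated floor division,
    the same idiom as -(v//-(1024*1024)) in the traceback."""
    return -(n // -MIB)


def bytes_to_gib(n):
    """Convert bytes to GiB as a float."""
    return n / GIB


if __name__ == "__main__":
    failed = 1342177280  # from the DefaultCPUAllocator error message
    print(bytes_to_mib_ceil(failed), "MiB")
    print(bytes_to_gib(failed), "GiB")
```

This is why closing other processes is the first thing to try: the request itself is modest, and what matters is how much memory is still free when it is made.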

CS1o commented 5 months ago

With a 7900XT, DirectML or ONNX is not the best way to go. To get the best performance on Windows with less VRAM usage, you should install the ZLUDA version. I'm running it myself on a 7900XTX with no problems.

For any AMD or Nvidia user: I made a lot of guides for ZLUDA, DirectML, and all the common Stable Diffusion web UIs like Auto1111, ComfyUI, Fooocus, etc. You can find the install guides here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides

Aelzaire commented 5 months ago

Thanks, CS1o! Any downsides or drawbacks to ZLUDA?

CS1o commented 5 months ago

@Aelzaire No problem! No downsides compared to ONNX and DirectML at all! The only thing is that some special extensions might not work, but I tested a lot and can't name any that don't right now. ZLUDA is very fast and uses less VRAM while being compatible with almost everything.

Edit: Downsides of ONNX: bad compatibility with a lot of extensions, higher VRAM usage, and model conversion needed. DirectML: slower and higher VRAM usage. ZLUDA: does not support very old GPUs, as ROCm support is needed for it to work.

Aelzaire commented 5 months ago

Yo, thank you so much for this. I just got it set up earlier and yeah, this is way faster. Not quite as fast as ONNX, but no limitations or anything, so I'll take it. That's amazing. Thanks again so much. Had no idea about this.

lshqqytiger / stable-diffusion-webui-amdgpu