Open kai1040112 opened 5 months ago
The RX 7700S (gfx1102) is not officially supported by the AMD HIP SDK. However, you can use unofficially built BLAS libraries: https://github.com/Na3MnO4/ROCmLibs-Fallback
I followed the steps Copilot gave me:
but I still got the error (and still couldn't generate anything):
venv "C:\sd\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
ROCm Toolkit was found.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.9.3-amd-24-g2c29feb5
Commit hash: 2c29feb50e5cd3592b3ea831fe20b17588a2edb4
Using ZLUDA in C:\sd\stable-diffusion-webui-amdgpu\.zluda
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: pytorch_lightning.utilities.distributed.rank_zero_only has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from pytorch_lightning.utilities instead.
rank_zero_deprecation(
Launching Web UI with arguments:
ONNX: version=1.18.0 provider=AzureExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
ZLUDA device failed to pass basic operation test: index=None, device_name=AMD Radeon RX 7700S [ZLUDA]
CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Running on local URL: http://127.0.0.1:7860
To create a public link, set share=True in launch().
Startup time: 26.4s (prepare environment: 33.0s, initialize shared: 2.8s, load scripts: 0.6s, create ui: 0.6s, gradio launch: 0.4s).
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:10<00:00, 2.01s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Exception in thread MemMon:
Traceback (most recent call last):
File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 43, in run
torch.cuda.reset_peak_memory_stats()
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 309, in reset_peak_memory_stats
return torch._C._cuda_resetPeakMemoryStats(device)
RuntimeError: invalid argument to reset_peak_memory_stats
Applying attention optimization: InvokeAI... done.
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:684: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:284: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:324: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
ONNX: Successfully exported converted model: submodel=text_encoder
C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\unets\unet_2d_condition.py:1114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if dim % default_overall_up_factor != 0:
ONNX: Failed to convert model: model='dynavisionXLAllInOneStylized_release0534bakedvae.safetensors', error=mat1 and mat2 shapes cannot be multiplied (1x2560 and 2816x1280)
Fetching 17 files: 100%|███████████████████████████████████████████████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:09<00:00, 1.89s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxRawPipeline
Error completing request
Arguments: ('task(570ykia0tb9ihw7)', <gradio.routes.Request object at 0x000001AF04E056C0>, 'girl', '', [], 1, 1, 7, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', 'Use same scheduler', '', '', [], 0, 20, 'PNDM', 'Automatic', False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 57, in f
res = list(func(*args, **kwargs))
File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 36, in f
res = func(*args, **kwargs)
File "C:\sd\stable-diffusion-webui-amdgpu\modules\txt2img.py", line 109, in txt2img
processed = processing.process_images(p)
File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 847, in process_images
res = process_images_inner(p)
File "C:\sd\stable-diffusion-webui-amdgpu\modules\processing.py", line 952, in process_images_inner
result = shared.sd_model(**kwargs)
TypeError: 'OnnxRawPipeline' object is not callable
Traceback (most recent call last):
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
response = f(*args, **kwargs)
File "C:\sd\stable-diffusion-webui-amdgpu\modules\call_queue.py", line 95, in f
mem_stats = {k: -(v//-(1024*1024)) for k, v in shared.mem_mon.stop().items()}
File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 99, in stop
return self.read()
File "C:\sd\stable-diffusion-webui-amdgpu\modules\memmon.py", line 81, in read
torch_stats = torch.cuda.memory_stats(self.device)
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 258, in memory_stats
stats = memory_stats_as_nested_dict(device=device)
File "C:\sd\stable-diffusion-webui-amdgpu\venv\lib\site-packages\torch\cuda\memory.py", line 270, in memory_stats_as_nested_dict
return torch._C._cuda_memoryStats(device)
RuntimeError: invalid argument to memory_allocated
I saw this video: https://www.youtube.com/watch?v=YazUwPNsdzE. It told me to add %hip_path%bin to PATH, but when I type %hip_path%bin into Windows Explorer, it says Windows can't find it, so instead of %hip_path%bin I added C:\Program Files\AMD\ROCm\5.7\bin to PATH. Is that why I get the error?
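One way to check this from Python instead of Explorer is sketched below. It assumes the HIP SDK installer is supposed to define a HIP_PATH environment variable (the %hip_path% from the video); if that variable is missing, %hip_path%bin has nothing to expand to, which could explain why Explorer couldn't find it. The helper names hip_bin_dir and is_on_path are made up for this illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hip_bin_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return <HIP_PATH>/bin, or None if HIP_PATH is not defined."""
    hip_path = env.get("HIP_PATH")
    if not hip_path:
        return None
    return Path(hip_path) / "bin"


def is_on_path(directory: Path, env: Mapping[str, str] = os.environ) -> bool:
    """Check whether `directory` is one of the entries in PATH."""
    target = os.path.normcase(os.path.normpath(str(directory)))
    entries = env.get("PATH", "").split(os.pathsep)
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in entries
        if entry
    )


if __name__ == "__main__":
    bin_dir = hip_bin_dir()
    if bin_dir is None:
        print("HIP_PATH is not set, so %hip_path%bin cannot expand")
    else:
        print(f"HIP bin directory: {bin_dir}")
        print(f"exists: {bin_dir.is_dir()}, on PATH: {is_on_path(bin_dir)}")
```

As far as PATH lookup goes, adding the full C:\Program Files\AMD\ROCm\5.7\bin directory by hand should behave the same as the expanded %hip_path%bin entry, so that substitution on its own is unlikely to be the cause.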
Make sure that the environment variable ZLUDA is not set.
Try again after removing the .zluda folder.
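Both conditions can be checked with a short script before relaunching. This is only a sketch: the function name zluda_leftovers is hypothetical, and the install path in the example is the one shown in the log above.

```python
import os
from pathlib import Path
from typing import List, Mapping


def zluda_leftovers(webui_root: Path, env: Mapping[str, str] = os.environ) -> List[str]:
    """List what still needs cleaning up: the ZLUDA variable and the .zluda folder."""
    found = []
    if "ZLUDA" in env:
        found.append(f"environment variable ZLUDA is set to {env['ZLUDA']!r}")
    zluda_dir = Path(webui_root) / ".zluda"
    if zluda_dir.is_dir():
        found.append(f"folder still exists: {zluda_dir}")
    return found


if __name__ == "__main__":
    # Install location taken from the console log in this thread.
    root = Path(r"C:\sd\stable-diffusion-webui-amdgpu")
    problems = zluda_leftovers(root)
    if problems:
        for problem in problems:
            print(problem)
    else:
        print("No ZLUDA variable and no .zluda folder found")
```

Note that this looks at the .zluda folder inside the web UI directory, which is not the same thing as an entry in PATH.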
I removed the zluda folder from path, but it didn't change anything.
The same is happening to me with a 7900XT. It reports that there is not enough memory to convert the model: ONNX: Failed to convert model: model='prefectPonyXL_v10.safetensors', error=[enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1342177280 bytes.
Tried with two different models as well, no change.
Final error is TypeError: 'OnnxRawPipeline' object is not callable
Edit: My bad, the error is different from OP's. But same outcome.
You need lots of memory to convert/optimize XL models. How much system memory do you have?
32GB. Would I need more than this to convert? Thanks for the quick reply.
Please try again after closing unnecessary processes. If it still runs out of memory, you may need more.
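To put the number from that error into perspective, here is a small sketch that converts the failed request to GiB. The byte count is the one from the DefaultCPUAllocator message above and the 32 GB figure is the one reported in this thread; the helper name bytes_to_gib is just for illustration.

```python
def bytes_to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


failed_allocation = 1342177280  # bytes, from the DefaultCPUAllocator message
total_ram_gib = 32              # system memory reported in this thread

size_gib = bytes_to_gib(failed_allocation)
print(f"failed allocation: {size_gib:.2f} GiB")  # 1.25 GiB
print(f"share of total RAM: {size_gib / total_ram_gib:.1%}")
```

A single request of this size failing on a 32 GB machine suggests that most of the memory was already in use by the time the conversion got to that point, which fits the advice to close other processes first.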
With a 7900XT, DirectML or ONNX is not the best way to go. To get the best performance on Windows and lower VRAM usage, you should install the ZLUDA version. I'm running it myself on a 7900XTX with no problems.
For any AMD or Nvidia user: I made a lot of guides for ZLUDA, DirectML, and all the common Stable Diffusion web UIs such as Auto1111, ComfyUI, Fooocus, etc. You can find the install guides here: https://github.com/CS1o/Stable-Diffusion-Info/wiki/Installation-Guides
Thanks, CS1o! Any downsides or drawbacks to ZLUDA?
@Aelzaire No problem! There are no downsides compared to ONNX and DirectML at all. The only thing is that some special extensions might not work, but I tested a lot and can't name any that don't right now. ZLUDA is very fast and uses less VRAM while being compatible with almost everything.
Edit: Downsides of ONNX: poor compatibility with many extensions, higher VRAM usage, and model conversion is needed. DirectML: slower and higher VRAM usage. ZLUDA: does not support very old GPUs, as it needs ROCm support to work.
Yo, thank you so much for this. I just got it set up earlier and yeah, this is way faster. Not quite as fast as ONNX, but no limitations or anything, so I'll take it. That's amazing. Thanks again so much. I had no idea about this.
Checklist
What happened?
I am running Stable Diffusion on a laptop with an AMD Radeon RX 7700S, but it doesn't generate anything after I enter the prompt and click the Generate button.
Steps to reproduce the problem
What should have happened?
Maybe Stable Diffusion couldn't use my GPU to generate the image because of some errors.
What browsers do you use to access the UI ?
Microsoft Edge
Sysinfo
sysinfo-2024-06-02-12-55.json
Console logs
Additional information
Screenshot 2024-06-02 205311: the GPU usage is very low when I try to generate the picture (but it fails all the time).