lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: SDXL Olive-optimized model runs out of memory on the second generation #410

Open Jay19751103 opened 7 months ago

Jay19751103 commented 7 months ago

What happened?

System: AMD 7600 XT with 16 GB VRAM, 32 GB system RAM, 200 GB swap.

  1. When using Olive directly, every inference runs fine.
  2. Using this webUI, the second generation fails with the following:

Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|████████████████████| 20/20 [02:48<00:00, 8.44s/it]
2024-03-05 17:52:48.2647133 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFE7621DA68: (caller: 00007FFE769851B1) Exception(4) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

Steps to reproduce the problem

Follow the SD 1.5 installation, download sd_xl_base_1.0.safetensors from Hugging Face, and copy it to models\Stable-diffusion\sd_xl_base_1.0.safetensors. Then change Settings -> OnnxRuntime -> Diffusers pipeline -> ONNX Stable Diffusion XL.

What should have happened?

It should behave the same as running Olive inference directly.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-03-05-10-00.json

Console logs

Console log from the second generation:
Olive: Parameter change detected
Olive: Recompiling base model
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00,  8.44s/it]
2024-03-05 17:52:48.2647133 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFE7621DA68: (caller: 00007FFE769851B1) Exception(4) tid(5988) 8007000E Not enough memory resources are available to complete this operation.

Additional information

No response

lshqqytiger commented 7 months ago

Close everything except the necessary processes and the webui, then try again. If it still fails, download pre-optimized models from Hugging Face or elsewhere.
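
As a reference for the suggestion above, loading a pre-optimized ONNX SDXL export outside the webui might look roughly like this with optimum (the same library the webui's ONNX pipeline wraps, per the traceback later in this thread). The repository id below is a placeholder/assumption; substitute whichever optimized export you actually download.

```python
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

# Placeholder repo id for a pre-optimized ONNX SDXL export; replace with the one you download.
pipe = ORTStableDiffusionXLPipeline.from_pretrained(
    "someuser/sdxl-base-1.0-onnx-olive",
    provider="DmlExecutionProvider",  # DirectML, matching this webui fork's backend
)

image = pipe("a cat", num_inference_steps=20, height=1024, width=1024).images[0]
image.save("cat.png")
```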

Jay19751103 commented 6 months ago

Hi

I have closed everything and also tried a GPU card with more VRAM, but it still enters recompiling even though I don't change anything.

[screenshot]

The first time it reaches 3.53 it/s. After 10 images are generated, I regenerate 1 image (batch count 1). It enters Recompiling (generating 10 images again does the same) and the system becomes very slow, around 4.29 s/it.

lshqqytiger commented 6 months ago

The stored compilation parameters may have been changed somewhere (by the code or the user). Are you sure you didn't change any parameters/options after the 10 images were generated? Can you reproduce the issue on SD.Next?

Jay19751103 commented 6 months ago

Hi. Before generating an image, I change the width/height to 1024. It hits the breakpoint; the following is the first printed value.

To create a public link, set share=True in launch().
Startup time: 1.9s (prepare environment: 6.2s, initialize shared: 0.8s, load scripts: 0.3s, create ui: 0.2s, gradio launch: 0.3s).
Applying attention optimization: InvokeAI... done.
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
(Pdb) p shared.sd_model.__class__.__name__
'OnnxRawPipeline'
(Pdb) c
WARNING: ONNX implementation works best with SD.Next. Please consider migrating to SD.Next.
Olive implementation is experimental. It contains potentially an issue and is subject to change at any time.
2024-03-13 16:14:36.5365812 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:36.5414884 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:41.0030069 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:41.0072032 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:41.3923077 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:41.3964782 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:42.3300983 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:42.3344864 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-13 16:14:43.0725744 [W:onnxruntime:, session_state.cc:1166 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-13 16:14:43.0767242 [W:onnxruntime:, session_state.cc:1168 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|████████████████████| 20/20 [00:05<00:00, 3.47it/s]

After the image displays in the webui, I click Generate again.

d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py(159)check_parameters_changed()
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
(Pdb) p shared.sd_model.__class__.__name__
'OnnxStableDiffusionXLPipeline'
(Pdb) l
154
155     def check_parameters_changed(p, refiner_enabled: bool):
156         from modules import shared, sd_models
157
158         breakpoint()
159  ->     if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):
160             return shared.sd_model
161
162         breakpoint()
163         compile_height = p.height
164         compile_width = p.width

backtrace
c:\users\wenchien\appdata\local\anaconda3\envs\pytest_sd\lib\threading.py(973)_bootstrap()
-> self._bootstrap_inner()
c:\users\wenchien\appdata\local\anaconda3\envs\pytest_sd\lib\threading.py(1016)_bootstrap_inner()
-> self.run()
d:\directml\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\anyio\_backends\_asyncio.py(807)run()
-> result = context.run(func, *args)
d:\directml\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\gradio\utils.py(707)wrapper()
-> response = f(*args, **kwargs)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py(57)f()
-> res = list(func(*args, **kwargs))
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py(36)f()
-> res = func(*args, **kwargs)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\txt2img.py(110)txt2img()
-> processed = processing.process_images(p)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\processing.py(787)process_images()
-> res = process_images_inner(p)
d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\processing.py(848)process_images_inner()
-> shared.sd_model = check_parameters_changed(p, False)

d:\directml\pytest_sd\stable-diffusion-webui-directml\modules\onnx_impl\__init__.py(159)check_parameters_changed()
-> if shared.sd_model.__class__.__name__ == "OnnxRawPipeline" or not shared.sd_model.__class__.__name__.startswith("Onnx"):

Jay19751103 commented 6 months ago

Hi. The first time, the class name is OnnxRawPipeline; the second time it is OnnxStableDiffusionXLPipeline.

shared.compiled_model_state.height != compile_height
or shared.compiled_model_state.width != compile_width

One set is 1024x1024 and the other is 512x512, so the condition is true and it enters Recompiling.

(Pdb) p shared.compiled_model_state.width
512
(Pdb) p shared.compiled_model_state.height
512
(Pdb) p compile_height
1024
(Pdb) p compile_width
1024
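
For readers following along, this is roughly the check being discussed, reconstructed from the snippets above. It is a minimal sketch, not the actual code in modules/onnx_impl/__init__.py: the helper name needs_recompile and the simplified signature are assumptions, and the real check_parameters_changed() compares more parameters than just width and height.

```python
# Minimal sketch of the recompile trigger, reconstructed from the pdb output above.
def needs_recompile(compiled_model_state, p) -> bool:
    compile_height = p.height  # requested generation size, e.g. 1024
    compile_width = p.width
    # The stored state still holds the size the model was last compiled for
    # (512x512 here), so a 1024x1024 request makes this True and the webui
    # prints "Olive: Parameter change detected" / "Olive: Recompiling base model".
    return (compiled_model_state.height != compile_height
            or compiled_model_state.width != compile_width)
```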

Jay19751103 commented 6 months ago

Hi, any progress on this issue? I also tried with an Nvidia card; it has the same issue as DirectML. The second generation enters recompiling and then the system gets the "Not enough memory" error. Logs follow: after the UNet is processed, it enters the VAE decoder and the issue occurs.

ONNX: processing=StableDiffusionProcessingTxt2Img, pipeline=OnnxStableDiffusionXLPipeline
100%|████████████████████| 20/20 [00:55<00:00, 2.80s/it]
2024-03-26 16:17:34.7141774 [E:onnxruntime:, sequential_executor.cc:514 onnxruntime::ExecuteKernel] Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFFBB3DDA68: (caller: 00007FFFBBB451B1) Exception(4) tid(20fc) 8007000E Not enough memory resources are available to complete this operation.

Error completing request
Arguments: ('task(p85gmeh2y6y2m8t)', <gradio.routes.Request object at 0x00000280252DA3E0>, 'a cat', '', [], 20, 'PNDM', 5, 1, 7, 1024, 1024, False, 0.7, 2, 'Latent', 0, 0, 0, 'Use same checkpoint', 'Use same sampler', '', '', [], 0, False, '', 0.8, -1, False, -1, 0, 0, 0, False, False, 'positive', 'comma', 0, False, False, 'start', '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, False, False, False, 0, False) {}
Traceback (most recent call last):
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
    res = func(*args, **kwargs)
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\txt2img.py", line 110, in txt2img
    processed = processing.process_images(p)
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\processing.py", line 787, in process_images
    res = process_images_inner(p)
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\modules\processing.py", line 892, in process_images_inner
    result = shared.sd_model(**kwargs)
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\pipelines\diffusers\pipeline_stable_diffusion_xl.py", line 486, in __call__
    [self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\pipelines\diffusers\pipeline_stable_diffusion_xl.py", line 486, in <listcomp>
    [self.vae_decoder(latent_sample=latents[i : i + 1])[0] for i in range(latents.shape[0])]
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\onnxruntime\modeling_diffusion.py", line 482, in __call__
    return self.forward(*args, **kwargs)
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\optimum\onnxruntime\modeling_diffusion.py", line 528, in forward
    outputs = self.session.run(None, onnx_inputs)
  File "D:\DirectML\pytest_sd\stable-diffusion-webui-directml\venv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running GroupNorm node. Name:'GroupNorm_22' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2571)\onnxruntime_pybind11_state.pyd!00007FFFBB3DDA68: (caller: 00007FFFBBB451B1) Exception(4) tid(20fc) 8007000E Not enough memory resources are available to complete this operation.
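
To help isolate the failing step, a minimal repro sketch like the following could run the optimized SDXL VAE decoder alone through onnxruntime with the DirectML provider, outside the webui. The model path is a placeholder (wherever the Olive-optimized vae_decoder/model.onnx was written); the input name latent_sample comes from the traceback above, the 1x4x128x128 latent shape corresponds to a 1024x1024 SDXL image (8x VAE downsampling), and the float16 dtype is an assumption about the optimized model.

```python
import numpy as np
import onnxruntime as ort

# Placeholder path to the Olive-optimized VAE decoder produced by the webui.
vae_decoder_path = r"path\to\optimized\vae_decoder\model.onnx"

# Same execution provider the webui uses on this setup (DirectML), with CPU fallback.
session = ort.InferenceSession(
    vae_decoder_path,
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# 1024x1024 SDXL image -> 128x128 latent (assumed fp16; match the model's input dtype).
latents = np.random.randn(1, 4, 128, 128).astype(np.float16)

# This is the call that raises "Not enough memory resources" in the traceback above.
outputs = session.run(None, {"latent_sample": latents})
print(outputs[0].shape)  # expected (1, 3, 1024, 1024) when the decode fits in VRAM
```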

Jay19751103 commented 5 months ago

Hi, is there any fix yet for this issue where the second generation enters Olive recompiling?