[Bug]: Stable Diffusion XL Optimization Failure

Checklist

[X] The issue exists after disabling all extensions
[X] The issue exists on a clean installation of webui
[ ] The issue is caused by an extension, but I believe it is caused by a bug in the webui
[X] The issue exists in the current version of the webui
[X] The issue has not been reported before recently
[ ] The issue has been reported before but has not been fixed yet

What happened?

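Cannot optimize "stabilityai/stable-diffusion-xl-base-1.0" using Olive. The text_encoder and text_encoder_2 steps complete, but the process fails at the "Optimizing unet" step, during the OnnxConversion pass, with "RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2304 and 2816x1280)".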
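
Steps to reproduce the problem

1. Launch the webui with the arguments --use-directml --onnx.
2. Start an Olive optimization of "stabilityai/stable-diffusion-xl-base-1.0".
3. Wait for the "Optimizing unet" step, where the run fails.
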
What should have happened?

Optimization should have completed successfully, including the unet step.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-01-28-14-05.json

Console logs

PS F:\stable-diffusion-webui-directml> .\webui-user.bat
venv "F:\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: 1.7.0
Commit hash: d500e58a65d99bfaa9c7bb0da6c3eb5704fadf25
Installing onnxruntime
Installing onnxruntime-directml
Launching Web UI with arguments: --use-directml --onnx
no module 'xformers'. Processing without...
No SDP backend available, likely because you are running in pytorch versions < 2.0. In fact, you are using PyTorch 1.13.1+cpu. You might want to consider upgrading.
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Style database not found: F:\stable-diffusion-webui-directml\styles.csv
==============================================================================
You are running torch 1.13.1+cpu.
The program is tested to work with torch 2.0.0.
To reinstall the desired version, run with commandline flag --reinstall-torch.
Beware that this will cause a lot of large files to be downloaded, as well as
there are reports of issues with training tab on the latest version.

Use --skip-version-check commandline argument to disable this check.
==============================================================================
Model stable-diffusion-xl-base-1.0 loaded.
Applying attention optimization: InvokeAI... done.
F:\stable-diffusion-webui-directml\modules\ui.py:1326: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  with gr.Row().style(equal_height=False):
F:\stable-diffusion-webui-directml\modules\ui.py:1448: GradioDeprecationWarning: The `style` method is deprecated. Please set these arguments in the constructor instead.
  with gr.Row().style(equal_height=False):
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 8.5s (prepare environment: 1.5s, import torch: 2.5s, import gradio: 1.3s, setup paths: 1.5s, initialize shared: 1.3s, other imports: 0.3s, load scripts: 0.7s, create ui: 0.5s, gradio launch: 0.2s).
Keyword arguments {'requires_safety_checker': False} are not expected by StableDiffusionXLPipeline and will be ignored.
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  5.80it/s]

Optimizing text_encoder
[2024-01-28 09:00:45,275] [INFO] [engine.py:179:setup_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 09:00:45,299] [INFO] [engine.py:929:_run_pass] Running pass convert:OnnxConversion
F:\stable-diffusion-webui-directml\modules\dml\hijack\transformers.py:13: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  min = torch.tensor(torch.finfo(dtype).min, device="cpu")
F:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:634: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  encoder_states = () if output_hidden_states else None
F:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:639: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if output_hidden_states:
F:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:287: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz * self.num_heads, tgt_len, src_len):
F:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:295: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_attention_mask.size() != (bsz, 1, tgt_len, src_len):
F:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:327: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
F:\stable-diffusion-webui-directml\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:668: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if output_hidden_states:
[2024-01-28 09:00:50,690] [INFO] [engine.py:929:_run_pass] Running pass optimize:OrtTransformersOptimization
[2024-01-28 09:00:57,647] [INFO] [footprint.py:183:create_pareto_frontier] pareto frontier points: 1_OrtTransformersOptimization-0-19795fc1c081f11c880df7a463ccf313
{
  "latency-avg": 7.22172
}
[2024-01-28 09:00:57,647] [INFO] [engine.py:610:create_pareto_frontier_footprints] Output all 1 models
[2024-01-28 09:00:57,647] [INFO] [engine.py:357:run] Run history for gpu-dml:
[2024-01-28 09:00:57,648] [INFO] [engine.py:636:dump_run_history] Please install tabulate for better run history output
[2024-01-28 09:00:57,649] [INFO] [engine.py:372:run] No packaging config provided, skip packaging artifacts
Optimized text_encoder

Optimizing text_encoder_2
[2024-01-28 09:00:57,660] [INFO] [engine.py:179:setup_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 09:00:57,671] [INFO] [engine.py:929:_run_pass] Running pass convert:OnnxConversion
[2024-01-28 09:01:28,825] [WARNING] [common.py:108:model_proto_to_file] Model is too large to save as a single file but 'save_as_external_data' is False. Saved tensors as external data regardless.
[2024-01-28 09:01:28,830] [INFO] [engine.py:929:_run_pass] Running pass optimize:OrtTransformersOptimization
[2024-01-28 09:02:06,945] [INFO] [footprint.py:183:create_pareto_frontier] pareto frontier points: 3_OrtTransformersOptimization-2-19795fc1c081f11c880df7a463ccf313
{
  "latency-avg": 29.75187
}
[2024-01-28 09:02:06,945] [INFO] [engine.py:610:create_pareto_frontier_footprints] Output all 1 models
[2024-01-28 09:02:06,946] [INFO] [engine.py:357:run] Run history for gpu-dml:
[2024-01-28 09:02:06,947] [INFO] [engine.py:636:dump_run_history] Please install tabulate for better run history output
[2024-01-28 09:02:06,947] [INFO] [engine.py:372:run] No packaging config provided, skip packaging artifacts
Optimized text_encoder_2

Optimizing unet
[2024-01-28 09:02:06,958] [INFO] [engine.py:179:setup_accelerators] Running workflow on accelerator specs: gpu-dml
[2024-01-28 09:02:06,970] [INFO] [engine.py:929:_run_pass] Running pass convert:OnnxConversion
F:\stable-diffusion-webui-directml\venv\lib\site-packages\diffusers\models\unet_2d_condition.py:915: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if dim % default_overall_up_factor != 0:
[2024-01-28 09:02:15,963] [ERROR] [engine.py:1002:_run_pass] Pass run failed.
Traceback (most recent call last):
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\olive\engine\engine.py", line 990, in _run_pass
    output_model_config = host.run_pass(p, input_model_config, data_root, output_model_path, pass_search_point)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\olive\systems\local.py", line 32, in run_pass
    output_model = the_pass.run(model, data_root, output_model_path, point)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\olive\passes\olive_pass.py", line 371, in run
    output_model = self._run_for_config(model, data_root, config, output_model_path)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\olive\passes\onnx\conversion.py", line 121, in _run_for_config
    return self._convert_model_on_device(model, data_root, config, output_model_path, device, torch_dtype)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\olive\passes\onnx\conversion.py", line 343, in _convert_model_on_device
    converted_onnx_model = OnnxConversion._export_pytorch_model(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\olive\passes\onnx\conversion.py", line 216, in _export_pytorch_model
    torch.onnx.export(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\onnx\utils.py", line 504, in export
    _export(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\onnx\utils.py", line 1529, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\onnx\utils.py", line 1111, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\onnx\utils.py", line 987, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\onnx\utils.py", line 891, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\jit\_trace.py", line 1184, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\jit\_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\jit\_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 1018, in forward
    aug_emb = self.add_embedding(add_embeds)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\diffusers\models\embeddings.py", line 228, in forward
    sample = self.linear_1(sample)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\diffusers\models\lora.py", line 430, in forward
    out = super().forward(hidden_states)
  File "F:\stable-diffusion-webui-directml\extensions-builtin\Lora\networks.py", line 486, in network_Linear_forward
    return originals.Linear_forward(self, input)
  File "F:\stable-diffusion-webui-directml\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
  File "F:\stable-diffusion-webui-directml\modules\dml\amp\autocast_mode.py", line 39, in <lambda>
    setattr(resolved_obj, func_path[-1], lambda *args, **kwargs: forward(op, args, kwargs))
  File "F:\stable-diffusion-webui-directml\modules\dml\amp\autocast_mode.py", line 13, in forward
    return op(*args, **kwargs)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x2304 and 2816x1280)
[2024-01-28 09:02:16,875] [WARNING] [engine.py:912:_run_passes] Skipping evaluation as model was pruned
[2024-01-28 09:02:16,876] [INFO] [engine.py:610:create_pareto_frontier_footprints] Output all 0 models
[2024-01-28 09:02:16,877] [INFO] [engine.py:357:run] Run history for gpu-dml:
[2024-01-28 09:02:16,877] [INFO] [engine.py:636:dump_run_history] Please install tabulate for better run history output
[2024-01-28 09:02:16,878] [INFO] [engine.py:372:run] No packaging config provided, skip packaging artifacts
*** Error completing request
*** Arguments: ('', '', 'vae', 'stable-diffusion-v1-5', 'stable-diffusion-v1-5', 'stabilityai/stable-diffusion-xl-base-1.0', '', 'vae', 'stable-diffusion-xl-base-1.0', 'stable-diffusion-xl-base-1.0', True, True, True, True, True, True, True, True, True, True, 'euler', True, 512, False, '', '', '') {}
    Traceback (most recent call last):
      File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "F:\stable-diffusion-webui-directml\modules\call_queue.py", line 36, in f
        res = func(*args, **kwargs)
      File "F:\stable-diffusion-webui-directml\modules\ui.py", line 1778, in optimize
        return optimize_sdxl_from_onnx(
      File "F:\stable-diffusion-webui-directml\modules\sd_olive_ui.py", line 280, in optimize_sdxl_from_onnx
        optimize(
      File "F:\stable-diffusion-webui-directml\modules\sd_olive_ui.py", line 358, in optimize
        assert conversion_footprint and optimizer_footprint
    AssertionError

---
Additional information

My GPU drivers are up to date. After starting this bug report, I attempted to optimize "runwayml/stable-diffusion-v1-5" and it works fine.
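
A rough way to read the shape error, offered as a sketch and not a confirmed diagnosis. The failing layer is the UNet's add_embedding, whose first linear layer expects 2816 input features but receives 2304. Assuming the usual SDXL layout, where that input is a pooled text embedding concatenated with several size/crop "time ids" that are each embedded to 256 features, the snippet below lists which combinations produce each width. The constant 256, the candidate pooled widths 768 and 1280, and the helper name `candidate_layouts` are assumptions made for illustration; none of them come from the log.

```python
# Hypothetical helper: list the (pooled_text_dim, num_time_ids) pairs that
# would add up to a given add_embedding input width.
# Assumption: each time id is embedded to 256 features (SDXL's usual
# addition_time_embed_dim); this value is not taken from the log above.
ADDITION_TIME_EMBED_DIM = 256


def candidate_layouts(width, pooled_dims=(768, 1280), max_time_ids=6):
    """Return (pooled_dim, n_time_ids) pairs whose concatenation has `width` features."""
    layouts = []
    for pooled in pooled_dims:
        rest = width - pooled
        # The remainder must be a whole number of 256-wide time-id embeddings.
        if rest > 0 and rest % ADDITION_TIME_EMBED_DIM == 0:
            n_time_ids = rest // ADDITION_TIME_EMBED_DIM
            if n_time_ids <= max_time_ids:
                layouts.append((pooled, n_time_ids))
    return layouts


# Width the linear layer expects (second matrix in the error: 2816x1280).
print("expected 2816:", candidate_layouts(2816))
# Width the tracer actually fed in (first matrix in the error: 2x2304).
print("received 2304:", candidate_layouts(2304))
```

Under these assumptions, 2816 only matches a 1280-wide pooled embedding plus 6 time ids, while 2304 matches either a 768-wide pooled embedding with 6 time ids or a 1280-wide one with 4. Either reading would suggest that the dummy inputs used to trace the UNet were built for a different layout than the SDXL base UNet expects, but the log alone does not say which.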
lshqqytiger / stable-diffusion-webui-amdgpu

[Bug]: Stable Diffusion XL Optimization Failure #365