lshqqytiger / stable-diffusion-webui-amdgpu

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: SDXL vs SD1.5 image speeds Zluda #484

Closed: VeteranXT closed this issue 2 days ago

VeteranXT commented 2 days ago

What happened?

SD 1.5 runs about 2x faster than SDXL at 512x512. The same image takes SD 1.5 about 2-3 seconds to generate (8 steps).

Also, upscaling is about 3x slower for anything above 1.5x.

Steps to reproduce the problem

  1. Write a prompt for SD 1.5.
  2. Generate.
  3. Switch to SDXL.
  4. Generate.
  5. Compare the generation times for SD 1.5 and SDXL at the same 512x512 resolution (a scripted version of this comparison is sketched below).
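
For reproducibility, the same comparison can be scripted against the web UI's built-in HTTP API. This is a minimal sketch, assuming the UI was launched with the `--api` flag on the default `127.0.0.1:7860` address; the checkpoint filenames are taken from the console log below and may need to match the exact titles shown in the model dropdown.

```python
# Minimal sketch: time one txt2img call for an SD 1.5 and an SDXL checkpoint.
# Assumes the web UI was started with --api on the default address.
import time
import requests

BASE = "http://127.0.0.1:7860"

def timed_txt2img(checkpoint: str, steps: int = 20) -> float:
    # Switch the active checkpoint first, so model-load time is excluded
    # from the measurement, then time one 512x512 generation.
    requests.post(f"{BASE}/sdapi/v1/options",
                  json={"sd_model_checkpoint": checkpoint}).raise_for_status()
    payload = {"prompt": "a photo of a cat", "steps": steps,
               "width": 512, "height": 512}
    start = time.perf_counter()
    requests.post(f"{BASE}/sdapi/v1/txt2img", json=payload).raise_for_status()
    return time.perf_counter() - start

# Checkpoint names taken from the console log; adjust to match your dropdown.
for ckpt in ("dreamshaper_8.safetensors", "Juggernaut_X_RunDiffusion_Hyper.safetensors"):
    print(f"{ckpt}: {timed_txt2img(ckpt):.1f}s")
```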

What should have happened?

Both models should generate a 512x512 image at comparable speeds.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

[Sysinfo](https://pastebin.com/ka6b1Exc)

Console logs

venv "E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.9.3-amd-28-g371f53ed
Commit hash: 371f53ed7c926f9048ef95f45bc816cfbf37b564
Using ZLUDA in E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\.zluda
Installing sd-webui-controlnet requirement: changing opencv-python version from 4.10.0.84 to 4.8.0
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda
E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\vq_model.py:20: FutureWarning: `VQEncoderOutput` is deprecated and will be removed in version 0.31. Importing `VQEncoderOutput` from `diffusers.models.vq_model` is deprecated and this will be removed in a future version. Please use `from diffusers.models.autoencoders.vq_model import VQEncoderOutput`, instead.
  deprecate("VQEncoderOutput", "0.31", deprecation_message)
E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\venv\lib\site-packages\diffusers\models\vq_model.py:25: FutureWarning: `VQModel` is deprecated and will be removed in version 0.31. Importing `VQModel` from `diffusers.models.vq_model` is deprecated and this will be removed in a future version. Please use `from diffusers.models.autoencoders.vq_model import VQModel`, instead.
  deprecate("VQModel", "0.31", deprecation_message)
ONNX: version=1.18.1 provider=CPUExecutionProvider, available=['AzureExecutionProvider', 'CPUExecutionProvider']
[-] ADetailer initialized. version: 24.6.0, num models: 10
CivitAI Browser+: Aria2 RPC started
ControlNet preprocessor location: E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\extensions\sd-webui-controlnet\annotator\downloads
2024-07-01 03:37:50,080 - ControlNet - INFO - ControlNet v1.1.449
Loading weights [010be7341c] from E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\models\Stable-diffusion\Juggernaut_X_RunDiffusion_Hyper.safetensors
2024-07-01 03:37:51,444 - ControlNet - INFO - ControlNet UI callback registered.
Creating model from config: E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
*Deforum ControlNet support: enabled*
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 17.0s (prepare environment: 17.8s, initialize shared: 1.9s, list SD models: 0.3s, load scripts: 3.0s, create ui: 1.1s, gradio launch: 0.3s).
Applying attention optimization: sub-quadratic... done.
Model loaded in 8.1s (load weights from disk: 0.4s, create model: 0.8s, apply weights to model: 5.9s, move model to device: 0.1s, load textual inversion embeddings: 0.2s, calculate empty prompt: 0.5s).
Reusing loaded model Juggernaut_X_RunDiffusion_Hyper.safetensors [010be7341c] to load dreamshaper_8.safetensors [879db523c3]
Loading weights [879db523c3] from E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\models\Stable-diffusion\dreamshaper_8.safetensors
Creating model from config: E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
Applying attention optimization: sub-quadratic... done.
Model loaded in 30.5s (create model: 0.6s, apply weights to model: 29.7s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:08<00:00,  2.44it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:07<00:00,  2.55it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  2.90it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:06<00:00,  3.09it/s]
Reusing loaded model dreamshaper_8.safetensors [879db523c3] to load Juggernaut_X_RunDiffusion_Hyper.safetensors [010be7341c]
Loading weights [010be7341c] from E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\models\Stable-diffusion\Juggernaut_X_RunDiffusion_Hyper.safetensors
Creating model from config: E:\Storage\Apps\AI_Geneartor\stable-diffusion-webui-amdgpu\repositories\generative-models\configs\inference\sd_xl_base.yaml
Applying attention optimization: sub-quadratic... done.
Model loaded in 5.1s (create model: 0.3s, apply weights to model: 4.2s, move model to device: 0.2s, calculate empty prompt: 0.2s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:12<00:00,  1.64it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00,  1.27it/s]

Additional information

No response

lshqqytiger commented 2 days ago

SD 1.5 and SDXL 1.0 have different parameter counts. SDXL has more layers to pass through to produce the final latents, so it is slower than SD 1.5 even when the output is the same size (512x512). This is also why SDXL generally produces better results than SD 1.5.
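
For context on the size gap, here is a minimal sketch using the `diffusers` library to count UNet parameters for both architectures. The Hugging Face model IDs are assumptions (the reference SD 1.5 and SDXL base checkpoints, not the custom models from the log above), and the weights download on first use.

```python
# Minimal sketch: compare UNet parameter counts for SD 1.5 vs. SDXL base.
# Model IDs are the reference Hugging Face checkpoints (an assumption here);
# the first call downloads several GB of weights.
from diffusers import UNet2DConditionModel

def unet_param_count(model_id: str) -> int:
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
    return sum(p.numel() for p in unet.parameters())

sd15 = unet_param_count("runwayml/stable-diffusion-v1-5")
sdxl = unet_param_count("stabilityai/stable-diffusion-xl-base-1.0")
print(f"SD 1.5 UNet: {sd15 / 1e9:.2f}B params")  # roughly 0.86B
print(f"SDXL UNet:   {sdxl / 1e9:.2f}B params")  # roughly 2.57B
```

The larger UNet does more work per denoising step, which is consistent with the lower iteration rate for the SDXL checkpoint in the log above (~1.6 it/s vs. ~2.9 it/s at 512x512).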