lllyasviel / stable-diffusion-webui-forge


[Bug]: Initial lag / slowdown when switching between Checkpoints and LoRAs #220

Open MNeMoNiCuZ opened 4 months ago

MNeMoNiCuZ commented 4 months ago

What happened?

I have found that Forge is considerably slower and "laggier" than A1111 when switching between the Checkpoints, LoRA, and other extra networks tabs.

There's a long loading time on a fresh start of the program in both, but A1111 seems to handle the content caching differently afterwards (a sketch of that idea appears below).

With Forge, if you switch between Checkpoints and LoRAs, for example, there's then a period where nothing in the UI is clickable: I can't select a model or switch back to Checkpoints.

Going from Checkpoints to LoRA to Checkpoints again takes ~30 seconds for me. With A1111, it takes 5 seconds.

This won't be noticeable until you have a large number of models. I have ~1,000 checkpoints and ~10,000 LoRAs.
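For illustration, one way a UI can make repeat tab switches cheap is to build each tab's card grid once and then only toggle visibility, rather than rebuilding thousands of DOM nodes on every switch. This is a minimal TypeScript sketch of that caching idea, not Forge's or A1111's actual code; `ModelInfo`, `buildCardGrid`, and the tab ids are all hypothetical:

```typescript
// Hypothetical sketch: cache each tab's rendered card grid and toggle
// visibility on switch, instead of rebuilding thousands of DOM nodes.

interface ModelInfo {
  name: string;
  thumbnailUrl: string;
}

const tabCache = new Map<string, HTMLElement>();

function buildCardGrid(models: ModelInfo[]): HTMLElement {
  // The expensive step: one card element per model (~10,000 for LoRAs).
  const grid = document.createElement("div");
  for (const model of models) {
    const card = document.createElement("div");
    card.className = "model-card";
    card.textContent = model.name;
    grid.appendChild(card);
  }
  return grid;
}

function showTab(tabId: string, models: ModelInfo[], container: HTMLElement): void {
  // Hide whichever grid is currently visible.
  for (const grid of tabCache.values()) {
    grid.style.display = "none";
  }
  // Pay the build cost only on the first visit; later switches just unhide.
  let grid = tabCache.get(tabId);
  if (grid === undefined) {
    grid = buildCardGrid(models);
    tabCache.set(tabId, grid);
    container.appendChild(grid);
  }
  grid.style.display = "";
}
```

The trade-off is memory held by the hidden DOM, and the cache would need invalidating whenever the model folders change, but repeat switches drop from "rebuild everything" to a couple of style changes.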

Steps to reproduce the problem

  1. Download ~1,000 checkpoints and ~10,000 LoRAs.
  2. Enter the Checkpoints sub-tab.
  3. Wait a minute for it to load.
  4. Enter the LoRA sub-tab.
  5. Wait another minute for it to load.
  6. Switch back to the Checkpoints sub-tab.
  7. Wait ~10 seconds for it to load.
  8. Switch back to the LoRA sub-tab.
  9. Wait ~10 seconds for it to load.

And for 3-5 seconds after switching to another tab, the UI is unresponsive. I can still use the mouse wheel to scroll the models list, but nothing anywhere can be clicked, and the mouse cursor does not change into a link cursor (hand).
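That symptom matches a card grid being rebuilt synchronously on the browser's main thread, which blocks click handling until the rebuild finishes (scrolling can survive because it is often handled off the main thread). One common mitigation, sketched here in TypeScript under the same hypothetical card-per-model DOM as above, is to render in small batches and yield to the browser between them:

```typescript
// Hypothetical sketch: append cards in small batches, yielding to the
// browser between batches so clicks stay responsive during a large render.

function renderInChunks(
  names: string[],
  container: HTMLElement,
  chunkSize = 200,
): void {
  let index = 0;

  function renderNextChunk(): void {
    const end = Math.min(index + chunkSize, names.length);
    const fragment = document.createDocumentFragment();
    for (; index < end; index++) {
      const card = document.createElement("div");
      card.className = "model-card";
      card.textContent = names[index];
      fragment.appendChild(card);
    }
    container.appendChild(fragment);
    // Hand control back to the event loop before the next batch.
    if (index < names.length) {
      requestAnimationFrame(renderNextChunk);
    }
  }

  renderNextChunk();
}
```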

Additionally, even typing in the filter box is incredibly slow; it seems to re-filter and reload everything on every keystroke. With my ~11k models, I have to wait about 5 seconds for each letter to appear in the LoRA filter box.
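A common fix for this kind of per-keystroke cost is to debounce the filter input so the expensive re-filter runs once per pause in typing rather than on every key. A minimal TypeScript sketch under the same assumptions as above; `applyFilter` and the `#lora-filter` id are made up for illustration:

```typescript
// Hypothetical sketch: run the expensive filter once per typing pause.

function debounce<T extends (...args: any[]) => void>(fn: T, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Parameters<T>): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

function applyFilter(query: string): void {
  // Expensive with ~11k cards: show/hide each card by name match.
  const needle = query.toLowerCase();
  for (const card of document.querySelectorAll<HTMLElement>(".model-card")) {
    const match = card.textContent?.toLowerCase().includes(needle) ?? false;
    card.style.display = match ? "" : "none";
  }
}

const filterBox = document.querySelector<HTMLInputElement>("#lora-filter");
if (filterBox !== null) {
  // 250 ms is an arbitrary delay; tune to taste.
  filterBox.addEventListener(
    "input",
    debounce(() => applyFilter(filterBox.value), 250),
  );
}
```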

What should have happened?

Forge should ideally perform on par with A1111, or better.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo-2024-02-12-21-58.json

Console logs

venv "C:\AI\stable-diffusion-webui-forge\venv\Scripts\Python.exe"
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Version: f0.0.12-latest-113-ge11753ff
Commit hash: e11753ff844b6e06529287f65b2f9efd1fd76cc6
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Total VRAM 8192 MB, total RAM 65270 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3070 : native
VAE dtype: torch.bfloat16
Installing requirements for Face Editor
Faceswaplab : Use GPU requirements
Checking faceswaplab requirements
0.007527400040999055
Launching Web UI with arguments: --port 7861
Total VRAM 8192 MB, total RAM 65270 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3070 : native
VAE dtype: torch.bfloat16
Using pytorch cross attention
ControlNet preprocessor location: C:\AI\stable-diffusion-webui-forge\models\ControlNetPreprocessor
Civitai Helper: Get Custom Model Folder
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 24.1.2, num models: 58
23:00:12 - ReActor - STATUS - Running v0.6.1 on Device: CPU
Thumbnailizer initialized
Loading weights [f7a1beed86] from C:\AI\stable-diffusion-webui-forge\models\Stable-diffusion\Checkpoints\10 - SDXL\colossusProjectXLSFW_v53Trained.safetensors
2024-02-12 23:00:13,482 - ControlNet - INFO - ControlNet UI callback registered.
model_type EPS
UNet ADM Dimension 2816
Civitai Helper: Set Proxy:
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
left over keys: dict_keys(['conditioner.embedders.0.logit_scale', 'conditioner.embedders.0.text_projection', 'conditioner.embedders.1.model.transformer.text_model.embeddings.position_ids', 'decoder.conv_in.bias', 'decoder.conv_in.weight', 'decoder.conv_out.bias', 'decoder.conv_out.weight', 'decoder.mid.attn_1.k.bias', 'decoder.mid.attn_1.k.weight', 'decoder.mid.attn_1.norm.bias', 'decoder.mid.attn_1.norm.weight', 'decoder.mid.attn_1.proj_out.bias', 'decoder.mid.attn_1.proj_out.weight', 'decoder.mid.attn_1.q.bias', 'decoder.mid.attn_1.q.weight', 'decoder.mid.attn_1.v.bias', 'decoder.mid.attn_1.v.weight', 'decoder.mid.block_1.conv1.bias', 'decoder.mid.block_1.conv1.weight', 'decoder.mid.block_1.conv2.bias', 'decoder.mid.block_1.conv2.weight', 'decoder.mid.block_1.norm1.bias', 'decoder.mid.block_1.norm1.weight', 'decoder.mid.block_1.norm2.bias', 'decoder.mid.block_1.norm2.weight', 'decoder.mid.block_2.conv1.bias', 'decoder.mid.block_2.conv1.weight', 'decoder.mid.block_2.conv2.bias', 'decoder.mid.block_2.conv2.weight', 'decoder.mid.block_2.norm1.bias', 'decoder.mid.block_2.norm1.weight', 'decoder.mid.block_2.norm2.bias', 'decoder.mid.block_2.norm2.weight', 'decoder.norm_out.bias', 'decoder.norm_out.weight', 'decoder.up.0.block.0.conv1.bias', 'decoder.up.0.block.0.conv1.weight', 'decoder.up.0.block.0.conv2.bias', 'decoder.up.0.block.0.conv2.weight', 'decoder.up.0.block.0.nin_shortcut.bias', 'decoder.up.0.block.0.nin_shortcut.weight', 'decoder.up.0.block.0.norm1.bias', 'decoder.up.0.block.0.norm1.weight', 'decoder.up.0.block.0.norm2.bias', 'decoder.up.0.block.0.norm2.weight', 'decoder.up.0.block.1.conv1.bias', 'decoder.up.0.block.1.conv1.weight', 'decoder.up.0.block.1.conv2.bias', 'decoder.up.0.block.1.conv2.weight', 'decoder.up.0.block.1.norm1.bias', 'decoder.up.0.block.1.norm1.weight', 'decoder.up.0.block.1.norm2.bias', 'decoder.up.0.block.1.norm2.weight', 'decoder.up.0.block.2.conv1.bias', 'decoder.up.0.block.2.conv1.weight', 'decoder.up.0.block.2.conv2.bias', 'decoder.up.0.block.2.conv2.weight', 'decoder.up.0.block.2.norm1.bias', 'decoder.up.0.block.2.norm1.weight', 'decoder.up.0.block.2.norm2.bias', 'decoder.up.0.block.2.norm2.weight', 'decoder.up.1.block.0.conv1.bias', 'decoder.up.1.block.0.conv1.weight', 'decoder.up.1.block.0.conv2.bias', 'decoder.up.1.block.0.conv2.weight', 'decoder.up.1.block.0.nin_shortcut.bias', 'decoder.up.1.block.0.nin_shortcut.weight', 'decoder.up.1.block.0.norm1.bias', 'decoder.up.1.block.0.norm1.weight', 'decoder.up.1.block.0.norm2.bias', 'decoder.up.1.block.0.norm2.weight', 'decoder.up.1.block.1.conv1.bias', 'decoder.up.1.block.1.conv1.weight', 'decoder.up.1.block.1.conv2.bias', 'decoder.up.1.block.1.conv2.weight', 'decoder.up.1.block.1.norm1.bias', 'decoder.up.1.block.1.norm1.weight', 'decoder.up.1.block.1.norm2.bias', 'decoder.up.1.block.1.norm2.weight', 'decoder.up.1.block.2.conv1.bias', 'decoder.up.1.block.2.conv1.weight', 'decoder.up.1.block.2.conv2.bias', 'decoder.up.1.block.2.conv2.weight', 'decoder.up.1.block.2.norm1.bias', 'decoder.up.1.block.2.norm1.weight', 'decoder.up.1.block.2.norm2.bias', 'decoder.up.1.block.2.norm2.weight', 'decoder.up.1.upsample.conv.bias', 'decoder.up.1.upsample.conv.weight', 'decoder.up.2.block.0.conv1.bias', 'decoder.up.2.block.0.conv1.weight', 'decoder.up.2.block.0.conv2.bias', 'decoder.up.2.block.0.conv2.weight', 'decoder.up.2.block.0.norm1.bias', 'decoder.up.2.block.0.norm1.weight', 'decoder.up.2.block.0.norm2.bias', 'decoder.up.2.block.0.norm2.weight', 'decoder.up.2.block.1.conv1.bias', 'decoder.up.2.block.1.conv1.weight', 
'decoder.up.2.block.1.conv2.bias', 'decoder.up.2.block.1.conv2.weight', 'decoder.up.2.block.1.norm1.bias', 'decoder.up.2.block.1.norm1.weight', 'decoder.up.2.block.1.norm2.bias', 'decoder.up.2.block.1.norm2.weight', 'decoder.up.2.block.2.conv1.bias', 'decoder.up.2.block.2.conv1.weight', 'decoder.up.2.block.2.conv2.bias', 'decoder.up.2.block.2.conv2.weight', 'decoder.up.2.block.2.norm1.bias', 'decoder.up.2.block.2.norm1.weight', 'decoder.up.2.block.2.norm2.bias', 'decoder.up.2.block.2.norm2.weight', 'decoder.up.2.upsample.conv.bias', 'decoder.up.2.upsample.conv.weight', 'decoder.up.3.block.0.conv1.bias', 'decoder.up.3.block.0.conv1.weight', 'decoder.up.3.block.0.conv2.bias', 'decoder.up.3.block.0.conv2.weight', 'decoder.up.3.block.0.norm1.bias', 'decoder.up.3.block.0.norm1.weight', 'decoder.up.3.block.0.norm2.bias', 'decoder.up.3.block.0.norm2.weight', 'decoder.up.3.block.1.conv1.bias', 'decoder.up.3.block.1.conv1.weight', 'decoder.up.3.block.1.conv2.bias', 'decoder.up.3.block.1.conv2.weight', 'decoder.up.3.block.1.norm1.bias', 'decoder.up.3.block.1.norm1.weight', 'decoder.up.3.block.1.norm2.bias', 'decoder.up.3.block.1.norm2.weight', 'decoder.up.3.block.2.conv1.bias', 'decoder.up.3.block.2.conv1.weight', 'decoder.up.3.block.2.conv2.bias', 'decoder.up.3.block.2.conv2.weight', 'decoder.up.3.block.2.norm1.bias', 'decoder.up.3.block.2.norm1.weight', 'decoder.up.3.block.2.norm2.bias', 'decoder.up.3.block.2.norm2.weight', 'decoder.up.3.upsample.conv.bias', 'decoder.up.3.upsample.conv.weight', 'encoder.conv_in.bias', 'encoder.conv_in.weight', 'encoder.conv_out.bias', 'encoder.conv_out.weight', 'encoder.down.0.block.0.conv1.bias', 'encoder.down.0.block.0.conv1.weight', 'encoder.down.0.block.0.conv2.bias', 'encoder.down.0.block.0.conv2.weight', 'encoder.down.0.block.0.norm1.bias', 'encoder.down.0.block.0.norm1.weight', 'encoder.down.0.block.0.norm2.bias', 'encoder.down.0.block.0.norm2.weight', 'encoder.down.0.block.1.conv1.bias', 'encoder.down.0.block.1.conv1.weight', 'encoder.down.0.block.1.conv2.bias', 'encoder.down.0.block.1.conv2.weight', 'encoder.down.0.block.1.norm1.bias', 'encoder.down.0.block.1.norm1.weight', 'encoder.down.0.block.1.norm2.bias', 'encoder.down.0.block.1.norm2.weight', 'encoder.down.0.downsample.conv.bias', 'encoder.down.0.downsample.conv.weight', 'encoder.down.1.block.0.conv1.bias', 'encoder.down.1.block.0.conv1.weight', 'encoder.down.1.block.0.conv2.bias', 'encoder.down.1.block.0.conv2.weight', 'encoder.down.1.block.0.nin_shortcut.bias', 'encoder.down.1.block.0.nin_shortcut.weight', 'encoder.down.1.block.0.norm1.bias', 'encoder.down.1.block.0.norm1.weight', 'encoder.down.1.block.0.norm2.bias', 'encoder.down.1.block.0.norm2.weight', 'encoder.down.1.block.1.conv1.bias', 'encoder.down.1.block.1.conv1.weight', 'encoder.down.1.block.1.conv2.bias', 'encoder.down.1.block.1.conv2.weight', 'encoder.down.1.block.1.norm1.bias', 'encoder.down.1.block.1.norm1.weight', 'encoder.down.1.block.1.norm2.bias', 'encoder.down.1.block.1.norm2.weight', 'encoder.down.1.downsample.conv.bias', 'encoder.down.1.downsample.conv.weight', 'encoder.down.2.block.0.conv1.bias', 'encoder.down.2.block.0.conv1.weight', 'encoder.down.2.block.0.conv2.bias', 'encoder.down.2.block.0.conv2.weight', 'encoder.down.2.block.0.nin_shortcut.bias', 'encoder.down.2.block.0.nin_shortcut.weight', 'encoder.down.2.block.0.norm1.bias', 'encoder.down.2.block.0.norm1.weight', 'encoder.down.2.block.0.norm2.bias', 'encoder.down.2.block.0.norm2.weight', 'encoder.down.2.block.1.conv1.bias', 'encoder.down.2.block.1.conv1.weight', 
'encoder.down.2.block.1.conv2.bias', 'encoder.down.2.block.1.conv2.weight', 'encoder.down.2.block.1.norm1.bias', 'encoder.down.2.block.1.norm1.weight', 'encoder.down.2.block.1.norm2.bias', 'encoder.down.2.block.1.norm2.weight', 'encoder.down.2.downsample.conv.bias', 'encoder.down.2.downsample.conv.weight', 'encoder.down.3.block.0.conv1.bias', 'encoder.down.3.block.0.conv1.weight', 'encoder.down.3.block.0.conv2.bias', 'encoder.down.3.block.0.conv2.weight', 'encoder.down.3.block.0.norm1.bias', 'encoder.down.3.block.0.norm1.weight', 'encoder.down.3.block.0.norm2.bias', 'encoder.down.3.block.0.norm2.weight', 'encoder.down.3.block.1.conv1.bias', 'encoder.down.3.block.1.conv1.weight', 'encoder.down.3.block.1.conv2.bias', 'encoder.down.3.block.1.conv2.weight', 'encoder.down.3.block.1.norm1.bias', 'encoder.down.3.block.1.norm1.weight', 'encoder.down.3.block.1.norm2.bias', 'encoder.down.3.block.1.norm2.weight', 'encoder.mid.attn_1.k.bias', 'encoder.mid.attn_1.k.weight', 'encoder.mid.attn_1.norm.bias', 'encoder.mid.attn_1.norm.weight', 'encoder.mid.attn_1.proj_out.bias', 'encoder.mid.attn_1.proj_out.weight', 'encoder.mid.attn_1.q.bias', 'encoder.mid.attn_1.q.weight', 'encoder.mid.attn_1.v.bias', 'encoder.mid.attn_1.v.weight', 'encoder.mid.block_1.conv1.bias', 'encoder.mid.block_1.conv1.weight', 'encoder.mid.block_1.conv2.bias', 'encoder.mid.block_1.conv2.weight', 'encoder.mid.block_1.norm1.bias', 'encoder.mid.block_1.norm1.weight', 'encoder.mid.block_1.norm2.bias', 'encoder.mid.block_1.norm2.weight', 'encoder.mid.block_2.conv1.bias', 'encoder.mid.block_2.conv1.weight', 'encoder.mid.block_2.conv2.bias', 'encoder.mid.block_2.conv2.weight', 'encoder.mid.block_2.norm1.bias', 'encoder.mid.block_2.norm1.weight', 'encoder.mid.block_2.norm2.bias', 'encoder.mid.block_2.norm2.weight', 'encoder.norm_out.bias', 'encoder.norm_out.weight', 'post_quant_conv.bias', 'post_quant_conv.weight', 'quant_conv.bias', 'quant_conv.weight'])
Loading VAE weights specified in settings: C:\AI\stable-diffusion-webui-forge\models\VAE\sdxl_vae.safetensors
Running on local URL:  http://127.0.0.1:7861
No Image data blocks found.

To create a public link, set `share=True` in `launch()`.
Startup time: 33.0s (prepare environment: 12.2s, import torch: 2.6s, import gradio: 0.7s, setup paths: 0.5s, other imports: 0.7s, list SD models: 3.1s, load scripts: 6.6s, create ui: 6.1s, gradio launch: 0.2s).
No Image data blocks found.
No Image data blocks found.
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.48 seconds
Model loaded in 8.4s (load weights from disk: 0.1s, forge load real models: 4.3s, forge finalize: 0.4s, load VAE: 0.8s, load textual inversion embeddings: 1.9s, calculate empty prompt: 0.8s).
```

Additional information

Both A1111 and Forge are installed on an NVMe drive. I have tried both regular and incognito mode in Chrome and Firefox; the issue is present in all cases. I have also disabled ad blockers, and the issue remains.

magejosh commented 4 months ago

Similar slowdowns affect generation speed just from adding LoRAs to the prompt at all. The same prompt takes a different amount of time to process with a LoRA than without, and the difference is significant and noticeable when using more than one LoRA in the prompt. In A1111, generation times don't vary this much just because I added an extra LoRA or two. For that matter, Forge also seems unable to apply LoRAs to generations with the LCM sampler.