lllyasviel / stable-diffusion-webui-forge

[Bug]: Increasing batch size slows it down.[Performance] #538

Open SysVR opened 8 months ago

SysVR commented 8 months ago

What happened?

Increasing the batch count during generation makes Forge slower than sd.webui. At 512x768 with batch count 100 and batch size 8, the difference in total generation time was over 1 hour.

Steps to reproduce the problem

  1. Enter any prompt.
  2. Set the batch size to any value.
  3. Click [Generate].

What should have happened?

Generation should be faster than in sd.webui.

What browsers do you use to access the UI ?

Microsoft Edge

Sysinfo

sysinfo-2024-03-11-21-16.json

Console logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
Launching Web UI with arguments:
Total VRAM 8192 MB, total RAM 32658 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce GTX 1070 : native
VAE dtype: torch.float32
CUDA Stream Activated:  False
Using pytorch cross attention
ControlNet preprocessor location: G:\webui_forge_cu121_torch21\webui\models\ControlNetPreprocessor
Loading weights [15012c538f] from G:\webui_forge_cu121_torch21\webui\models\Stable-diffusion\realisticVisionV51_v51VAE.safetensors
2024-03-12 06:02:34,453 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 71.0s (initial startup: 0.1s, prepare environment: 25.2s, import torch: 15.8s, import gradio: 7.2s, setup paths: 8.9s, initialize shared: 0.7s, other imports: 5.5s, setup gfpgan: 0.1s, load scripts: 3.8s, create ui: 3.2s, gradio launch: 1.6s).
model_type EPS
UNet ADM Dimension 0
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
To load target model SD1ClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  7216.9990234375
[Memory Management] Model Memory (MB) =  454.2076225280762
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  5738.791400909424
Moving model(s) has taken 0.17 seconds
Model loaded in 124.9s (load weights from disk: 4.2s, forge instantiate config: 0.7s, forge load real models: 117.2s, load textual inversion embeddings: 0.1s, calculate empty prompt: 2.7s).
To load target model BaseModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  7187.494140625
[Memory Management] Model Memory (MB) =  3278.8199005126953
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  2884.6742401123047
Moving model(s) has taken 0.67 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:04<00:00,  3.20s/it]
To load target model AutoencoderKL████████████████████                                 | 20/40 [00:58<01:01,  3.08s/it]
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  3872.1396484375
[Memory Management] Model Memory (MB) =  319.11416244506836
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  2529.0254859924316
Moving model(s) has taken 0.23 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [01:01<00:00,  3.09s/it]
To load target model AutoencoderKL█████████████████████████████████████████████████████| 40/40 [02:04<00:00,  3.10s/it]
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  3871.63134765625
[Memory Management] Model Memory (MB) =  319.11416244506836
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  2528.5171852111816
Moving model(s) has taken 0.07 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 40/40 [02:11<00:00,  3.30s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 40/40 [02:11<00:00,  3.10s/it]

Additional information

input prompt: test, cat, floor
negative prompt:
img res: 512x512

Simple test (batch count 1, batch size 1):

sd.webui v1.8.0
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:09<00:00, 2.04it/s]

sd.webui forge vf0.0.17v1.8.0rc-latest-276-g29be1da7
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:08<00:00, 2.37it/s]
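For comparing the two rates above with the batched runs: tqdm prints `it/s` when a step takes less than a second and `s/it` otherwise, so the numbers are not directly comparable as printed. Below is a minimal sketch that normalizes both forms to seconds per iteration. The helper name `seconds_per_iteration` is my own and does not come from either codebase.

```python
import re


def seconds_per_iteration(rate: str) -> float:
    """Convert a tqdm rate string such as '2.04it/s' or '3.20s/it' to seconds per iteration."""
    match = re.fullmatch(r"\s*([0-9.]+)\s*(it/s|s/it)\s*", rate)
    if match is None:
        raise ValueError(f"unrecognized tqdm rate: {rate!r}")
    value = float(match.group(1))
    # 's/it' is already seconds per iteration; 'it/s' has to be inverted.
    return value if match.group(2) == "s/it" else 1.0 / value


# Rates copied from the batch count/size 1 test above.
webui = seconds_per_iteration("2.04it/s")
forge = seconds_per_iteration("2.37it/s")

# Positive value means Forge needs less time per step than sd.webui.
forge_speedup_pct = (webui / forge - 1.0) * 100.0
print(f"sd.webui: {webui:.3f} s/it, Forge: {forge:.3f} s/it, Forge faster by {forge_speedup_pct:.1f}%")
```

With these two rates, Forge comes out roughly 16% faster at batch size 1, which is the opposite of what the batched runs show.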

sd.webui v1.8.0 Console full logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
Launching Web UI with arguments:
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Loading weights [15012c538f] from G:\sd.webui\webui\models\Stable-diffusion\realisticVisionV51_v51VAE.safetensors
Creating model from config: G:\sd.webui\webui\configs\v1-inference.yaml
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Startup time: 9.6s (prepare environment: 2.1s, import torch: 3.4s, import gradio: 1.0s, setup paths: 0.9s, initialize shared: 0.2s, other imports: 0.5s, load scripts: 0.8s, create ui: 0.4s, gradio launch: 0.4s).
Applying attention optimization: Doggettx... done.
Model loaded in 2.5s (load weights from disk: 0.5s, create model: 0.7s, apply weights to model: 1.1s, calculate empty prompt: 0.1s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:57<00:00, 2.87s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:57<00:00, 2.86s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 40/40 [02:00<00:00, 3.02s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 40/40 [02:00<00:00, 2.86s/it]

SysVR commented 8 months ago

Update:

model: v1-5-pruned-emaonly.safetensors [6ce0161689]
Width: 512, Height: 512, Batch count: 100, Batch size: 8

version: [v1.8.0]
Total progress: 100%|████████████████████████████████████████████████████████████| 2000/2000 [1:36:43<00:00, 2.90s/it]

Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Total progress: 100%|████████████████████████████████████████████████████████████| 2000/2000 [1:50:38<00:00, 3.32s/it]
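To put the two totals above on the same footing, here is a rough sketch that converts the tqdm elapsed times to seconds and divides by the number of images (batch count 100 × batch size 8 = 800). The function and variable names are illustrative only; the input values are copied from the two progress lines.

```python
def elapsed_to_seconds(elapsed: str) -> int:
    """Convert a tqdm elapsed string like '1:36:43' or '02:04' to whole seconds."""
    seconds = 0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


BATCH_COUNT = 100
BATCH_SIZE = 8
images = BATCH_COUNT * BATCH_SIZE

webui_s = elapsed_to_seconds("1:36:43")  # sd.webui v1.8.0
forge_s = elapsed_to_seconds("1:50:38")  # Forge f0.0.17v1.8.0rc-latest-276-g29be1da7

diff_s = forge_s - webui_s
slowdown_pct = diff_s / webui_s * 100.0

print(f"sd.webui: {webui_s} s total, {webui_s / images:.2f} s per image")
print(f"Forge:    {forge_s} s total, {forge_s / images:.2f} s per image")
print(f"Forge is slower by {diff_s} s ({slowdown_pct:.1f}%)")
```

For this 512x512 run the gap works out to 835 seconds (about 14 minutes), or roughly 14% slower on Forge. That is smaller than the difference of over 1 hour reported for 512x768, which was a separate run.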