lllyasviel / stable-diffusion-webui-forge


[Bug]: Keeping models loaded in VRAM #800

Open Lesteriax opened 3 weeks ago

Lesteriax commented 3 weeks ago


What happened?

I want to switch between models quickly without waiting for them to load each time. When using the refiner, for example, loading the second model is too slow. What's worse, when generating again it reloads the first model, then reloads the second model all over again.

Steps to reproduce the problem

The option:

I'm using the same settings in AUTOMATIC1111 and it works fine there.

What should have happened?

If I set two models to stay loaded at the same time, switching between them should be nearly instant, with no reloading.

I think this function is broken in Forge.
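A back-of-envelope check using the model sizes reported in the console log below suggests two full SDXL checkpoints should fit comfortably on a 24 GB card (variable names here are mine, not Forge's):

```python
# All values in MB, taken from the "[Memory Management]" lines in the log.
total_vram = 24576.0        # "Total VRAM 24576 MB" (RTX 3090)
unet_sdxl = 4897.1          # Model Memory for the SDXL UNet
clip_sdxl = 2144.4          # Model Memory for SDXLClipModel
vae = 159.6                 # Model Memory for AutoencoderKL
inference_reserve = 1024.0  # "Minimal Inference Memory"

# Two full checkpoints resident at once, plus the inference reserve:
needed = 2 * (unet_sdxl + clip_sdxl) + vae + inference_reserve
print(f"needed ~ {needed:.0f} MB of {total_vram:.0f} MB")
```

That comes to roughly 15.3 GB of 24 GB, so there should be plenty of headroom to keep both models in VRAM.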

What browsers do you use to access the UI?

Brave

Sysinfo

sysinfo-2024-06-08-08-47.json

Console logs

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f0.0.17v1.8.0rc-latest-276-g29be1da7
Commit hash: 29be1da7cf2b5dccfc70fbdd33eb35c56a31ffb7
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Installing requirements for CivitAI Browser
Installing requirements for CivitAI Browser
Installing requirements for CivitAI Browser
CUDA 12.1
Installing forge_legacy_preprocessor requirement: changing opencv-python version from 4.10.0.82 to 4.8.0
Installing sd-forge-controlnet requirement: changing opencv-python version from 4.10.0.82 to 4.8.0
Launching Web UI with arguments: --theme dark --cuda-stream --api
Total VRAM 24576 MB, total RAM 32619 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3090 : native
Hint: your device supports --pin-shared-memory for potential speed improvements.
Hint: your device supports --cuda-malloc for potential speed improvements.
VAE dtype: torch.bfloat16
CUDA Stream Activated:  True
Using pytorch cross attention
ControlNet preprocessor location: D:\ai-webuis\stable-forge\webui\models\ControlNetPreprocessor
CivitAI Browser+: Aria2 RPC started
11:53:33 - ReActor - STATUS - Running v0.7.0-b7 on Device: CUDA
Loading weights [67ab2fd8ec] from D:\ai-webuis\stable-forge\webui\models\Stable-diffusion\ponyDiffusionV6XL_v6StartWithThisOne.safetensors
2024-06-08 11:53:33,977 - ControlNet - INFO - ControlNet UI callback registered.
git: 'submodule' is not a git command. See 'git --help'.
[openOutpaint-extension-submodule] failed to download update, check network
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Startup time: 36.2s (prepare environment: 23.8s, import torch: 3.6s, import gradio: 0.8s, setup paths: 0.8s, initialize shared: 0.1s, other imports: 0.8s, load scripts: 2.8s, create ui: 1.4s, gradio launch: 1.4s, add APIs: 0.5s).
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
To load target model SDXLClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  23293.0146484375
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  20124.659950256348
Moving model(s) has taken 0.77 seconds
Model loaded in 10.2s (load weights from disk: 0.9s, forge instantiate config: 1.5s, forge load real models: 6.1s, forge finalize: 0.5s, calculate empty prompt: 1.2s).
11:53:59 - ReActor - ERROR - Please provide a source face
[LORA] Loaded D:\ai-webuis\stable-forge\webui\models\Lora\SDXL\add-detail-xl.safetensors for SDXL-UNet with 722 keys at weight 3.0 (skipped 0 keys)
[LORA] Loaded D:\ai-webuis\stable-forge\webui\models\Lora\SDXL\add-detail-xl.safetensors for SDXL-CLIP with 264 keys at weight 3.0 (skipped 0 keys)
To load target model SDXLClipModel
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory (MB) =  21466.8173828125
[Memory Management] Model Memory (MB) =  0.0
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  20442.8173828125
Moving model(s) has taken 0.98 seconds
To load target model SDXL
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  21427.66259765625
[Memory Management] Model Memory (MB) =  4897.086494445801
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  15506.57610321045
Moving model(s) has taken 2.72 seconds
 80%|█████████████████████████████████████████████████████████████████▌                | 16/20 [00:09<00:02,  1.71it/s]Loading weights [059934ff58] from D:\ai-webuis\stable-forge\webui\models\Stable-diffusion\mymodel_v3VAE.safetensors
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
loaded straight to GPU
To load target model SDXL
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  18188.1328125
[Memory Management] Model Memory (MB) =  0.03814697265625
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  17164.094665527344
Moving model(s) has taken 0.05 seconds
To load target model SDXLClipModel
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  18188.080078125
[Memory Management] Model Memory (MB) =  2144.3546981811523
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  15019.725379943848
Moving model(s) has taken 0.70 seconds
Model loaded in 7.6s (unload existing model: 0.5s, forge instantiate config: 1.1s, forge load real models: 5.0s, calculate empty prompt: 0.8s).
[LORA] Loaded D:\ai-webuis\stable-forge\webui\models\Lora\SDXL\add-detail-xl.safetensors for SDXL-UNet with 722 keys at weight 3.0 (skipped 0 keys)
[LORA] Loaded D:\ai-webuis\stable-forge\webui\models\Lora\SDXL\add-detail-xl.safetensors for SDXL-CLIP with 264 keys at weight 3.0 (skipped 0 keys)
To load target model SDXLClipModel
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory (MB) =  16420.9091796875
[Memory Management] Model Memory (MB) =  0.0
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  15396.9091796875
Moving model(s) has taken 0.57 seconds
To load target model SDXL
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory (MB) =  16383.4658203125
[Memory Management] Model Memory (MB) =  0.0
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  15359.4658203125
Moving model(s) has taken 2.98 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:23<00:00,  1.19s/it]
To load target model AutoencoderKL█████████████████████████████████████████████████████| 20/20 [00:22<00:00,  1.80s/it]
Begin to load 1 model
[Memory Management] Current Free GPU Memory (MB) =  16450.16748046875
[Memory Management] Model Memory (MB) =  159.55708122253418
[Memory Management] Minimal Inference Memory (MB) =  1024.0
[Memory Management] Estimated Remaining GPU Memory (MB) =  15266.610399246216
Moving model(s) has taken 0.11 seconds
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:24<00:00,  1.20s/it]
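For what it's worth, the "[Memory Management]" log lines appear to follow a simple identity (my reading of the log output, not Forge's actual code): estimated remaining = current free - model memory - minimal inference memory. Checking against the first block above:

```python
# Figures from the first "[Memory Management]" block in the log (MB).
free = 23293.0146484375     # Current Free GPU Memory
model = 2144.3546981811523  # Model Memory
minimal = 1024.0            # Minimal Inference Memory

remaining = free - model - minimal
# Matches the logged "Estimated Remaining GPU Memory (MB) = 20124.659950256348"
assert abs(remaining - 20124.659950256348) < 1e-6
```

The "Reuse 1 loaded models" blocks report Model Memory = 0.0, which fits this reading: the model is already resident and nothing new needs to be moved.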

Additional information

Image of my settings: Capture
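For reference, in A1111 the option in question is "Maximum number of checkpoints loaded at the same time" (`sd_checkpoints_limit`). Since the webui here was launched with `--api`, it can also be set via the standard options endpoint. This is a hypothetical sketch assuming Forge exposes the same keys as A1111 (whether it honors them is exactly what this issue is about):

```python
import json

# Option keys as in A1111 (assumed, not verified against Forge):
payload = {
    "sd_checkpoints_limit": 2,            # keep two checkpoints loaded
    "sd_checkpoints_keep_in_cpu": False,  # keep them in VRAM, not system RAM
}

# To apply against a running instance (requires the `requests` package):
# import requests
# requests.post("http://127.0.0.1:7860/sdapi/v1/options", json=payload)
print(json.dumps(payload))
```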

Bobo2929 commented 4 days ago

Also having this issue. Keeping either two SD 1.5 checkpoints, or one 1.5 and one XL checkpoint, loaded works perfectly well in A1111 and ComfyUI, but not in Forge.