lllyasviel / stable-diffusion-webui-forge


Generation time with lora increased 10 times (tested on schnell nf4) #1132

Open Oleksander86 opened 3 months ago

koriekhov commented 3 months ago

In my case, after the update, it/s didn't change, but the initial load of models with LoRAs into memory increased by maybe 10-20 times, and the system started using an additional page file on another drive.
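(Aside: one way to confirm the page file is actually being hit is to watch it from Python; a minimal sketch using psutil, which is an assumption here and not something Forge itself exposes:)

```python
import psutil

# psutil.swap_memory() reports page-file usage on Windows
# (swap usage on Linux/macOS).
swap = psutil.swap_memory()
print(f"page file in use: {swap.used / 1024**2:.0f} MB ({swap.percent:.1f}%)")
```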

```
version: f2.0.1v1.10.1-previous-275-g59790f2c
python: 3.10.8
torch: 2.4.0+cu124
xformers: N/A
gradio: 4.40.0
```

```
Just FLUX:     Moving model(s) has taken 15.82 seconds
FLUX + 1 LORA: Moving model(s) has taken 221.23 seconds
```
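(For context, a minimal sketch of what a device-transfer timing like this measures; the helper is hypothetical and not Forge's actual code:)

```python
import time
import torch

def timed_move(model: torch.nn.Module, device: str = "cuda") -> torch.nn.Module:
    # Hypothetical helper; mirrors what the "Moving model(s) has taken N seconds"
    # log line measures: the host-to-GPU transfer of the model weights.
    start = time.perf_counter()
    model = model.to(device)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for async copies so the timing is honest
    print(f"Moving model(s) has taken {time.perf_counter() - start:.2f} seconds")
    return model
```

If that transfer goes from ~16 seconds to ~220 seconds, the weights are most likely being paged through system memory instead of living in VRAM.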

PixelClassicist commented 3 months ago

Same here. Here's the log output without a LoRA and then with a LoRA:

```
Memory cleanup has taken 5.54 seconds | 70/1960 [03:39<1:37:37, 3.10s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Memory Management] Current Free GPU Memory: 16511.12 MB
[Memory Management] Required Model Memory: 5227.11 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 10260.02 MB
LoRA patching has taken 1.46 seconds
Moving model(s) has taken 1.46 seconds
Distilled CFG Scale: 3.5
100%|██████████████████████████████████████████| 35/35 [01:44<00:00, 3.00s/it]
Memory cleanup has taken 5.52 seconds | 105/1960 [05:33<1:35:04, 3.08s/it]
Calculating sha256 for C:\Users\Forge_Flux\webui\models\Lora\Anatomy\SCG-Anatomy.safetensors: be85ee9fc99b1d733ea78c1276abbdd6cae045f04aee2e90b19a337f98c4582f
[LORA] Loaded C:\Users\Forge_Flux\webui\models\Lora\Anatomy\SCG-Anatomy.safetensors for KModel-UNet with 324 keys at weight 1.0 (skipped 0 keys)
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.
To load target model JointTextEncoder
Begin to load 1 model
[Memory Management] Current Free GPU Memory: 16501.61 MB
[Memory Management] Required Model Memory: 5227.11 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 10250.51 MB
LoRA patching has taken 1.44 seconds
Moving model(s) has taken 1.45 seconds
Distilled CFG Scale: 3.5
To load target model KModel
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free GPU Memory: 3351.32 MB
[Memory Management] Required Model Memory: 0.00 MB
[Memory Management] Required Inference Memory: 1024.00 MB
[Memory Management] Estimated Remaining GPU Memory: 2327.32 MB
Patching LoRAs: 100%|██████████████████████████| 134/134 [00:09<00:00, 14.49it/s]
LoRA patching has taken 9.86 seconds
Moving model(s) has taken 9.87 seconds
 14%|███████████▊                              | 5/35 [02:28<15:22, 30.74s/it]
Total progress:   6%|███▎                      | 110/1960 [08:20<14:06:25, 27.45s/it]
```

bedovyy commented 3 months ago

I guess you are running out of VRAM, so it's using shared GPU memory. Could you try decreasing 'GPU Weights' at the top, or disabling the system memory fallback following the guide below? https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/system-memory-fallback-for-stable-diffusion
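(One way to verify this from the Python side is to watch free VRAM around the model/LoRA load; a minimal sketch using torch.cuda.mem_get_info, assuming a CUDA build of PyTorch:)

```python
import torch

def report_vram(tag: str) -> None:
    # torch.cuda.mem_get_info() returns (free, total) in bytes
    # for the current CUDA device.
    free, total = torch.cuda.mem_get_info()
    print(f"[{tag}] free VRAM: {free / 1024**2:.0f} MB of {total / 1024**2:.0f} MB")

# Call before and after loading the model + LoRA. If free VRAM sits near zero
# while generation keeps running (very slowly) instead of raising an
# out-of-memory error, the driver is likely spilling weights into shared
# system memory via the fallback described in the NVIDIA guide above.
report_vram("before load")
```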