lllyasviel / stable-diffusion-webui-forge

GNU Affero General Public License v3.0

Multiple unnecessary [Unload] before generation (console logs attached) #1720

Open RJSprod opened 2 months ago

RJSprod commented 2 months ago

SDXL. I did a completely fresh install to isolate the issue: no extensions and no changes to settings. Please see the attached console image.

Problem: Before image generation, Forge unloads *something* in an attempt to free more VRAM, even though plenty of VRAM is available. Why is this happening? The issue gets dozens of times worse when I use extensions like dynamic prompt / neg pip, which seem to multiply the total number of requests (see second image).

Test: In the first attached image, taken from the fresh install, you will see I've run a simple prompt with two LoRAs to generate a batch of 4 images. I ran this same prompt twice with "diffusion bits" = automatic, and then twice with "diffusion bits" = automaticfp16, just to test.

Expectations:

Reality:

[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.64 MB ... Done.
███████████████████████████| 20/20 [00:14<00:00, 1.71it/s]
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.03 MB ... Done.
[Unload] Trying to free 3072.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16238.83 MB ... Done.
100%|███████████████████████████| 20/20 [00:11<00:00, 1.68it/s]
[Unload] Trying to free 2552.34 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16230.45 MB ... Done.
███████████████████████████| 20/20 [00:11<00:00, 1.70it/s]
Total progress: 100%|███████████████████████████| 20/20 [00:14<00:00, 1.41it/s]
Total progress: 100%|███████████████████████████| 20/20 [00:14<00:00, 1.70it/s]
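To make the mismatch concrete, the [Unload] lines can be parsed and each requested amount compared with the free memory reported on the same line. This is only a rough sketch for reading console output: the regex is written against the log format quoted in this report, and the names `parse_unload_events` and `already_satisfied` are hypothetical helpers, not part of Forge.

```python
import re

# Matches the "[Unload]" lines as they appear in the console output quoted
# in this issue. The format is assumed from that output only.
UNLOAD_RE = re.compile(
    r"Trying to free (?P<requested>[\d.]+) MB for (?P<device>\S+) "
    r"with (?P<keep>\d+) models keep loaded \.\.\. "
    r"Current free memory is (?P<free>[\d.]+) MB"
)


def parse_unload_events(log_text):
    """Return one dict per [Unload] line found in log_text (hypothetical helper)."""
    events = []
    for match in UNLOAD_RE.finditer(log_text):
        requested = float(match.group("requested"))
        free = float(match.group("free"))
        events.append(
            {
                "device": match.group("device"),
                "requested_mb": requested,
                "free_mb": free,
                # True when the reported free memory already covers the request.
                "already_satisfied": free >= requested,
            }
        )
    return events


# The four [Unload] lines from the console output quoted in this issue.
SAMPLE_LOG = """
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.64 MB ... Done.
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.03 MB ... Done.
[Unload] Trying to free 3072.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16238.83 MB ... Done.
[Unload] Trying to free 2552.34 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16230.45 MB ... Done.
"""

if __name__ == "__main__":
    for event in parse_unload_events(SAMPLE_LOG):
        print(event)
```

On the sample above, every request is smaller than the free memory reported next to it, which matches the observation that none of these unload attempts should be needed.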

Screenshot 2024-09-06 073953

Screenshot 2024-09-06 075328

Just to speed up RCA, I'm attaching a screenshot from my existing (not fresh) environment, where I have set >20 LoRAs in memory and reduced GPU weight to < 16 GB of VRAM. This loading should not be happening at all, and I'm confused about why it is. All tests are on the latest Forge release, and the behavior also occurs on releases going back over a week.

Screenshot 2024-09-06 075955

krokofant commented 5 days ago

Disable the setting "Only keep one model on device". Seems to have worked for me.