Multiple unnecessary [Unload] before generation (console logs attached)

SDXL. I did a completely fresh install to isolate the issue, with no extensions and no changes to settings. Please see the attached console image.

Problem: Before image generation, Forge unloads something in an attempt to free more VRAM, even though plenty of VRAM is already available. Why is this happening? The issue gets dozens of times worse when I use extensions like dynamic prompt / neg pip, which seem to multiply the total number of requests (see second image).

Test: In the first attached image, from the fresh install, you will see that I ran a simple prompt with two LoRAs to generate a batch of 4 images. I ran this same prompt twice with "diffusion bits" = automatic, and then twice with "diffusion bits" = automaticfp16, just to test.

Expectations:

- After the first generation of any prompt, the second generation of the same prompt should not require any loading. Generation should start immediately.
- Models should never be [Unload]ed if there is memory available.

Reality:

- Each generation requires loading of models.
- Adding extensions leads to recursive memory [Unload]ing.

[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.64 MB ... Done.███████████████████████████| 20/20 [00:14<00:00, 1.71it/s]
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.03 MB ... Done.
[Unload] Trying to free 3072.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16238.83 MB ... Done.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:11<00:00, 1.68it/s]
[Unload] Trying to free 2552.34 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16230.45 MB ... Done.███████████████████████████| 20/20 [00:11<00:00, 1.70it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:14<00:00, 1.41it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:14<00:00, 1.70it/s]
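To make the mismatch in the log easier to check, here is a small sketch that parses the [Unload] lines quoted above and compares the requested amount against the reported free memory. The regular expression is derived only from the lines in this report, not from Forge's source, and the names `parse_unload_events` and `already_satisfied` are made up for illustration.

```python
import re

# Pattern built from the "[Unload]" lines quoted in this report.
# It is an assumption that other Forge versions print the same format.
UNLOAD_RE = re.compile(
    r"Trying to free (?P<requested>\d+(?:\.\d+)?) MB for (?P<device>\S+) "
    r"with (?P<kept>\d+) models keep loaded \.\.\. "
    r"Current free memory is (?P<free>\d+(?:\.\d+)?) MB"
)


def parse_unload_events(log_text):
    """Return one dict per [Unload] line found in log_text."""
    events = []
    for match in UNLOAD_RE.finditer(log_text):
        requested = float(match.group("requested"))
        free = float(match.group("free"))
        events.append(
            {
                "device": match.group("device"),
                "requested_mb": requested,
                "free_mb": free,
                # True when the reported free memory already covers the request.
                "already_satisfied": free >= requested,
            }
        )
    return events


# The four [Unload] lines from the console output above.
SAMPLE_LOG = """
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.64 MB ... Done.
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16240.03 MB ... Done.
[Unload] Trying to free 3072.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16238.83 MB ... Done.
[Unload] Trying to free 2552.34 MB for cuda:0 with 1 models keep loaded ... Current free memory is 16230.45 MB ... Done.
"""

if __name__ == "__main__":
    for event in parse_unload_events(SAMPLE_LOG):
        print(event)
```

In every quoted line the reported free memory is several times larger than the requested amount, which is why these [Unload] calls look unnecessary to me.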

Screenshot 2024-09-06 073953

Screenshot 2024-09-06 075328

Just to speed up RCA, I am attaching a screenshot from my existing (not fresh) environment, where I have set more than 20 LoRAs to be kept in memory and reduced GPU weight to below 16 GB of VRAM. This loading should not be happening at all, and I'm confused about why it is. All tests are on the latest Forge release, and the issue also occurs on releases going back more than a week.

Screenshot 2024-09-06 075955

lllyasviel / stable-diffusion-webui-forge

Multiple unnecessary [Unload] before generation (console logs attached) #1720