Just to rule things out, does the VRAM leak occur with:
No second pass:
2023-09-26T23:20:36.624Z | INFO | sd | processing | Processed: images=1 time=1.12s its=17.86 memory={'ram': {'used': 3.35, 'total': 127.75}, 'gpu': {'used': 4.91, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:20:52.953Z | DEBUG | sd | txt2img | txt2img: id_task=task(adqpzepyw7wk3wu)\|prompt=patato baked on a tray\|negative_prompt=\|prompt_styles=[]\|steps=20\|sampler_index=1\|latent_index=1\|full_quality=True\|restore_faces=False\|tiling=False\|n_iter=1\|batch_size=1\|cfg_scale=6\|clip_skip=1\|seed=560433544.0\|subseed=-1.0\|subseed_strength=0\|seed_resize_from_h=0\|seed_resize_from_w=0\|\|height=816\|width=512\|enable_hr=False\|denoising_strength=0.5\|hr_scale=2\|hr_upscaler=ESRGAN_4x\|hr_force=False\|hr_second_pass_steps=20\|hr_resize_x=0\|hr_resize_y=0\|image_cfg_scale=6\|diffusers_guidance_rescale=0.7\|refiner_steps=5\|refiner_start=0.8\|refiner_prompt=\|refiner_negative=\|override_settings_texts=[]
2023-09-26T23:20:52.989Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:20:54.081Z | INFO | sd | processing | Processed: images=1 time=1.12s its=17.78 memory={'ram': {'used': 3.35, 'total': 127.75}, 'gpu': {'used': 4.91, 'total': 23.99}, 'retries': 0, 'oom': 0}
Observed steady 4.91 GB VRAM usage.
Second pass, latent upscale only:
2023-09-26T23:23:39.990Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:23:41.037Z | DEBUG | sd | processing | Init hires: upscaler=Latent sampler=DDIM resize=0x0 upscale=1024x1632
2023-09-26T23:23:41.039Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:23:46.040Z | INFO | sd | processing | Processed: images=1 time=6.08s its=3.29 memory={'ram': {'used': 3.43, 'total': 127.75}, 'gpu': {'used': 15.21, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:23:46.773Z | DEBUG | sd | txt2img | txt2img: id_task=task(kxvu0j7e4jvwnz8)\|prompt=patato baked on a tray\|negative_prompt=\|prompt_styles=[]\|steps=20\|sampler_index=1\|latent_index=1\|full_quality=True\|restore_faces=False\|tiling=False\|n_iter=1\|batch_size=1\|cfg_scale=6\|clip_skip=1\|seed=-1.0\|subseed=-1.0\|subseed_strength=0\|seed_resize_from_h=0\|seed_resize_from_w=0\|\|height=816\|width=512\|enable_hr=True\|denoising_strength=0.5\|hr_scale=2\|hr_upscaler=Latent\|hr_force=False\|hr_second_pass_steps=20\|hr_resize_x=0\|hr_resize_y=0\|image_cfg_scale=6\|diffusers_guidance_rescale=0.7\|refiner_steps=5\|refiner_start=0.8\|refiner_prompt=\|refiner_negative=\|override_settings_texts=[]
2023-09-26T23:23:46.811Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:23:47.653Z | DEBUG | sd | processing | Init hires: upscaler=Latent sampler=DDIM resize=0x0 upscale=1024x1632
2023-09-26T23:23:47.654Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:23:52.666Z | INFO | sd | processing | Processed: images=1 time=5.89s its=3.40 memory={'ram': {'used': 3.43, 'total': 127.75}, 'gpu': {'used': 15.21, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:23:57.225Z | DEBUG | sd | txt2img | txt2img: id_task=task(1rvknjcvzqg8i2r)\|prompt=patato baked on a tray\|negative_prompt=\|prompt_styles=[]\|steps=20\|sampler_index=1\|latent_index=1\|full_quality=True\|restore_faces=False\|tiling=False\|n_iter=1\|batch_size=1\|cfg_scale=6\|clip_skip=1\|seed=-1.0\|subseed=-1.0\|subseed_strength=0\|seed_resize_from_h=0\|seed_resize_from_w=0\|\|height=816\|width=512\|enable_hr=True\|denoising_strength=0.5\|hr_scale=2\|hr_upscaler=Latent\|hr_force=False\|hr_second_pass_steps=20\|hr_resize_x=0\|hr_resize_y=0\|image_cfg_scale=6\|diffusers_guidance_rescale=0.7\|refiner_steps=5\|refiner_start=0.8\|refiner_prompt=\|refiner_negative=\|override_settings_texts=[]
2023-09-26T23:23:57.261Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:23:58.289Z | DEBUG | sd | processing | Init hires: upscaler=Latent sampler=DDIM resize=0x0 upscale=1024x1632
2023-09-26T23:23:58.290Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:24:03.302Z | INFO | sd | processing | Processed: images=1 time=6.07s its=3.29 memory={'ram': {'used': 3.43, 'total': 127.75}, 'gpu': {'used': 15.21, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:26:00.334Z | DEBUG | sd | launch | Server alive=True jobs=15 requests=276 uptime=369s memory used=3.39 total=127.75 idle
Observed a massive jump to 15.21 GB VRAM usage, but steady. This is the first sign of something screwy happening: the jump persisted even after doing a subsequent no-second-pass run, meaning the VRAM sits at ~15 GB from this point on. It seems safe to assume the second pass loads ~15 GB of something into VRAM.
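A minimal sketch of how one could check whether that ~15 GB is live tensor data or just memory held by PyTorch's caching allocator (standard torch.cuda calls, not part of SD.Next):

```python
import torch

def report_vram(tag: str) -> None:
    # memory_allocated: bytes currently used by live tensors
    # memory_reserved: bytes held by PyTorch's caching allocator (roughly what
    # nvidia-smi and the "gpu used" figures in these logs correspond to)
    allocated = torch.cuda.memory_allocated() / 1024**3
    reserved = torch.cuda.memory_reserved() / 1024**3
    print(f"{tag}: allocated={allocated:.2f} GB, reserved={reserved:.2f} GB")

report_vram("after second-pass generation")
torch.cuda.empty_cache()            # return unused cached blocks to the driver
report_vram("after empty_cache()")  # if reserved drops sharply, most of the 15 GB was cache
```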
Second pass with ESRGAN 4x upscale:
2023-09-26T23:30:51.478Z | DEBUG | sd | txt2img | txt2img: id_task=task(jqo6pnrmb507yh2)\|prompt=patato baked on a tray\|negative_prompt=\|prompt_styles=[]\|steps=20\|sampler_index=1\|latent_index=1\|full_quality=True\|restore_faces=False\|tiling=False\|n_iter=1\|batch_size=1\|cfg_scale=6\|clip_skip=1\|seed=-1.0\|subseed=-1.0\|subseed_strength=0\|seed_resize_from_h=0\|seed_resize_from_w=0\|\|height=816\|width=512\|enable_hr=True\|denoising_strength=0.5\|hr_scale=2\|hr_upscaler=ESRGAN_4x\|hr_force=False\|hr_second_pass_steps=20\|hr_resize_x=0\|hr_resize_y=0\|image_cfg_scale=6\|diffusers_guidance_rescale=0.7\|refiner_steps=5\|refiner_start=0.8\|refiner_prompt=\|refiner_negative=\|override_settings_texts=[]
2023-09-26T23:30:51.515Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:30:52.620Z | DEBUG | sd | processing | Init hires: upscaler=ESRGAN_4x sampler=DDIM resize=0x0 upscale=1024x1632
2023-09-26T23:30:53.920Z | INFO | sd | processing | Processed: images=1 time=2.44s its=8.21 memory={'ram': {'used': 3.44, 'total': 127.75}, 'gpu': {'used': 16.2, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:30:58.424Z | DEBUG | sd | txt2img | txt2img: id_task=task(ese3yscc0zmk6xr)\|prompt=patato baked on a tray\|negative_prompt=\|prompt_styles=[]\|steps=20\|sampler_index=1\|latent_index=1\|full_quality=True\|restore_faces=False\|tiling=False\|n_iter=1\|batch_size=1\|cfg_scale=6\|clip_skip=1\|seed=-1.0\|subseed=-1.0\|subseed_strength=0\|seed_resize_from_h=0\|seed_resize_from_w=0\|\|height=816\|width=512\|enable_hr=True\|denoising_strength=0.5\|hr_scale=2\|hr_upscaler=ESRGAN_4x\|hr_force=False\|hr_second_pass_steps=20\|hr_resize_x=0\|hr_resize_y=0\|image_cfg_scale=6\|diffusers_guidance_rescale=0.7\|refiner_steps=5\|refiner_start=0.8\|refiner_prompt=\|refiner_negative=\|override_settings_texts=[]
2023-09-26T23:30:58.444Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:30:59.471Z | DEBUG | sd | processing | Init hires: upscaler=ESRGAN_4x sampler=DDIM resize=0x0 upscale=1024x1632
2023-09-26T23:31:00.812Z | INFO | sd | processing | Processed: images=1 time=2.38s its=8.39 memory={'ram': {'used': 3.44, 'total': 127.75}, 'gpu': {'used': 16.7, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:31:02.911Z | DEBUG | sd | txt2img | txt2img: id_task=task(zvhni3pppb9jdw2)\|prompt=patato baked on a tray\|negative_prompt=\|prompt_styles=[]\|steps=20\|sampler_index=1\|latent_index=1\|full_quality=True\|restore_faces=False\|tiling=False\|n_iter=1\|batch_size=1\|cfg_scale=6\|clip_skip=1\|seed=-1.0\|subseed=-1.0\|subseed_strength=0\|seed_resize_from_h=0\|seed_resize_from_w=0\|\|height=816\|width=512\|enable_hr=True\|denoising_strength=0.5\|hr_scale=2\|hr_upscaler=ESRGAN_4x\|hr_force=False\|hr_second_pass_steps=20\|hr_resize_x=0\|hr_resize_y=0\|image_cfg_scale=6\|diffusers_guidance_rescale=0.7\|refiner_steps=5\|refiner_start=0.8\|refiner_prompt=\|refiner_negative=\|override_settings_texts=[]
2023-09-26T23:31:02.929Z | DEBUG | sd | sd_samplers | Sampler: sampler=DDIM config={'default_eta_is_0': True, 'uses_ensd': True}
2023-09-26T23:31:03.950Z | DEBUG | sd | processing | Init hires: upscaler=ESRGAN_4x sampler=DDIM resize=0x0 upscale=1024x1632
2023-09-26T23:31:05.268Z | INFO | sd | processing | Processed: images=1 time=2.36s its=8.49 memory={'ram': {'used': 3.44, 'total': 127.75}, 'gpu': {'used': 17.2, 'total': 23.99}, 'retries': 0, 'oom': 0}
2023-09-26T23:31:59.634Z | DEBUG | sd | launch | Server alive=True jobs=18 requests=347 uptime=728s memory used=3.4 total=127.75 idle
Observed a climb in VRAM usage that increased with every generation, possibly the upscaler being reloaded and left in VRAM every run.
HiRes Fix: observed no difference from the increases seen without this option checked; it is a steady increase of about 500 MB per generation. Side note: HiRes Fix enabled with latent upscalers (not sure this does anything) showed no increase at all, leading me to assume the leak is in the non-latent upscalers (ESRGAN, superyandere, SwinIR, etc.).
Conclusions I draw from this, if it matters: there's a massive jump in VRAM usage from simply turning on the second pass.
Second pass with a non-latent upscaler exhibits signs of a memory leak of roughly 500 MB per generation, multiplied by whatever the batch size is (see the measurement sketch after these conclusions).
The HiRes pass had no effect on VRAM usage.
The first image you generate does not exhibit the massive leaps in VRAM usage; it is only subsequent generations.
Continuing to generate until you hit max VRAM (24 GB for me), that image takes slightly longer, and then it appears to have dumped whatever was in VRAM and started over, slowly climbing again.
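A rough sketch of how the ~500 MB-per-generation growth could be measured, where `generate` is a hypothetical stand-in for whatever triggers a txt2img run (this is not SD.Next's API):

```python
import torch

def run_and_report(generate, *args, **kwargs):
    """Run one generation and report how much allocated VRAM it left behind."""
    torch.cuda.synchronize()
    before = torch.cuda.memory_allocated()
    result = generate(*args, **kwargs)   # placeholder for a txt2img run
    torch.cuda.synchronize()
    leftover = torch.cuda.memory_allocated() - before
    print(f"VRAM left allocated by this generation: {leftover / 1024**2:.0f} MB")
    return result
```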
I hope this is helpful in tracking this down.
Yes, this is exactly the info needed. I'll update when I have a fix.
Hunch was spot on: non-latent upscalers were simply force-loading the model file over and over. Sometimes torch manages to reuse the VRAM, but quite often it does not.
Anyhow...
I've implemented a model cache and unload for SwinIR/SCUNet/ESRGAN/RealESRGAN.
on first use:
08:46:30-850923 INFO Upscaler loaded: type=SCUNet model=models/SCUNet/scunet_color_real_gan.pth
on subsequent use:
08:47:01-715385 DEBUG Upscaler cached: type=SwinIR model=models/SCUNet/scunet_color_real_gan.pth
If unload is selected in settings -> upscalers (new setting):
08:46:32-841720 DEBUG Upscaler unloaded: type=SCUNet model=models/SCUNet/scunet_color_real_gan.pth
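The log lines above suggest a load-once / reuse / optional-unload pattern. A minimal sketch of such an upscaler cache, assuming a hypothetical `load_upscaler_weights` helper; this is an illustration, not the actual SD.Next implementation:

```python
import torch

_upscaler_cache: dict[str, torch.nn.Module] = {}

def get_upscaler(model_path: str, device: str = "cuda") -> torch.nn.Module:
    # Reuse the already-loaded model instead of force-loading it on every run
    if model_path in _upscaler_cache:
        print(f"Upscaler cached: model={model_path}")
        return _upscaler_cache[model_path]
    print(f"Upscaler loaded: model={model_path}")
    model = load_upscaler_weights(model_path).to(device)  # hypothetical loader
    _upscaler_cache[model_path] = model
    return model

def unload_upscaler(model_path: str) -> None:
    # Optional unload (the new settings -> upscalers option): drop the cached
    # model and let the caching allocator release its VRAM
    model = _upscaler_cache.pop(model_path, None)
    if model is not None:
        del model
        torch.cuda.empty_cache()
        print(f"Upscaler unloaded: model={model_path}")
```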
It's in the dev branch and will be merged to master next week.
While the fix discussed above addresses the memory leak, I do think I should refocus your attention on the other issue seen here: even with the latent upscaler, simply turning on the second pass spikes VRAM usage to insane amounts. There's no reason a single 512x832 image with a 2x upscale should load 20 GB of something to complete it... (who knows, maybe the model a bunch of times? maybe every model in a given folder?)
I am very aware of the NVIDIA driver issue, but I'm sure whatever is happening here isn't helping. Does your fix above address this spike as well?
If not, how else can I assist you in tracking this down? I much prefer to use SD.Next instead of auto1111; currently I'm hovering between the two WebUIs because in auto1111 this issue doesn't exist, but base auto1111 feels prehistoric... help me to help you, Vlad. In current SD.Next I can't generate more than 4 images without triggering the dreaded NVIDIA issue, in sharp contrast to the 12 parallel images I can get in auto1111 without even coming close to triggering it.
NVIDIA GeForce RTX 4090
Driver version: 31.0.15.3734
Driver date: 9/1/2023
DirectX version: 12 (FL 12.1)
Physical location: PCI bus 1, device 0, function 0
Utilization 98%
Dedicated GPU memory 20.5/24.0 GB
Shared GPU memory 0.1/63.9 GB
GPU Memory 20.6/87.9 GB
Just tried 512x832 with 2x upscale and hires.
System info tab shows: gpu-allocated: current:2.05 peak:7.16
(this is the best source of information)
and Windows confirms it.
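The current/peak figures in the system info tab presumably map to torch's CUDA memory counters (an assumption, not confirmed from the SD.Next source); reading them directly would look roughly like this:

```python
import torch

current = torch.cuda.memory_allocated() / 1024**3      # "current" allocated, in GB
peak = torch.cuda.max_memory_allocated() / 1024**3     # "peak" since the last reset
print(f"gpu-allocated: current:{current:.2f} peak:{peak:.2f}")

torch.cuda.reset_peak_memory_stats()                    # start a fresh peak measurement
```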
Crossing my fingers that whatever it is for me is currently fixed in the dev branch then... this is on subsequent generations, correct? The first gen is unaffected by whatever this is.
Issue Description
As per the discussion section, the first image generates fine and subsequent generations show a massive spike in VRAM usage that gradually increases. This was recorded from a fresh clone of the project with no modifications to settings, using the base model that was downloaded (SD 1.5); I simply hit generate after each image completed, making no changes to any of the settings. If there is anything else you need from me, let me know.
Linked discussion for reference: https://github.com/vladmandic/automatic/discussions/2250
Version Platform Description
Current system specs: Windows 11, RTX 4090, i13, Chrome browser
Backend
Original
Model
SD 1.5
Acknowledgements