lllyasviel / stable-diffusion-webui-forge

[Bug]: Forge keeps using more and more VRAM with every generation #343

Open Postmoderncaliban opened 6 months ago

Postmoderncaliban commented 6 months ago

What happened?

With every subsequent generation, the VRAM used by Forge increases until it nears the maximum VRAM of my card. At that point the excessive VRAM usage causes screen flickering or a black screen. Frequently restarting the application prevents the issue from arising.

Steps to reproduce the problem

  1. Generate multiple images (about 3 at 1024×1024 with an SDXL model)
  2. The drivers cause screen flickering/black screens, presumably because they no longer have enough VRAM to work properly (a watcher sketch for confirming the VRAM growth follows below)
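
A minimal watcher sketch for confirming the growth, hypothetical and not part of Forge: it polls the driver-reported VRAM once per second. It assumes a ROCm or CUDA build of PyTorch (on ROCm the `torch.cuda` API is reused).

```python
# vram_watch.py -- hypothetical helper, not part of Forge.
# Run in a second terminal while generating; if Forge leaks,
# the reported usage climbs with every generation.
# Note: the watcher's own GPU context consumes a little VRAM too.
import time
import torch

total = torch.cuda.mem_get_info()[1]      # driver-reported total bytes
while True:
    free, _ = torch.cuda.mem_get_info()   # driver-reported free bytes
    print(f"VRAM in use: {(total - free) / 2**20:8.1f} MB")
    time.sleep(1.0)
```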

What should have happened?

WebUI should use the same amount of VRAM on each generation

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

sysinfo-2024-02-20-21-45.json

Console logs

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Version: f0.0.14v1.8.0rc-latest-184-g43c9e3b5
Commit hash: 43c9e3b5ce1642073c7a9684e36b45489eeb4a49
Legacy Preprocessor init warning: Unable to install insightface automatically. Please try run `pip install insightface` manually.
Launching Web UI with arguments: --listen --enable-insecure-extension-access --theme dark
Total VRAM 8176 MB, total RAM 15834 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6600M : native
VAE dtype: torch.float32
2024-02-20 22:30:54.971808: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-20 22:30:55.079085: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-20 22:30:55.079167: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-20 22:30:55.096649: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-20 22:30:55.138842: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-20 22:30:55.139490: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-20 22:30:55.990032: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
ControlNet preprocessor location: /home/user/stable-diffusion-webui-forge/models/ControlNetPreprocessor
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 24.1.2, num models: 9
Loading weights [821aa5537f] from /home/user/stable-diffusion-webui-forge/models/Stable-diffusion/autismmixSDXL_autismmixPony.safetensors
2024-02-20 22:31:01,670 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.4s (prepare environment: 2.7s, import torch: 4.0s, import gradio: 0.9s, setup paths: 3.2s, other imports: 0.4s, load scripts: 3.2s, create ui: 0.8s, gradio launch: 0.2s).
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
To load target model SDXLClipModel
Begin to load 1 model
Moving model(s) has taken 0.63 seconds
Model loaded in 8.5s (load weights from disk: 1.2s, forge load real models: 5.6s, calculate empty prompt: 1.7s).
WARNING:  Invalid HTTP request received.
To load target model SDXLClipModel
Begin to load 1 model
unload clone 0
Moving model(s) has taken 0.71 seconds
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 9.94 seconds
100%|███████████████████████████████████████████| 20/20 [00:36<00:00,  1.80s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:33<00:00,  1.75s/it]
Begin to load 1 model
Moving model(s) has taken 0.44 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:40<00:00,  2.02s/it]
To load target model SDXLClipModel██████████████| 20/20 [00:40<00:00,  1.75s/it]
Begin to load 1 model
Moving model(s) has taken 0.30 seconds
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 1.92 seconds
100%|███████████████████████████████████████████| 20/20 [00:34<00:00,  1.74s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:33<00:00,  1.74s/it]
Begin to load 1 model
Moving model(s) has taken 0.49 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:41<00:00,  2.08s/it]
To load target model SDXL███████████████████████| 20/20 [00:41<00:00,  1.74s/it]
Begin to load 1 model
Moving model(s) has taken 1.35 seconds
100%|███████████████████████████████████████████| 20/20 [00:37<00:00,  1.87s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:35<00:00,  1.84s/it]
Begin to load 1 model
Moving model(s) has taken 0.44 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:42<00:00,  2.14s/it]
To load target model SDXL███████████████████████| 20/20 [00:42<00:00,  1.84s/it]
Begin to load 1 model
Moving model(s) has taken 1.36 seconds
100%|███████████████████████████████████████████| 20/20 [00:37<00:00,  1.87s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:35<00:00,  1.89s/it]
Begin to load 1 model
Moving model(s) has taken 0.56 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:42<00:00,  2.12s/it]
To load target model SDXL███████████████████████| 20/20 [00:42<00:00,  1.89s/it]
Begin to load 1 model
Moving model(s) has taken 1.41 seconds
100%|███████████████████████████████████████████| 20/20 [00:39<00:00,  2.00s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:37<00:00,  1.96s/it]
Begin to load 1 model
Moving model(s) has taken 0.47 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:44<00:00,  2.24s/it]
To load target model SDXL███████████████████████| 20/20 [00:44<00:00,  1.96s/it]
Begin to load 1 model
Moving model(s) has taken 1.44 seconds
100%|███████████████████████████████████████████| 20/20 [00:36<00:00,  1.83s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:34<00:00,  1.83s/it]
Begin to load 1 model
Moving model(s) has taken 0.44 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:41<00:00,  2.08s/it]
To load target model SDXLClipModel██████████████| 20/20 [00:41<00:00,  1.83s/it]
Begin to load 1 model
Moving model(s) has taken 0.31 seconds
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 1.77 seconds
100%|███████████████████████████████████████████| 20/20 [00:44<00:00,  2.21s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:42<00:00,  2.29s/it]
Begin to load 1 model
Moving model(s) has taken 0.75 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:49<00:00,  2.49s/it]
To load target model SDXLClipModel██████████████| 20/20 [00:49<00:00,  2.29s/it]
Begin to load 1 model
Moving model(s) has taken 0.48 seconds
To load target model SDXL
Begin to load 1 model
Moving model(s) has taken 1.69 seconds
100%|███████████████████████████████████████████| 20/20 [00:38<00:00,  1.90s/it]
To load target model AutoencoderKL██████████████| 20/20 [00:35<00:00,  1.82s/it]
Begin to load 1 model
Moving model(s) has taken 0.39 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 20/20 [00:42<00:00,  2.13s/it]
Total progress: 100%|███████████████████████████| 20/20 [00:42<00:00,  1.82s/it]
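
For context, the repeated warnings above mean the VAE first attempted a full-resolution decode, ran out of VRAM, and fell back to decoding the latent in tiles so peak memory stays bounded. A rough sketch of the idea (not Forge's actual implementation), assuming a `vae` object whose `decode` maps latent tiles to image tiles at 8x spatial scale:

```python
import torch

@torch.no_grad()
def decode_tiled(vae, latent: torch.Tensor, tile: int = 64, scale: int = 8) -> torch.Tensor:
    """Decode a latent in fixed-size tiles to bound peak VRAM.

    Real implementations also overlap and blend tile edges to hide
    seams; that refinement is omitted here for brevity.
    """
    b, _, h, w = latent.shape
    out = torch.zeros(b, 3, h * scale, w * scale, device=latent.device)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = latent[:, :, y:y + tile, x:x + tile]
            ph, pw = patch.shape[2], patch.shape[3]
            out[:, :, y * scale:(y + ph) * scale,
                      x * scale:(x + pw) * scale] = vae.decode(patch)
    return out
```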

Additional information

No response

Eminic commented 6 months ago

Can confirm I'm having the exact same issue as described in this post. I'm running an RX 6700 XT myself, and by about the third generation with an SDXL model my VRAM usage nears the maximum and I experience screen flickering/artifacting. I've seen this happen in both txt2img and img2img (upscaling). I've tried everything from different optimizers to adjusting my GPU frequencies, but the issue persists.

Postmoderncaliban commented 6 months ago

I've updated the repo to get the new memory management code, and while this fixed the initial issue, it introduced two new ones. First, generation is now much slower: it dropped from 2.7s/it for a 968×1272 image to over 4s/it using Euler a. Second, Forge now keeps the GPU constantly at full activity, which raises idle temperatures from around 40°C to 55°C. I'll add a log of me generating 3 images for good measure, since the numbers that the memory manager produces seem odd:

Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
Version: f0.0.15v1.8.0rc-latest-234-g79bdb786
Commit hash: 79bdb7861914c2dac7141d02a87784b5a7168fef
Legacy Preprocessor init warning: Unable to install insightface automatically. Please try run `pip install insightface` manually.
Launching Web UI with arguments: --listen --enable-insecure-extension-access --theme dark
Total VRAM 8176 MB, total RAM 15834 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6600M : native
VAE dtype: torch.float32
2024-02-24 14:20:32.988098: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-24 14:20:33.100690: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-24 14:20:33.100789: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-24 14:20:33.119657: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-24 14:20:33.166044: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2024-02-24 14:20:33.166696: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-24 14:20:34.018657: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
ControlNet preprocessor location: /home/user/stable-diffusion-webui-forge/models/ControlNetPreprocessor
Tag Autocomplete: Could not locate model-keyword extension, Lora trigger word completion will be limited to those added through the extra networks menu.
[-] ADetailer initialized. version: 24.1.2, num models: 9
sd-webui-prompt-all-in-one background API service started successfully.
Loading weights [821aa5537f] from /home/user/stable-diffusion-webui-forge/models/Stable-diffusion/autismmixSDXL_autismmixPony.safetensors
2024-02-24 14:20:39,840 - ControlNet - INFO - ControlNet UI callback registered.
Running on local URL:  http://0.0.0.0:7860
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE

To create a public link, set `share=True` in `launch()`.
Startup time: 19.1s (prepare environment: 2.8s, import torch: 3.7s, import gradio: 0.8s, setup paths: 3.2s, other imports: 0.5s, load scripts: 3.4s, create ui: 0.8s, gradio launch: 2.3s, app_started_callback: 1.5s).
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
To load target model SDXLClipModel
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7981.99609375
[Memory Management] Model Memory (MB) = 2144.3546981811523
[Memory Management] Estimated Inference Memory (MB) = 1024.0
[Memory Management] Estimated Remaining Memory (MB) = 4813.641395568848
Moving model(s) has taken 0.58 seconds
Model loaded in 9.0s (load weights from disk: 1.2s, forge load real models: 6.1s, calculate empty prompt: 1.7s).
To load target model SDXLClipModel
Begin to load 1 model
Reuse 1 loaded models
[Memory Management] Current Free Memory (MB) = 5773.76953125
[Memory Management] Model Memory (MB) = 0.0
[Memory Management] Estimated Inference Memory (MB) = 1024.0
[Memory Management] Estimated Remaining Memory (MB) = 4749.76953125
Moving model(s) has taken 0.01 seconds
To load target model SDXL
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7608.66259765625
[Memory Management] Model Memory (MB) = 4897.086494445801
[Memory Management] Estimated Inference Memory (MB) = 26676.0
[Memory Management] Estimated Remaining Memory (MB) = -23964.42389678955
[Memory Management] Requested Async Preserved Memory (MB) = 0.0
[Async Memory Management] Parameters Loaded to Async Stream (MB) = 4897.0483474731445
[Async Memory Management] Parameters Loaded to GPU (MB) = 0.0
Moving model(s) has taken 10.26 seconds
100%|███████████████████████████████████████████| 25/25 [01:42<00:00, 4.10s/it]
To load target model AutoencoderKL██████████████| 25/25 [01:35<00:00, 3.96s/it]
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7787.27978515625
[Memory Management] Model Memory (MB) = 319.11416244506836
[Memory Management] Estimated Inference Memory (MB) = 10230.11279296875
[Memory Management] Estimated Remaining Memory (MB) = -2761.9471702575684
[Memory Management] Requested Async Preserved Memory (MB) = 0.0
[Async Memory Management] Parameters Loaded to Async Stream (MB) = 319.11416244506836
[Async Memory Management] Parameters Loaded to GPU (MB) = 0.0
Moving model(s) has taken 0.10 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 25/25 [01:59<00:00, 4.80s/it]
To load target model SDXL███████████████████████| 25/25 [01:59<00:00, 3.96s/it]
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7765.5517578125
[Memory Management] Model Memory (MB) = 4897.086494445801
[Memory Management] Estimated Inference Memory (MB) = 26676.0
[Memory Management] Estimated Remaining Memory (MB) = -23807.5347366333
[Memory Management] Requested Async Preserved Memory (MB) = 0.0
[Async Memory Management] Parameters Loaded to Async Stream (MB) = 4897.0483474731445
[Async Memory Management] Parameters Loaded to GPU (MB) = 0.0
Moving model(s) has taken 55.23 seconds
100%|███████████████████████████████████████████| 25/25 [01:55<00:00, 4.63s/it]
To load target model AutoencoderKL██████████████| 25/25 [01:43<00:00, 4.21s/it]
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7764.9775390625
[Memory Management] Model Memory (MB) = 319.11416244506836
[Memory Management] Estimated Inference Memory (MB) = 10230.11279296875
[Memory Management] Estimated Remaining Memory (MB) = -2784.2494163513184
[Memory Management] Requested Async Preserved Memory (MB) = 0.0
[Async Memory Management] Parameters Loaded to Async Stream (MB) = 319.11416244506836
[Async Memory Management] Parameters Loaded to GPU (MB) = 0.0
Moving model(s) has taken 0.17 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 25/25 [02:11<00:00, 5.27s/it]
To load target model SDXL███████████████████████| 25/25 [02:11<00:00, 4.21s/it]
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7754.30419921875
[Memory Management] Model Memory (MB) = 4897.086494445801
[Memory Management] Estimated Inference Memory (MB) = 26676.0
[Memory Management] Estimated Remaining Memory (MB) = -23818.78229522705
[Memory Management] Requested Async Preserved Memory (MB) = 0.0
[Async Memory Management] Parameters Loaded to Async Stream (MB) = 4897.0483474731445
[Async Memory Management] Parameters Loaded to GPU (MB) = 0.0
Moving model(s) has taken 36.10 seconds
100%|███████████████████████████████████████████| 25/25 [01:45<00:00, 4.20s/it]
To load target model AutoencoderKL██████████████| 25/25 [01:40<00:00, 4.10s/it]
Begin to load 1 model
Reuse 0 loaded models
[Memory Management] Current Free Memory (MB) = 7764.67529296875
[Memory Management] Model Memory (MB) = 319.11416244506836
[Memory Management] Estimated Inference Memory (MB) = 10230.11279296875
[Memory Management] Estimated Remaining Memory (MB) = -2784.5516624450684
[Memory Management] Requested Async Preserved Memory (MB) = 0.0
[Async Memory Management] Parameters Loaded to Async Stream (MB) = 319.11416244506836
[Async Memory Management] Parameters Loaded to GPU (MB) = 0.0
Moving model(s) has taken 0.18 seconds
Warning: Ran out of memory when regular VAE decoding, retrying with tiled VAE decoding.
Total progress: 100%|███████████████████████████| 25/25 [02:06<00:00, 5.07s/it]
Total progress: 100%|███████████████████████████| 25/25 [02:06<00:00, 4.10s/it]
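
For what it's worth, the negative "Estimated Remaining Memory" values look like plain subtraction rather than a measurement, i.e. remaining = free - model - estimated inference, so the huge inference estimate is what drives them negative. A quick check against the first SDXL load above:

```python
# Reproducing the memory manager's arithmetic from the log above:
# remaining = current_free - model - estimated_inference
free_mb = 7608.66259765625
model_mb = 4897.086494445801
inference_mb = 26676.0
print(free_mb - model_mb - inference_mb)  # ~-23964.42, matching the logged value
```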

Postmoderncaliban commented 6 months ago

(Screenshot from 2024-02-24 16-53-16 attached.) I'll also add that SD now eats up my entire swap file, which was hovering at around 1.7 GB before.
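
A quick, hypothetical way to watch that growth while generating (assumes the `psutil` package is installed):

```python
# Hypothetical monitor, not part of Forge: print RAM/swap every 5 s
# in a second terminal, to correlate swap growth with generations.
import time
import psutil

while True:
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    print(f"RAM used: {vm.used / 2**30:.2f} GiB | swap used: {sw.used / 2**30:.2f} GiB")
    time.sleep(5)
```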

Postmoderncaliban commented 6 months ago

With the new commit it's back up to the old generation speed (it might even be a little faster now), but the screen-flickering issue has also returned and seemingly worsened. Let me know if you need new console logs or a new sysinfo.

Eminic commented 6 months ago

> With the new commit it's back up to the old generation speed (it might even be a little faster now), but the screen-flickering issue has also returned and seemingly worsened. Let me know if you need new console logs or a new sysinfo.

Confirming this; just tested the new commit and it's exactly as he said: somewhat faster than the old one, not that it matters, because the memory issue is actually worse than before. I can't even finish the second image before it starts flickering.

Postmoderncaliban commented 6 months ago

The newest commit seemingly fixed the issue for me, even without the U-Net part of the new extension enabled. VRAM usage stays at a constant 7 GB, and the GPU idles just fine when it's not generating. I'll test it some more before closing the issue.

Update: the issue is still there, it just takes much longer to appear. It started to flicker after 10+ images here, with the "never OOM" flag for the U-Net enabled: https://pastebin.com/cqCcEd4k. With the flag disabled, it takes 2-4 images before there's an OOM error: https://pastebin.com/ZFmc6D01. There's no flickering with the U-Net flag disabled; Forge just aborts the current generation and terminates itself. I also can't make use of the larger image sizes that "never OOM" enables: trying to generate a large image, say upscaled to 2048×2048, results in the system shutting down completely.

Eminic commented 6 months ago

The issue seems to be solved for me with the new commit; however, it is atrociously slow with both the U-Net and VAE NeverOOM options turned on. Also, my RAM keeps filling to the max and leaking into swap, but that may just be me, since I've only got 16 GB of RAM and 10 GB of swap. With that said, I haven't seen my VRAM go above 7-8 GB.

After further testing, it looks like I found a scenario where the issue reappears. If I use both the U-Net and VAE NeverOOM options, I can keep generating seemingly forever, albeit quite slowly. With only U-Net NeverOOM turned on, memory usage rises quite substantially, but not to the point where the drivers start breaking, and it is also faster this way. With only VAE NeverOOM on, the issue returns almost instantly.

vamzii commented 6 months ago

For me the memory fills up and my Linux system hangs; I have to force-restart my PC, and this happens very often.

Postmoderncaliban commented 6 months ago

> For me the memory fills up and my Linux system hangs; I have to force-restart my PC, and this happens very often.

That happens for me when I try to generate at a very large resolution: Ubuntu will shut down, and the OOM killer shows up in the shutdown screen. I've had one massive freeze after generating with the NeverOOM U-Net flag for a while, but Forge eventually terminated itself and I didn't need to restart.

ssbugman commented 3 months ago

The memory leak still exists.