thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI
MIT License

CUDA runs out of memory after generating a bunch of depth map images #29

Closed plaidpants closed 1 year ago

plaidpants commented 1 year ago

It's either fragmenting the VRAM or not de-allocating the model when finished; it does not seem to reuse the already loaded depth-map model. I can use the Dreambooth extension to clear all the memory and get things started again, maybe you could do something like that. It happens on both img2img and txt2img.

```
Computing depthmap(s) ..
Total progress: 100%|████████████████████████████| 1/1 [00:09<00:00, 9.13s/it]
INFO:dynamic_prompting.py:Prompt matrix will create 1 images in a total of 1 batches.
100%|████████████████████████████████████████████| 1/1 [00:14<00:00, 14.40s/it]
Total progress: 0%| | 0/1 [00:00<?, ?it/s]
DepthMap v0.1.8
device: cuda
Loading midas model weights .. ./models/midas/dpt_large-midas-2f21e586.pt
Computing depthmap(s) ..
Total progress: 100%|████████████████████████████| 1/1 [00:08<00:00, 8.93s/it]
INFO:dynamic_prompting.py:Prompt matrix will create 1 images in a total of 1 batches.
Error completing request
Arguments: (0, '', '', 'None', 'None', <PIL.Image.Image image mode=RGB size=512x640 at 0x264A3795CF0>, None, None, None, 0, 20, 0, 4, 1, False, False, 1, 1, 7, 0, -1.0, -1.0, 0, 0, 0, False, 512, 512, 0, False, 32, 0, '', '', 1, '<div class="dynamic-prompting">\n <h3><strong>Combinations</strong></h3>\n\n Choose a number of terms from a list, in this case we choose two artists: \n <code class="codeblock">{2$$artist1|artist2|artist3}</code><br/>\n\n If $$ is not provided, then 1$$ is assumed.<br/><br/>\n\n If the chosen number of terms is greater than the available terms, then some terms will be duplicated, otherwise chosen terms will be unique. This is useful in the case of wildcards, e.g.\n <code class="codeblock">{2$$__artist__}</code> is equivalent to <code class="codeblock">{2$$__artist__|__artist__}</code><br/><br/>\n\n A range can be provided:\n <code class="codeblock">{1-3$$artist1|artist2|artist3}</code><br/>\n In this case, a random number of artists between 1 and 3 is chosen.<br/><br/>\n\n Wildcards can be used and the joiner can also be specified:\n <code class="codeblock">{{1-$$and$$__adjective__}}</code><br/>\n\n Here, a random number between 1 and 3 words from adjective.txt will be chosen and joined together with the word \'and\' instead of the default comma.\n\n <br/><br/>\n\n <h3><strong>Wildcards</strong></h3>\n \n\n <br/>\n If the groups wont drop down click <strong onclick="check_collapsibles()" style="cursor: pointer">here</strong> to fix the issue.\n\n <br/><br/>\n\n <code class="codeblock">WILDCARD_DIR: T:\\stable-diffusion-webui4\\extensions\\sd-dynamic-prompts\\wildcards</code><br/>\n <small onload="check_collapsibles()">You can add more wildcards by creating a text file with one term per line and name is mywildcards.txt. Place it in T:\\stable-diffusion-webui4\\extensions\\sd-dynamic-prompts\\wildcards. <code class="codeblock">__&#60;folder&#62;/mywildcards__</code> will then become available.</small>\n</div>\n\n', True, False, 1, False, False, False, 100, 0.7, False, False, False, False, False, False, 0.9, 5, '0.0001', False, 'None', '', 0.1, False, 0, 0, 384, 384, False, False, True, True, True, 1, '<ul>\n<li><code>CFG Scale</code> should be 2 or lower.</li>\n</ul>\n', True, True, '', '', True, 50, True, 1, 0, False, 4, 1, '<p style="margin-bottom:0.75em">Recommended settings: Sampling Steps: 80-100, Sampler: Euler a, Denoising strength: 0.8</p>', 128, 8, ['left', 'right', 'up', 'down'], 1, 0.05, 128, 4, 0, ['left', 'right', 'up', 'down'], False, False, False, '', '<p style="margin-bottom:0.75em">Will upscale the image to twice the dimensions; use width and height sliders to set tile size</p>', 64, 0, 1, '', 0, '', True, False, False, '<p style="font-weight:bold;margin-bottom:0.75em">Deforum v0.5-webui-beta</p>', '<p>This script is deprecated. Please use the full Deforum extension instead.<br>\nUpdate instructions:</p>', '<p>github.com/deforum-art/deforum-for-automatic1111-webui/blob/automatic1111-webui/README.md</p>', '<p>discord.gg/deforum</p>', '{inspiration}', None) {}
Traceback (most recent call last):
  File "T:\stable-diffusion-webui4\modules\ui.py", line 185, in f
    res = list(func(*args, **kwargs))
  File "T:\stable-diffusion-webui4\webui.py", line 57, in f
    res = func(*args, **kwargs)
  File "T:\stable-diffusion-webui4\modules\img2img.py", line 137, in img2img
    processed = modules.scripts.scripts_img2img.run(p, *args)
  File "T:\stable-diffusion-webui4\modules\scripts.py", line 317, in run
    processed = script.run(p, *script_args)
  File "T:\stable-diffusion-webui4\scripts\depthmap.py", line 63, in run
    processed = processing.process_images(p)
  File "T:\stable-diffusion-webui4\modules\processing.py", line 430, in process_images
    res = process_images_inner(p)
  File "T:\stable-diffusion-webui4\modules\processing.py", line 496, in process_images_inner
    p.init(p.all_prompts, p.all_seeds, p.all_subseeds)
  File "T:\stable-diffusion-webui4\modules\processing.py", line 841, in init
    self.init_latent = self.sd_model.get_first_stage_encoding(self.sd_model.encode_first_stage(image))
  File "T:\stable-diffusion-webui4\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "T:\stable-diffusion-webui4\repositories\stable-diffusion\ldm\models\diffusion\ddpm.py", line 863, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "T:\stable-diffusion-webui4\repositories\stable-diffusion\ldm\models\autoencoder.py", line 325, in encode
    h = self.encoder(x)
  File "T:\stable-diffusion-webui4\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "T:\stable-diffusion-webui4\repositories\stable-diffusion\ldm\modules\diffusionmodules\model.py", line 442, in forward
    h = self.down[i_level].block[i_block](hs[-1], temb)
  File "T:\stable-diffusion-webui4\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "T:\stable-diffusion-webui4\repositories\stable-diffusion\ldm\modules\diffusionmodules\model.py", line 130, in forward
    h = self.norm2(h)
  File "T:\stable-diffusion-webui4\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "T:\stable-diffusion-webui4\venv\lib\site-packages\torch\nn\modules\normalization.py", line 272, in forward
    return F.group_norm(
  File "T:\stable-diffusion-webui4\venv\lib\site-packages\torch\nn\functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 8.00 GiB total capacity; 6.87 GiB already allocated; 0 bytes free; 7.16 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
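As the error message itself suggests, fragmentation of the PyTorch caching allocator can sometimes be mitigated by setting `max_split_size_mb` through the `PYTORCH_CUDA_ALLOC_CONF` environment variable before launching the WebUI. A minimal sketch (the value 512 is purely illustrative and should be tuned for your GPU):

```shell
# Illustrative only: cap the size of split blocks the CUDA caching
# allocator may create, which can reduce fragmentation on 8 GiB cards.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
```

On Windows, the equivalent would be `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512` in the launch batch file.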

thygate commented 1 year ago

I have noticed that I get out-of-memory errors after a few thousand images when running the midas repo on its own ..

After explicitly freeing the model, running system and torch garbage collection manually, adding better exception handling, and a lot of testing, it seems to be solved now in v0.1.9.

It is now able to recover after an out of memory error.

I also unload the sd model now before loading midas to free up a bit more memory for the generation.
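The cleanup described above can be sketched roughly as follows. This is an illustrative sketch, not the actual v0.1.9 code; `release_vram` is a hypothetical helper name:

```python
import gc


def release_vram():
    """Illustrative sketch (not the extension's actual code): run Python's
    garbage collector, then ask torch to release cached CUDA blocks back
    to the driver, if torch and CUDA are available."""
    gc.collect()  # reclaim unreferenced Python objects (e.g. a del'd model)
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached allocator blocks to the GPU
    except ImportError:
        pass  # torch not installed; nothing GPU-side to free
```

Note that deleting the last Python reference to the model (`del model`) beforehand is what actually allows the weights to be collected; `torch.cuda.empty_cache()` only releases memory the caching allocator holds for tensors that are already freed.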

plaidpants commented 1 year ago

FYI, it produces some very cool results on my Looking Glass Portrait. I will update and try it out shortly.

https://photos.app.goo.gl/FEUYbAwD1c4a8gCC9

thygate commented 1 year ago

Those are some very nice images and depthmaps on that Looking Glass Portrait! :heart:

Are you using HoloPlay Studio to display the image with depthmap?

plaidpants commented 1 year ago

Yes, I am using LookingGlassStudio to display them and add them to my Looking Glass Portrait for stand-alone display. I have updated the script, have been using it for a while, and have not had any additional out-of-memory issues.

thygate commented 1 year ago

In my tests it was now able to fully recover after a CUDA out-of-memory runtime error, and could generate the same maximum-size images again as before. Unloading the sd model also allowed me to generate slightly larger depthmaps than before. Being able to recover is the most important improvement; it should be ok now.
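The recovery behaviour described above follows a common pattern: catch the `RuntimeError` that a CUDA OOM raises, free what can be freed, and leave the process in a usable state so the next generation can proceed. A hedged sketch with hypothetical names (`generate` and `release_vram` are placeholders, not the extension's actual functions):

```python
def run_with_oom_recovery(generate, release_vram):
    """Run one generation step; on a CUDA out-of-memory error, clean up
    and return None so the caller can retry instead of crashing."""
    try:
        return generate()
    except RuntimeError as err:
        if "out of memory" in str(err).lower():
            release_vram()  # e.g. gc.collect() + torch.cuda.empty_cache()
            return None     # signal the caller to retry at the same size
        raise               # unrelated errors still propagate
```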

The Studio software does still generate the best renders/quilts from rgb and depth images. So far I've only been using them as displacement maps on a plane in Unity3D and threejs, like in the lookingglass viewer here. The Studio software does seem to do something similar, but the generated mesh is much cleaner.

Having to go through the Studio software to view each generated image is a bit of a hassle, so I'm still mainly using this method to view the results as they are generated: Unity3D with the same technique, plus a script that simply monitors the sd output path for new files. It's far from perfect, but it accomplishes the goal of viewing in realtime as sd generates images.
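The monitoring idea is simple polling: remember which filenames have been seen and hand anything new to the viewer. A minimal sketch of that step (hypothetical helper names; the actual Unity3D-side script is not shown in this thread):

```python
import time
from pathlib import Path


def poll_new_files(folder, seen):
    """One polling step: return (sorted new filenames, updated seen-set)."""
    current = {p.name for p in Path(folder).iterdir()}
    return sorted(current - seen), current


def watch(folder, handle, poll_seconds=1.0):
    """Loop forever, passing each newly generated image to `handle`
    (e.g. a function that displays it on the Looking Glass)."""
    seen = set()
    while True:
        new, seen = poll_new_files(folder, seen)
        for name in new:
            handle(Path(folder) / name)
        time.sleep(poll_seconds)
```

Polling is crude compared to OS file-change notifications, but it is trivially portable and more than fast enough when images take several seconds each to generate.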

If there is any interest I might put some more effort into the viewers, as there is a lot of room for improvement..

Thanks for sharing!

thygate commented 1 year ago

Closing the issue since there seem to be no more reports of this happening.