thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI
MIT License

Memory leak in batch .ply generation #156

Open enn-nafnlaus opened 1 year ago

enn-nafnlaus commented 1 year ago

I've noticed that when I run batch .ply generation on a series of 1920x1080 images (regardless of which model is chosen) - same resolution, boost enabled, generate 3D inpainted mesh, generate 4 demo videos - it always crashes out in less than a day (card: RTX 3090). From periodic nvidia-smi calls I've also noticed that the amount of memory allocated on the card grows continuously over time. So even though no CUDA error is printed to the console at the crash, it appears there is some sort of memory leak in the process.
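Something like the following (a generic polling sketch, nothing specific to the extension; the log file name is arbitrary) is enough to watch the growth over a run:

    # Poll nvidia-smi once a minute and append GPU 0's memory usage to a log,
    # so the growth over the course of the run can be tracked.
    import subprocess
    import time

    with open("gpu_mem.log", "a") as log:
        while True:
            result = subprocess.run(
                ["nvidia-smi", "-i", "0",
                 "--query-gpu=timestamp,memory.used,memory.total",
                 "--format=csv,noheader"],
                capture_output=True, text=True,
            )
            log.write(result.stdout)
            log.flush()
            time.sleep(60)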

The semi-annoying thing with the crashes and having to restart after them is that it has to regenerate all the depthmaps in the entire directory from scratch before it goes back to converting them to .ply files and rendering videos. Thankfully, that's the fast part!

enn-nafnlaus commented 1 year ago

Sample stacktrace:

Generating inpainted mesh .. (go make some coffee) ..
Generating faces: 100% 7/7 [06:30<00:00, 55.85s/it]
Writing mesh file /path/to/stable-diffusion-webui/outputs/txt2img-images/2023-03-31/batch_out/00594-179867943-0014.ply ...
Saving faces: 100% 14718891/14718891 [00:22<00:00, 654688.77it/s]
Loading faces: 100% 14718891/14718891 [01:18<00:00, 187984.37it/s]
Generating videos ..
fov: 53.13010235415598
29%
All done. Error completing request
Arguments: (2, None, None, '/path/to/stable-diffusion-webui/outputs/txt2img-images/2023-03-31/batch_in', '/path/to/stable-diffusion-webui/outputs/txt2img-images/2023-03-31/batch_out', 0, 3, 1920, 1080, True, False, True, True, True, False, False, 1, False, False, False, 2.5, 4, 0, False, 0, 1, True, True, 'u2net', False, False, False, 0, 2) {}
Traceback (most recent call last):
  File "/path/to/stable-diffusion-webui/modules/call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "/path/to/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/path/to/stable-diffusion-webui/extensions/stable-diffusion-webui-depthmap-script/scripts/depthmap.py", line 1158, in run_generate
    outputs, mesh_fi = run_depthmap(None, outpath, imageArr, imageNameArr, compute_device, model_type, net_width, net_height, match_size, invert_depth, boost, save_depth, show_depth, show_heat, combine_output, combine_output_axis, gen_stereo, gen_stereotb, gen_anaglyph, stereo_divergence, stereo_fill, stereo_balance, clipdepth, clipthreshold_far, clipthreshold_near, inpaint, inpaint_vids, fnExt, vid_ssaa, background_removal, background_removed_images, save_background_removal_masks, False)
  File "/path/to/stable-diffusion-webui/extensions/stable-diffusion-webui-depthmap-script/scripts/depthmap.py", line 559, in run_depthmap
    mesh_fi = run_3dphoto(device, inpaint_imgs, inpaint_depths, inputnames, outpath, fnExt, vid_ssaa, inpaint_vids)
  File "/path/to/stable-diffusion-webui/extensions/stable-diffusion-webui-depthmap-script/scripts/depthmap.py", line 713, in run_3dphoto
    run_3dphoto_videos(mesh_fi, basename, outpath, 300, 40,
  File "/path/to/stable-diffusion-webui/extensions/stable-diffusion-webui-depthmap-script/scripts/depthmap.py", line 795, in run_3dphoto_videos
    normal_canvas, all_canvas, fn_saved = output_3d_photo(verts.copy(), colors.copy(), faces.copy(), copy.deepcopy(Height), copy.deepcopy(Width), copy.deepcopy(hFov), copy.deepcopy(vFov),
  File "/path/to/stable-diffusion-webui/extensions/stable-diffusion-webui-depthmap-script/scripts/inpaint/mesh.py", line 2330, in output_3d_photo
    normal_canvas = Canvas_view(fov,
  File "/path/to/stable-diffusion-webui/extensions/stable-diffusion-webui-depthmap-script/scripts/inpaint/mesh.py", line 2266, in __init__
    self.canvas = scene.SceneCanvas(bgcolor=bgcolor, size=(canvas_size*factor, canvas_size*factor))
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/scene/canvas.py", line 135, in __init__
    super(SceneCanvas, self).__init__(
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/app/canvas.py", line 211, in __init__
    self.create_native()
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/app/canvas.py", line 228, in create_native
    self._app.backend_module.CanvasBackend(self, **self._backend_kwargs)
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/app/backends/_egl.py", line 173, in __init__
    self._vispy_canvas.set_current()
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/app/canvas.py", line 412, in set_current
    self._backend._vispy_set_current()
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/app/backends/_egl.py", line 204, in _vispy_set_current
    egl.eglMakeCurrent(_EGL_DISPLAY, self._surface, self._surface,
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/vispy/ext/egl.py", line 355, in eglMakeCurrent
    raise RuntimeError('Could not make the context current.')
RuntimeError: Could not make the context current.

Traceback (most recent call last):
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/gradio/routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/gradio/blocks.py", line 1018, in process_api
    data = self.postprocess_data(fn_index, result["prediction"], state)
  File "/path/to/stable-diffusion-webui/venv/lib64/python3.10/site-packages/gradio/blocks.py", line 935, in postprocess_data
    if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: tuple index out of range

And "in less than a day" was being generous. I think it's actually more like 6-8 hours.

thygate commented 1 year ago

egl.eglMakeCurrent(_EGL_DISPLAY, self._surface, self._surface,
RuntimeError: Could not make the context current.

It can't get an OpenGL context when trying to render a video. Possibly something somewhere is not releasing a GL context after using it ..

EDIT: It can't make a new context current/active.

I've checked the Canvas_view and output_3d_photo functions in /scripts/inpaint/mesh.py, where the problem happens, and I can't immediately spot a problem; it seems in line with the vispy canvas docs. These functions are still pretty much identical to the originals in the boostingmonoculardepth repo.
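For anyone who wants to experiment: a minimal sketch (illustrative only, not what the scripts currently do) of closing the canvas explicitly as soon as rendering is done, so the EGL context is torn down immediately instead of waiting for garbage collection:

    # Sketch only: explicitly close the vispy canvas when rendering is finished,
    # so its native EGL/OpenGL context is released right away rather than
    # whenever the Python object is eventually garbage collected.
    from vispy import scene

    canvas = scene.SceneCanvas(bgcolor='black', size=(800, 800))
    try:
        # ... attach a view and render the video frames here ...
        pass
    finally:
        canvas.close()  # Canvas.close() destroys the native window and its context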

enn-nafnlaus commented 1 year ago

That may be, but I run nvidia-smi regularly throughout the day, and the memory allocated on the card keeps growing over the course of the run.

I just restarted, and it's still doing the MiDaS runs (it hasn't started generating the .ply files / videos yet). Here's the memory usage (card 0):

Wed Apr 5 21:51:42 2023   NVIDIA-SMI 525.60.11   Driver Version: 525.60.11   CUDA Version: 12.0

GPU  Name                 Fan  Temp  Perf  Pwr:Usage/Cap   Memory-Usage          GPU-Util  Compute M.
  0  NVIDIA GeForce ...   30%   59C    P2    188W / 300W    5066MiB / 24576MiB       51%     Default
  1  NVIDIA GeForce ...   72%   76C    P2    143W / 150W   10010MiB / 12288MiB       98%     Default

Processes:
GPU      PID  Type  Process name   GPU Memory
  0   389269     C  python3           5064MiB
  1     6051     C  python            4828MiB
  1   356118     C  python            5180MiB

Here's some earlier nvidia-smi runs during the day, up to the last crash:

GPU  Name                 Fan  Temp  Perf  Pwr:Usage/Cap   Memory-Usage          GPU-Util  Compute M.
  0  NVIDIA GeForce ...    0%   34C    P8     27W / 300W   19955MiB / 24576MiB        0%     Default
  1  NVIDIA GeForce ...   64%   70C    P2    107W / 150W   10011MiB / 12288MiB      100%     Default

Processes:
GPU      PID  Type  Process name   GPU Memory
  0   318539   C+G  python3          19926MiB
  1     6051     C  python            4828MiB
  1   356118     C  python            5180MiB

Wed Apr 5 18:48:07 2023

GPU  Name                 Fan  Temp  Perf  Pwr:Usage/Cap   Memory-Usage          GPU-Util  Compute M.
  0  NVIDIA GeForce ...   30%   35C    P8     28W / 300W   12429MiB / 24576MiB        0%     Default
  1  NVIDIA GeForce ...   68%   72C    P2    149W / 150W    4831MiB / 12288MiB      100%     Default

Processes:
GPU      PID  Type  Process name   GPU Memory
  0   318539   C+G  python3          12416MiB
  1     6051     C  python            4828MiB

Wed Apr 5 19:54:15 2023

GPU  Name                 Fan  Temp  Perf  Pwr:Usage/Cap   Memory-Usage          GPU-Util  Compute M.
  0  NVIDIA GeForce ...   30%   38C    P8     44W / 300W   18618MiB / 24576MiB       36%     Default
  1  NVIDIA GeForce ...   68%   72C    P2    109W / 150W    5957MiB / 12288MiB       99%     Default

Processes:
GPU      PID  Type  Process name   GPU Memory
  0   318539   C+G  python3          18790MiB
  1     6051     C  python            4828MiB
  1   356118     C  python            1126MiB

Wed Apr 5 20:08:59 2023

GPU  Name                 Fan  Temp  Perf  Pwr:Usage/Cap   Memory-Usage          GPU-Util  Compute M.
  0  NVIDIA GeForce ...    0%   37C    P5     43W / 300W   19962MiB / 24576MiB       20%     Default
  1  NVIDIA GeForce ...   63%   72C    P2    149W / 150W   10011MiB / 12288MiB       99%     Default

Processes:
GPU      PID  Type  Process name   GPU Memory
  0   318539   C+G  python3          19939MiB
  1     6051     C  python            4828MiB
  1   356118     C  python            5180MiB

I didn't run it again before the crash, but in the past I've seen it go all the way up to using over 24 GB. It's your classic memory leak situation: steadily growing memory consumption over the course of the run until it finally crashes out.

Last night it crashed out particularly hard. I couldn't control-C out of the process. Killing the process neither freed the memory nor the port. After trying a wide range of different things, I ultimately had to reboot.

thygate commented 1 year ago

Try narrowing it down: does it also happen when only generating depth maps, or only when you run the 3D inpainting?

The code already tries hard to force garbage collection of main memory and the GPU memory used by torch at the end of each run. Everything should be marked for GC when it goes out of scope.

Try adding these two lines to the loop that runs the inpainting for all the inputs, to force GC at the start of each new file. Add them at line 651 in depthmap.py:

gc.collect()
devices.torch_gc()

so it becomes

    for count in trange(0, numimages):
        gc.collect()
        devices.torch_gc()
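For reference, devices.torch_gc() in the WebUI is essentially a thin wrapper around the standard torch CUDA cleanup calls, so the forced cleanup amounts to roughly this (a sketch using only the standard torch API):

    import gc
    import torch

    gc.collect()                   # free unreferenced Python objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # return cached, unused CUDA blocks to the driver
        torch.cuda.ipc_collect()   # clean up CUDA IPC memory left by finished consumers

Note that this only releases memory torch has cached but no longer uses; anything still referenced somewhere (a tensor, a canvas, a context) stays allocated, so a genuine leak of live objects would not be fixed by it.
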
enn-nafnlaus commented 1 year ago

I doubt it has anything to do with depth map generation. It generates depth maps for every image in the input directory (which in my case started out with hundreds), and it does that comparatively quickly and with no memory leaks. But once it finishes that, it goes on to generating .ply/.mp4 files, and it only gets through maybe 5-10 of those before it runs out of memory and crashes.

I went ahead and added in those lines. Will let you know tomorrow how it fared.

enn-nafnlaus commented 1 year ago

Crashed.

Generating videos ..  15746192/15764768 [01:17<00:00, 1023930.88it/s]
fov: 53.13010235415598
67%  10/15 [2:57:39<1:28:49, 1065.97s/it]
All done. Error completing request

... same stacktrace as before.

Last file generated was a .ply file at 5:24 AM. The file looks fine. No associated videos, though. It always crashes at that point, BTW.

thygate commented 1 year ago

I ran a batch all night, generating depth and all stereo formats, and inpainted meshes. No issue. Memory usage stable the whole time. (on a windows box).

enn-nafnlaus commented 1 year ago

I ran a batch all night, generating depth and all stereo formats, and inpainted meshes. No issue. Memory usage stable the whole time. (on a windows box).

Well, I don't know what to tell you. This is eminently repeatable - and more to the point, unavoidable - for me. And I don't experience it in regular img2img batch - in my previous project I ran batch img2img for weeks.

Lalimec commented 1 year ago

I have the same issue: after the 3rd inpainted mesh my instance gets unresponsive. I use an EC2 Linux instance with an A10 GPU.

Idle, the webui uses around 8 GB; however, after using the depthmap extension it starts to idle at 12.5 GB, and the unload models button doesn't work. With the second inpainted mesh generation it goes up to 23 GB, which is ridiculous.