thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI
MIT License

Not able to generate a depthmap for a video longer than 3 to 5 minutes [FEATURE REQUEST maybe??] #411

Open eyeEmotion opened 7 months ago

eyeEmotion commented 7 months ago

After testing which depthmap model was suitable for my needs, where I want to generate depthmaps to convert (old) feature films to 3D, I discovered that I can't process videos longer than around 3 to 5 minutes, even if the file size is moderate.

With my 32 GB of RAM, I still get out-of-memory errors. So I'm assuming it first wants to extract every frame before generating the depthmap frames, but that makes it impossible to ever generate a depthmap video for longer videos. Isn't it better to have it:

- extract a batch of frames,
- generate the depthmap frames for that batch,
- free the memory,

and continue on until the entire video has been processed? Or render/process it like video editors do; they also have to deal with a lot of frames. I use DaVinci Resolve, and it is able to generate a depthmap, apply it to the video to create stereoscopic 3D (SBS), and render the result. The reason I don't want to use DaVinci Resolve's depthmap is that it doesn't capture even the general outline very well, not like MiDaS at least: it makes some unwanted extrusions and is prone to wobbly effects. It's fast, as it can create a depthmap in an instant, but you're stuck with the level of detail DaVinci Resolve has set, with no way to sacrifice some speed for more detail.
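A minimal sketch of the kind of batching I mean, assuming OpenCV for frame I/O and with a made-up estimate_depth() standing in for whatever the extension actually calls:

```python
# Sketch only: process the video in fixed-size chunks instead of extracting
# every frame up front. Assumes OpenCV; estimate_depth() is a hypothetical
# stand-in that maps a BGR frame to a uint8 single-channel depth image.
import cv2

CHUNK_SIZE = 100  # frames kept in memory at once

def depthmap_video_chunked(video_path, out_path, estimate_depth):
    reader = cv2.VideoCapture(video_path)
    fps = reader.get(cv2.CAP_PROP_FPS)
    size = (int(reader.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(reader.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, size, isColor=False)
    chunk, done = [], False
    while not done:
        ok, frame = reader.read()
        done = not ok
        if ok:
            chunk.append(frame)
        if chunk and (len(chunk) == CHUNK_SIZE or done):
            for f in chunk:
                writer.write(estimate_depth(f))  # write and discard each depth frame
            chunk.clear()  # free the batch before reading the next one
    reader.release()
    writer.release()
```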

I already tried cutting the movie into pieces of 3 to 5 minutes, but it's not easy to cut exactly where you left off. And with a film lasting 1h30 to 2 hours, that's a lot of cutting and rendering, only to then have to append all the parts of the processed depthmap video and keep them exactly frame-aligned with the movie.
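For anyone else splitting manually in the meantime: ffmpeg can cut and rejoin with stream copy, so there is no re-encoding and no lost or duplicated frames at the seams. A rough sketch via subprocess, assuming ffmpeg is on the PATH; note that with -c copy the cuts snap to keyframes rather than exact timestamps:

```python
# Rough sketch: lossless split/join with ffmpeg (assumed to be on the PATH).
import subprocess

def split_video(src, seconds=300):
    # Stream copy into ~5-minute parts; cut points snap to keyframes.
    subprocess.run(["ffmpeg", "-i", src, "-c", "copy", "-f", "segment",
                    "-segment_time", str(seconds), "-reset_timestamps", "1",
                    "part_%03d.mp4"], check=True)

def join_videos(parts, dst):
    # The concat demuxer rejoins the parts without re-encoding.
    with open("parts.txt", "w") as f:
        f.writelines(f"file '{p}'\n" for p in parts)
    subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "parts.txt",
                    "-c", "copy", dst], check=True)
```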

I hope there is just something that I'm missing and this is already possible.

Cheers

semjon00 commented 7 months ago

This indeed would be a great addition to the program! Sadly I am busy with other things and can't promise to add it anytime soon.

eyeEmotion commented 7 months ago

> This indeed would be a great addition to the program! Sadly I am busy with other things and can't promise to add it anytime soon.

I understand. I'm just putting it out there.

In the meantime, I tried it again with the 5-minute video. This time I copied the errors I got; I don't know if they will be helpful to anybody.

During 'computing output', the virtual memory goes up to around 90 GB. Then it starts generating the depthmaps. During that process, I can see the virtual memory go up to 126 GB (I still have plenty left on my SSD). But then I get these errors and everything falls apart.


To create a public link, set share=True in launch().
Startup time: 54.5s (prepare environment: 16.8s, import torch: 9.6s, import gradio: 4.6s, setup paths: 7.9s, initialize shared: 1.3s, other imports: 4.4s, setup codeformer: 1.2s, setup gfpgan: 0.4s, list SD models: 0.1s, load scripts: 7.6s, create ui: 0.3s, gradio launch: 0.7s).
Creating model from config: D:\Documenten\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 56.4s (load weights from disk: 39.0s, create model: 0.7s, apply weights to model: 1.3s, apply half(): 8.5s, load textual inversion embeddings: 0.1s, calculate empty prompt: 6.7s).
Generating depthmaps for the video frames
DepthMap v0.4.6 (https://github.com/thygate/stable-diffusion-webui-depthmap-script/commit/500ee72a2e2eeb664ffcd9ca57ee7979ae95c693)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_384.pt
Computing output(s) ..
100%|██████████████████████████████████████████████████████████████████████████████| 7322/7322 [50:16<00:00, 2.43it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Traceback (most recent call last):
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
    ret = video_mode.gen_video(
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 150, in gen_video
    input_depths = process_predicitons(input_depths, smoothening)
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 126, in process_predicitons
    a, b = np.percentile(np.stack(processed), [0.5, 99.5])
  File "<__array_function__ internals>", line 180, in percentile
  File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 4166, in percentile
    return _quantile_unchecked(
  File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 4424, in _quantile_unchecked
    r, k = _ureduce(a,
  File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 3725, in _ureduce
    r = func(a, **kwargs)
  File "D:\Documenten\stable-diffusion-webui\venv\lib\site-packages\numpy\lib\function_base.py", line 4590, in _quantile_ureduce_func
    arr = a.flatten()
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 20.9 GiB for an array with shape (11246592000,) and data type float16
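For context, the allocation that fails here is np.stack(processed) followed by the flatten inside np.percentile, which needs every depth value of the whole video in one array at once. The same 0.5/99.5 percentiles could be approximated frame by frame with a histogram instead; a hypothetical sketch, not the extension's actual code:

```python
# Hypothetical alternative to np.percentile(np.stack(processed), [0.5, 99.5]):
# approximate the percentiles from an accumulated histogram so that only one
# frame's depth values are in memory at a time.
import numpy as np

def streaming_percentiles(frames, lo=0.5, hi=99.5, bins=4096):
    # Pass 1: find the global value range across all depth frames.
    vmin = min(float(f.min()) for f in frames)
    vmax = max(float(f.max()) for f in frames)
    # Pass 2: accumulate one shared histogram, frame by frame.
    counts = np.zeros(bins, dtype=np.int64)
    for f in frames:
        c, edges = np.histogram(f, bins=bins, range=(vmin, vmax))
        counts += c
    # Invert the empirical CDF to read off the approximate percentiles.
    cdf = np.cumsum(counts) / counts.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    return (centers[np.searchsorted(cdf, lo / 100)],
            centers[np.searchsorted(cdf, hi / 100)])
```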

petermg commented 6 months ago

> This indeed would be a great addition to the program! Sadly I am busy with other things and can't promise to add it anytime soon.

Seriously! I'm trying to do this as well. If you implemented the suggestions made by the OP, that would be insane: we could convert an entire feature-length film to 3D with minimal interaction! As of right now I am outputting my video to PNG files, and even then I get an OOM error after about 3300 frames. I find that bizarre, since I expect it to process each frame individually; I don't know what it's doing that causes the OOM error, but it seems unnecessary. I figured batch-processing the image files would avoid the OOM error?