thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI
MIT License

Won't generate Depthmap video with 'dpt_beit_large_512' #410

Open eyeEmotion opened 4 months ago

eyeEmotion commented 4 months ago

Hi,

I'm converting a test video with several models, with and without Boost on. Most of the models work, although some I couldn't test because they took too long. But when I select DPT_BEIT_LARGE_512 (I still have to test whether the 384 has the same problem), I first get some warnings, then it keeps creating depthmaps per frame. When that's done, it starts generating the output, but fails at it.

Edit: Tried the 384 and that one works fine. Tried the 512 again with a different video file, one that was also bigger/longer. But again, at 13%, I got the same warnings.

I'm using the depthmap script within Stable Diffusion (Automatic1111, or whatever it is called). Here is what is output on the command line:


To create a public link, set share=True in launch().
Startup time: 28.1s (prepare environment: 9.4s, import torch: 5.2s, import gradio: 3.4s, setup paths: 3.2s, initialize shared: 0.4s, other imports: 2.2s, setup codeformer: 0.3s, load scripts: 3.6s, create ui: 0.2s, gradio launch: 0.4s).
Creating model from config: D:\Documenten\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 47.1s (load weights from disk: 36.7s, create model: 0.2s, apply weights to model: 1.2s, apply half(): 1.9s, load textual inversion embeddings: 0.2s, calculate empty prompt: 6.8s).
Generating depthmaps for the video frames
DepthMap v0.4.6 (500ee72a)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_512.pt
Computing output(s) ..
 13%|█████████▉ | 222/1757 [03:24<23:09, 1.10it/s]
WARNING:py.warnings:D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in subtract
  out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

WARNING:py.warnings:D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in divide
  out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

100%|██████████████████████████████████████████████████████████████████████████████| 1757/1757 [26:14<00:00, 1.12it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Generating output frames
DepthMap v0.4.6 (500ee72a)
device: cuda
Computing output(s) ..
 99%|█████████████████████████████████████████████████████████████████████████████ | 1737/1757 [00:23<00:00, 73.84it/s]
Fail.

Traceback (most recent call last):
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
    ret = video_mode.gen_video(
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 159, in gen_video
    img_results = list(core.core_generation_funnel(None, input_images, input_depths, None, inp))
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 322, in core_generation_funnel
    raise e
  File "D:\Documenten\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 139, in core_generation_funnel
    if inputdepthmaps is not None and inputdepthmaps[count] is not None:
IndexError: list index out of range


My hardware is up to snuff, because I can even use BOOST without my computer breaking a sweat (it just takes a long time). I've got an i7-13700K, 2x16GB 3600 DDR4 RAM and an Nvidia RTX 3060 OC 12GB. (Btw, is there a setting where I can dedicate more VRAM to it? It seems to mostly run at around 5GB, sometimes 7GB.)

Also, how can I add other models to the dropdown list? For example, I also want to try 'dpt_swin_large_384.pt' and 'dpt_swin2_large_384.pt'.

eyeEmotion commented 3 months ago

Manually cut and rendered my video file again. This time the depth generation got all the way through the first pass. But now it gave an error in the "Generating output frames" section. It was almost at the last frames when I suddenly got this error:


Startup time: 8.1s (prepare environment: 1.9s, import torch: 2.2s, import gradio: 0.8s, setup paths: 0.7s, initialize shared: 0.2s, other imports: 0.4s, load scripts: 1.4s, create ui: 0.2s, gradio launch: 0.2s).
Creating model from config: C:\AI\stable-diffusion-webui\configs\v1-inference.yaml
Applying attention optimization: Doggettx... done.
Model loaded in 6.2s (load weights from disk: 1.9s, create model: 0.2s, apply weights to model: 1.2s, apply half(): 1.3s, calculate empty prompt: 1.5s).
Generating depthmaps for the video frames
DepthMap v0.4.6 (500ee72a)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_512.pt
Computing output(s) ..
100%|████████████████████████████████████████████████████████████████████████████| 4320/4320 [1:06:43<00:00, 1.08it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Generating output frames
DepthMap v0.4.6 (500ee72a)
device: cuda
Computing output(s) ..
 99%|█████████████████████████████████████████████████████████████████████████████▎| 4283/4320 [04:36<00:02, 15.47it/s]
Fail.

Traceback (most recent call last):
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
    ret = video_mode.gen_video(
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 159, in gen_video
    img_results = list(core.core_generation_funnel(None, input_images, input_depths, None, inp))
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 322, in core_generation_funnel
    raise e
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 139, in core_generation_funnel
    if inputdepthmaps is not None and inputdepthmaps[count] is not None:
IndexError: list index out of range


It was a 3-minute video. Nothing seemed to go wrong at the hardware level when I watched Task Manager. I didn't run short on RAM or virtual memory.
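Reading the traceback, `inputdepthmaps[count]` is indexed past the end of the list, which would happen if fewer depth maps than video frames reach the second pass. That is an assumption drawn from the log, not a confirmed diagnosis. A minimal sketch of a pre-flight check and a bounds-checked lookup follows; `missing_depth_count` and `depth_for_frame` are hypothetical helpers, not functions from the extension:

```python
from typing import Any, Optional, Sequence


def missing_depth_count(frames: Sequence[Any],
                        depths: Optional[Sequence[Any]]) -> int:
    """Return how many frames have no depth map at the same index.

    Hypothetical helper: 0 means the two lists line up (or no depth
    list was supplied at all).
    """
    if depths is None:
        return 0
    return max(0, len(frames) - len(depths))


def depth_for_frame(depths: Optional[Sequence[Any]], count: int) -> Optional[Any]:
    """Bounds-checked lookup that returns None instead of raising IndexError."""
    if depths is None or count < 0 or count >= len(depths):
        return None
    return depths[count]


if __name__ == "__main__":
    # Illustrative sizes only: 4320 frames, and a depth list that is
    # shorter by an assumed 37 entries.
    frames = list(range(4320))
    depths = list(range(4283))
    print("frames without a depth map:", missing_depth_count(frames, depths))
    print("lookup past the end:", depth_for_frame(depths, 4300))
```

Running a check like this before the "Generating output frames" pass would at least show whether the two lists differ in length, instead of failing near the end of the run.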

eyeEmotion commented 3 months ago

Tried it again today, this time with a 2 minute and 30 seconds video file. Now I got this error again:


DepthMap v0.4.6 (500ee72a)
device: cuda
Loading model(s) ..
Loading model weights from ./models/midas/dpt_beit_large_512.pt
Computing output(s) ..
 30%|███████████████████████▍ | 1082/3600 [16:09<36:27, 1.15it/s]
WARNING:py.warnings:C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in subtract
  out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

WARNING:py.warnings:C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:196: RuntimeWarning: invalid value encountered in divide
  out = (out - out.min()) / (out.max() - out.min()) # normalize to [0; 1]

WARNING:py.warnings:C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py:45: RuntimeWarning: invalid value encountered in cast
  return out.astype("uint16")

100%|██████████████████████████████████████████████████████████████████████████████| 3600/3600 [53:35<00:00, 1.12it/s]
Computing output(s) done.
All done.

Processing generated depthmaps
Generating output frames
DepthMap v0.4.6 (500ee72a)
device: cuda
Computing output(s) ..
 99%|█████████████████████████████████████████████████████████████████████████████▎| 3569/3600 [03:25<00:01, 17.33it/s]
Fail.

Traceback (most recent call last):
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\common_ui.py", line 457, in run_generate
    ret = video_mode.gen_video(
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\video_mode.py", line 159, in gen_video
    img_results = list(core.core_generation_funnel(None, input_images, input_depths, None, inp))
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 322, in core_generation_funnel
    raise e
  File "C:\AI\stable-diffusion-webui\extensions\stable-diffusion-webui-depthmap-script\src\core.py", line 139, in core_generation_funnel
    if inputdepthmaps is not None and inputdepthmaps[count] is not None:
IndexError: list index out of range


It seems to always go wrong somewhere when using 'dpt_beit_large_512'.
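The `RuntimeWarning` lines point at the min-max normalization in `core.py:196`. NumPy emits "invalid value encountered in subtract" when an operation such as `inf - inf` produces NaN, so one possible explanation (an assumption, not verified against the extension) is that the model output for some frames contains non-finite values. A minimal sketch of a normalization that tolerates them; `normalize_depth` is a hypothetical name and this is not the extension's code:

```python
import numpy as np


def normalize_depth(out: np.ndarray) -> np.ndarray:
    """Min-max normalize to [0, 1], ignoring NaN/inf when picking the range.

    Hypothetical helper. Non-finite pixels are replaced with the smallest
    finite value, so they end up as 0 in the result.
    """
    out = np.asarray(out, dtype=np.float64)
    finite = np.isfinite(out)
    if not finite.any():
        # Nothing usable in this frame: return a flat map.
        return np.zeros_like(out)

    lo = out[finite].min()
    hi = out[finite].max()
    if hi == lo:
        # Constant frame: avoid dividing by zero.
        return np.zeros_like(out)

    cleaned = np.where(finite, out, lo)
    return (cleaned - lo) / (hi - lo)


if __name__ == "__main__":
    frame = np.array([[0.0, 5.0], [np.inf, 10.0]])
    print(normalize_depth(frame))
```

Whether flattening bad pixels to 0 is acceptable depends on the use case; skipping or re-running the affected frame may be preferable for video.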

semjon00 commented 3 months ago

This issue makes my heart bleed... And I probably won't have time in the near future to fix this. Double downer. You mention in the other issue that you managed to kinda make it work, but the experience was nowhere near seamless, whereas it would be nice if it was.

Btw, you could try the Depth Anything model - it works great and does not require BOOST (it is better to disable BOOST because it is resource and VRAM hungry).

eyeEmotion commented 3 months ago

Hi,

I'm indeed currently using the Depth Anything model, after someone at a forum for 2D-to-3D movie conversion suggested it. I didn't use that one at first, because it didn't work for me either. But then I discovered that I also had to install ControlNet and add arguments to the launch .bat in order for it to work.

It's just slightly slower than Midas V3.1 BEIT_L_384, but it is much more accurate, picks up the right sort of detail, and is a lot more stable. The downside is that I had to divide the movie into even smaller parts for it to work: 2-minute clips instead of 3-minute ones. Midas V3.1 BEIT_L_384 takes around 30 minutes for a 3-minute clip, while Depth Anything takes around 40 minutes for a 2-minute clip.
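Taking those timings at face value, here is a quick back-of-the-envelope comparison per minute of footage. It is only a sketch, and `minutes_per_video_minute` is a made-up helper:

```python
def minutes_per_video_minute(processing_min: float, clip_min: float) -> float:
    """Processing time needed per minute of source footage."""
    return processing_min / clip_min


if __name__ == "__main__":
    # Figures quoted in the comment above (approximate, "around").
    beit_384 = minutes_per_video_minute(30, 3)
    depth_anything = minutes_per_video_minute(40, 2)
    print("BEIT_L_384:", beit_384)
    print("Depth Anything:", depth_anything)
    print("ratio:", depth_anything / beit_384)
```

By these rough figures Depth Anything needs about twice the processing time per minute of footage, unless other settings differed between the runs.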

Does Midas V3.1 BEIT_L_512 differ much from BEIT_L_384?

Cheers