nagadomi / nunif

Misc; latest version of waifu2x; 2D video to stereo 3D video conversion
MIT License

Equivalent settings to Owl3D's popout feature #181

Closed: NoUserNameForYou closed this issue 2 months ago

NoUserNameForYou commented 2 months ago

Hello, I really love your app, great work! Especially the depth generation part!

Your app's depth maps are higher quality than Owl3D Free's. However, its popout feature gives far better results than yours. I tried a bunch of combinations but couldn't get a similar result. I used Depth Anything v2 L because it gives better depth than ZoeDepth on my test clip.

What would be the equivalent settings to Owl3D 1.4.8's (obtainable from their Discord) depth: 9, popout: 10, convergence: 0, AI model: precision, depth smoothing: medium?

Thank you again!

See the attached demo.zip for what I'm talking about. demo.zip

nagadomi commented 2 months ago

Based on the automatic parameter search, 3D Strength=7, Convergence Plane=0, Your Own Size=0, Method=forward_fill, Foreground Scale=0 gives the most similar result in iw3. (3D Strength=7 is not in the combobox, but you can enter the value directly from the keyboard.)

However, forward_fill is not a practical method, and row_flow_v3 currently only supports up to 3D Strength=5.
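
For reference, here is a minimal sketch of driving those settings through the iw3 CLI from Python. The flag names (`--divergence`, `--convergence`, `--method`, `--foreground-scale`) are assumptions based on the GUI labels, and the file paths are placeholders; check `python -m iw3 --help` for the actual options.

```python
# Hypothetical iw3 invocation mirroring the GUI settings above.
# Flag names are assumptions; verify them with `python -m iw3 --help`.
import subprocess

subprocess.run([
    "python", "-m", "iw3",
    "-i", "input.mp4",           # placeholder input path
    "-o", "output",              # placeholder output path
    "--divergence", "7",         # 3D Strength=7 (typed in manually)
    "--convergence", "0",        # Convergence Plane=0
    "--method", "forward_fill",  # warp method suggested above
    "--foreground-scale", "0",   # Foreground Scale=0
], check=True)
```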

NoUserNameForYou commented 2 months ago

Thank you. The closest I was able to get with your app is in these files.

demo2.zip

There are artifacts that would need Edge Fix, which in turn would reduce the depth. Something is missing from the equation; perhaps in the future you could add separate sliders like Owl3D does.

Thanks for your quick answer.

One more question though: the ZoeD_NK model uses around 12.7GB of VRAM on my 12GB card and crawls at 0.07 fps; the other models are fine. I tried the Low VRAM and batch size options but couldn't reduce usage below 12GB.

Is there any way to reduce it?

nagadomi commented 2 months ago

In my environment (Linux, RTX 3070 Ti), there is no major difference in VRAM usage between ZoeD_N and ZoeD_NK.

ZoeD_NK

Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_NK.pt
Loaded successfully
2024-07-21 02:55:57,502:nunif: [   DEBUG] create_model: sbs.row_flow_v3({}), device_ids=[0]
2024-07-21 02:55:57,506:nunif: [   DEBUG] load: sbs.row_flow_v3 from https://github.com/nagadomi/nunif/releases/download/0.0.0/iw3_row_flow_v3_20240423.pth
720x720.mp4: 100%|███████████████████| 150/150 [00:14<00:00, 10.70it/s]
2024-07-21 02:56:12,442:nunif: [   DEBUG] GPU Max Memory Allocated 3665MB

ZoeD_N

Using pretrained resource url::https://github.com/isl-org/ZoeDepth/releases/download/v1.0/ZoeD_M12_N.pt
Loaded successfully
2024-07-21 02:56:24,565:nunif: [   DEBUG] create_model: sbs.row_flow_v3({}), device_ids=[0]
2024-07-21 02:56:24,570:nunif: [   DEBUG] load: sbs.row_flow_v3 from https://github.com/nagadomi/nunif/releases/download/0.0.0/iw3_row_flow_v3_20240423.pth
720x720.mp4: 100%|███████████████████| 150/150 [00:14<00:00, 10.40it/s]
2024-07-21 02:56:39,921:nunif: [   DEBUG] GPU Max Memory Allocated 3667MB

If the height and width of the input image are extremely different, VRAM usage may increase, but that should affect ZoeD_N in the same way. I don't have a Windows machine with a GPU right now, so I'll give it a try the next time I test on Windows.
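
If it helps to reproduce the comparison, the peak figure in the logs above can be captured around any model run with PyTorch's standard counters; this is a generic sketch, not iw3's own code.

```python
# Generic peak-VRAM measurement with PyTorch's built-in counters,
# matching the "GPU Max Memory Allocated" debug line in the logs.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the depth model / SBS conversion here (placeholder) ...
peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
print(f"GPU Max Memory Allocated {peak_mb}MB")
```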

NoUserNameForYou commented 2 months ago

Sorry, I forgot to mention that the depth size was set to 512, the video was 1920x800, and the output was Full SBS. And yes, I'm on Windows. Thank you for your time.

nagadomi commented 2 months ago

"depth size was at 512, 1920x800"

I think the input size is too large for ZoeDepth. With this setting, the input size of the model is 1248x512. That is about 4.3x more pixels than 384x384 (Depth Resolution=Default with a 1:1 aspect ratio).
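
As a back-of-the-envelope check (the round-up-to-a-multiple-of-32 step is an assumption about the model's input constraints):

```python
# Rough arithmetic behind the 4.3x figure above.
import math

depth_size = 512
width = depth_size * 1920 / 800          # scale by the video's aspect ratio -> 1228.8
width = math.ceil(width / 32) * 32       # assumed rounding to a multiple of 32 -> 1248
print(width * depth_size / (384 * 384))  # ~4.33x the pixels of the 384x384 default
```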

ZoeDepth (MiDaS backbone) requires more VRAM than DepthAnything (DINOv2 backbone). Any_B used only 1GB of VRAM even at this input size. ZoeD_Any_N also used only 2.3GB, because the backbone of ZoeD_Any_N and ZoeD_Any_K is DINOv2. ZoeD_N, ZoeD_K, and ZoeD_NK have higher VRAM usage than the other models.