Closed · wcapes closed this 6 months ago
I have received several requests for this. I am planning to support it.
However, the processing time for depth estimation is only a small part of the overall processing time. Also, the disk I/O and decoding needed to load PNG images may be much slower than decoding video, so I am not sure this approach will save processing time.
The main slowdown is the generation of the depth map, and if we can cache/save/store that, it can save HOURS when taking the source image and loading the depth map to create the final SBS image.
I do believe this will be a HUGE bonus to this awesome software.
Thank you for considering it
My recommended workflow for testing or previewing is Max FPS=0.25. Ideally, processing time is reduced to 1/120 (for 30fps video). I think it is a much better approach than this request. A disadvantage is that seek does not work in some video players.
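To put numbers on that 1/120 figure (back-of-the-envelope arithmetic only, not iw3 code):

source_fps = 30.0          # original video frame rate
max_fps = 0.25             # preview setting
duration = 2 * 60 * 60     # e.g. a 2-hour video, in seconds
full_frames = source_fps * duration     # 216000 frames at the full rate
preview_frames = max_fps * duration     # 1800 frames actually processed
print(preview_frames / full_frames)     # 0.00833... = 1/120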
Would you consider it then for the CLI tool? I had a look at the code, and adding this support to the feature where it reads the images from a folder (--input) could maybe be the starting point?
Then, when looping through the source images, it could load from a MAPS folder if it exists and then do the SBS image. Guys could then test this out, and potentially, if you wanted to do a movie or an episode, you could extract the frames, try different settings to get the best result, and then just rebuild the video file (this would be manual by the user using FFmpeg).
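Roughly what I'm thinking, just as a sketch (the MAPS folder name and the estimate_depth callable are made up here, not iw3's actual API):

import os
from PIL import Image

def load_or_create_depth(frame_path, maps_dir, estimate_depth):
    # Reuse a cached depth map if one exists for this frame,
    # otherwise run the (slow) model and cache the result.
    os.makedirs(maps_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(frame_path))[0] + ".png"
    depth_path = os.path.join(maps_dir, name)
    if os.path.exists(depth_path):
        return Image.open(depth_path)                   # 2nd/3rd run: skip depth estimation
    depth = estimate_depth(Image.open(frame_path))      # placeholder for the model call
    depth.save(depth_path)                              # cache for the next run
    return depth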
Just thinking of ways to make it quicker for users.
If this works, we could then investigate approaches for video. I would be happy to test.
For now, I am planning to develop two CLI commands: export depth, and generate video from depth, frames, and audio.
Also, here is a benchmark. I think depth estimation is much faster than you think; it is faster than saving and loading an image.
import os
import torch
import PIL.Image
import torchvision.transforms.functional as TF
import time


def bench_any():
    # Depth-Anything (Any_B) inference benchmark
    model = torch.hub.load("nagadomi/Depth-Anything_iw3", "DepthAnything",
                           encoder="vitb",  # Any_B
                           trust_repo=True).cuda()
    RES = 392  # default resolution in iw3, 518 is also supported
    BATCH = 4
    N = 25
    x = torch.rand((BATCH, 3, RES, RES)).cuda() - 0.5
    with torch.inference_mode():
        start = time.perf_counter()
        for i in range(N):
            with torch.autocast(device_type="cuda"):
                depth = model(x)
        torch.cuda.synchronize()
        print(f"depth inference time per image: {round((time.perf_counter() - start) / (N * BATCH), 4)}s")


def bench_png():
    # PNG save/load benchmark for frame-sized images
    OUTPUT_DIR = "png_bench"
    FRAME_RES = (1920, 1080)  # consider frame images
    img = torch.rand((3, *FRAME_RES))
    img = TF.to_pil_image(img)
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    start = time.perf_counter()
    N = 100
    for i in range(N):
        file_path = os.path.join(OUTPUT_DIR, f"{i}.png")
        img.save(file_path)
        PIL.Image.open(file_path).load()
    print(f"PNG encode/save/load/decode time per image: {round((time.perf_counter() - start) / N, 4)}s")


if __name__ == '__main__':
    bench_any()
    bench_png()
Result (Linux, RTX 3070 Ti):
depth inference time per image: 0.0148s
PNG encode/save/load/decode time per image: 0.2091s

Result with FRAME_RES=(392, 392):
depth inference time per image: 0.0148s
PNG encode/save/load/decode time per image: 0.0154s
Will check this out tomorrow (night here) thanks for the share :)
I chose to simply add an --export option to the existing iw3.cli. It can also be used in the GUI. The input for video generation is a YAML config file. It will be generated when exporting, and it can also be created manually. The config file can define the folder path for frame images, the folder path for depth images, the audio file path, the FPS, etc.
Sounds like a plan, can't wait to check it out :)
PS: Wouldn't nag for features if we didn't think this is a GREAT piece of software :)
I added --export and --export-disparity options.
--export and --export-disparity are similar, but the difference is that --export-disparity applies --edge-dilation and --mapper (--foreground-scale) to the depth images and skips them when generating the video.
--export is for experimenting with various video generation settings (this request), and --export-disparity is for editing the depth images with an external program.
CLI example for exporting:
python -m iw3.cli -i ./tmp/test_videos/filename.mp4 -o ./tmp/export --yes --depth-model ZoeD_Any_N --export --max-workers 8
python -m iw3.cli -i ./tmp/test_videos/images -o ./tmp/export/images --yes --depth-model ZoeD_Any_N --export --max-workers 8
In the GUI, select Export or Export disparity for Stereo Format.
The --max-workers (Worker Threads) option has a big impact on export performance.
CLI example for generating video/images from exported data:
python -m iw3.cli -i ./tmp/export/filename/iw3_export.yml -o ./tmp/generated_video/ --yes
python -m iw3.cli -i ./tmp/export/images/iw3_export.yml -o ./tmp/generated_images/ --yes
Simply specify the yml file as input. When exporting with the --export option, the settings specified from the CLI/GUI are used, except for the video FPS.
Output directory structure:
tmp/export
└── filename
├── rgb
├── depth
├── audio.m4a
└── iw3_export.yml
rgb: frame images (8-bit RGB PNG)
depth: depth images (16-bit grayscale PNG)
audio.m4a: audio file, if one exists
iw3_export.yml: config file for generating the video
iw3_export.yml is in YAML format. It can be edited with any text editor.
type: video
basename: filename
fps: 30.0
rgb_dir: rgb
depth_dir: depth
audio_file: audio.m4a
mapper: none
skip_mapper: false
skip_edge_dilation: false
updated_at: '2024-03-29T21:02:32.996779'
user_data:
  export_options:
    depth_model: Any_B
    export_disparity: false
    mapper: none
    edge_dilation: 2
    max_fps: 30
    ema_normalize: false
type: "video" or "images". If "video", SBS video will be output. if "images", each file will be output as SBS image.
basename: output filename for video
fps: Video FPS.
rgb_dir, depth_dir: frame and depth images directory path. It can be given as absolute path or relative path from the yaml file.
audio_file: audio file path for video. If not specified or the file does not exist, it will be ignored.
mapper: mapper function used to convert from depth to disparity.
skip_mapper: If true, the mapper function will not be applied and the depth image will be used as disparity.
skip_edge_dilation: If true, --edge-dilation (Ege Fix) will not be applied.
updated_at: datetime in local time zone. Not used.
user_data: Not used. Currently, the options used for export are stored.
rgb and depth files will be used as sequential frames, in ascending order of file name, when type is "video". Edit: when type is "images" the rgb files will refer to the original images and no copy will be output. See https://github.com/nagadomi/nunif/discussions/173#discussioncomment-10036914
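As a small illustration of that ordering, something like this would pair the frames (a rough sketch using PyYAML, not iw3's actual code; the config path is just an example):

import os
import yaml

config_path = "tmp/export/filename/iw3_export.yml"
with open(config_path, encoding="utf-8") as f:
    conf = yaml.safe_load(f)

base_dir = os.path.dirname(config_path)
rgb_dir = os.path.join(base_dir, conf["rgb_dir"])      # relative paths resolve from the yml
depth_dir = os.path.join(base_dir, conf["depth_dir"])

# frames are consumed as pairs, in ascending file-name order
rgb_files = sorted(os.listdir(rgb_dir))
depth_files = sorted(os.listdir(depth_dir))
for rgb_name, depth_name in zip(rgb_files, depth_files):
    print(rgb_name, depth_name)   # each pair becomes one SBS frame at conf["fps"]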
This may be a confusing point. The output depthmap is in a different format/scale depending on the depth estimation model. Stereo generation requires disparity, so if the model output is not disparity, it has to be converted to disparity. iw3 performs this conversion with a mapper function (the --mapper option).
--mapper options used for each depth model / --foreground-scale:

| Model | --foreground-scale | --mapper |
|---|---|---|
| ZoeDepth | 0 | div_6 |
| ZoeDepth | 1 | div_4 |
| ZoeDepth | 2 | div_2 |
| ZoeDepth | 3 | div_1 |
| DepthAnything | 0 | none |
| DepthAnything | 1 | mul_1 |
| DepthAnything | 2 | mul_2 |
| DepthAnything | 3 | mul_3 |
If you use a program other than iw3 to output depthmaps, there may not be a proper conversion function. In that case, convert the images in the depth directory to disparity yourself and specify "none" as the mapper.
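As an example, such an external conversion could look roughly like this (my own sketch, not part of iw3; it assumes larger pixel values mean farther depth and uses disparity = 1/depth, so adapt the formula to whatever convention your depth program uses, then set mapper: none in iw3_export.yml):

import os
import numpy as np
from PIL import Image

depth_dir = "tmp/export/filename/depth"   # exported 16-bit grayscale depth images
for name in sorted(os.listdir(depth_dir)):
    path = os.path.join(depth_dir, name)
    depth = np.asarray(Image.open(path), dtype=np.float64)
    disparity = 1.0 / (depth + 1.0)        # closer pixels -> larger disparity; +1 avoids /0
    # rescale to the full 16-bit range and overwrite the file in place
    disparity = (disparity - disparity.min()) / (disparity.max() - disparity.min())
    # Pillow writes a uint16 array back out as a 16-bit grayscale PNG
    Image.fromarray((disparity * 65535).astype(np.uint16)).save(path)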
EDIT: See https://github.com/nagadomi/nunif/blob/dev/iw3/docs/colorspace.md for Colorspace
I haven't tested this much, so if something strange happens, it's most likely a bug.
This is amazing, thank you - will start testing and playing this weekend
So far:
audio.m4a extracts fine and contains the audio track.
All the images in the RGB folder are fine and are the video frames, as usual, in PNG format.
Export started on frame 2, not 1.
I do see the depth images (in the Depth folder) are all 782-byte files with a PNG tag, and when viewed they are all black images.
Not sure if this is as intended.
*** Will update as the test process runs :)
Related: https://github.com/nagadomi/nunif/discussions/87
Export started on frame 2, not 1
The number in the filename is frame PTS. No problem if filenames are in frame order.
The other problems do not happen in my env. Is the normal Full SBS video generation feature working? I am guessing that the cause is either the video you are using for testing or your Python venv.
Running part 2 after the first part. Command line used: python -m iw3.cli -i e:\dp1\dp1\iw3_export.yml -o e:\dp1\dp1\test.mp4 --yes --max-workers 8
If I use the GUI app it seems to work fine. I'm going to test your changes using the GUI and get back to you.
Also I used "Installer for Windows.bat" and "Open Prompt.bat" to update and get command line access
Firstly I must say thank you for the continued assistance here :)
Using the GUI on a different video file, a 1080p YouTube MP4, I am getting the same result: the images in the DEPTH folder are legit PNGs but empty, while the RGB folder is spot on.
Can you think of anything I might have broken in my environment, as I used the MASTER.ZIP download from here on a new setup? Not sure what else I can do to help you.
Also I used "Installer for Windows.bat" and "Open Prompt.bat" to update and get command line access
On Windows, I have only checked the nunif-windows-package. https://github.com/nagadomi/nunif/blob/master/windows_package/docs/README.md I recommend using this.
If you don't want to use it, let me know your Python environment: Anaconda or official Python? Also try updating the pip packages.
python -m pip install --no-cache-dir --upgrade pip
python -m pip install --no-cache-dir --upgrade -r requirements-torch.txt
python -m pip install --no-cache-dir --upgrade -r requirements.txt
python -m pip install --no-cache-dir --upgrade -r requirements-gui.txt
OverflowError
This is an ffmpeg problem that occurs with floating-point FPS. What is the fps value in iw3_export.yml? Currently, only the special FPS values 29.97, 23.976, and 59.94 are supported.
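For reference, the issue is presumably that a float FPS has to be turned into an exact numerator/denominator pair, and a naive conversion can produce a numerator too large for a C long. A generic workaround looks roughly like this (just my sketch, not the actual iw3/ffmpeg handling):

from fractions import Fraction

# NTSC-style rates have exact rational forms that a naive float conversion misses
KNOWN_RATES = {29.97: (30000, 1001), 23.976: (24000, 1001), 59.94: (60000, 1001)}

def fps_to_rational(fps):
    key = round(fps, 3)
    if key in KNOWN_RATES:
        return KNOWN_RATES[key]
    # fall back to a small rational approximation that fits in a C long
    frac = Fraction(fps).limit_denominator(10000)
    return frac.numerator, frac.denominator

print(fps_to_rational(29.97))   # (30000, 1001)
print(fps_to_rational(30.0))    # (30, 1)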
Seems to be good now :) Will continue testing.
OverflowError: Python int too large to convert to C long is now fixed in https://github.com/nagadomi/nunif/discussions/105
If you find a problem, post a new issue.
I'd like to save the depth map images used to another folder on the first run, so if I do another run I can use the pre-generated maps instead of re-generating them for a 2nd/3rd run.
This makes it easier and faster to test the different options available.