Closed · wcapes closed this 6 months ago
I have received several requests for this. I am planning to support it.
However, the processing time for depth estimation is only a small part of the overall processing time. Also, the disk I/O and decoding needed to load PNG images may be much slower than decoding video, so I am not sure this approach will save processing time.
The main slowdown is the generation of the depth map, and if we can cache/save/store that, it can save HOURS when taking the source image and loading the depth map to create the final SBS image.
I do believe this will be a HUGE bonus to this awesome software.
Thank you for considering it
My recommended workflow for testing or previewing is Max FPS=0.25. Ideally, processing time is reduced to 1/120 (for 30fps video). I think it is a much better approach than this request. A disadvantage is that seek does not work in some video players.
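To put numbers on that 1/120 figure (back-of-the-envelope arithmetic only, not iw3 code):

source_fps = 30.0          # original video frame rate
max_fps = 0.25             # preview setting
duration = 2 * 60 * 60     # e.g. a 2-hour video, in seconds
full_frames = source_fps * duration     # 216000 frames at the full rate
preview_frames = max_fps * duration     # 1800 frames actually processed
print(preview_frames / full_frames)     # 0.00833... = 1/120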
Would you consider it then for the CLI tool? I had a look at the code, and adding this support to the feature where it reads the images from a folder (--input) could maybe be the starting point?
Then, when looping through the source images, it could load from a MAPS folder if it exists and then do the SBS image. Guys could then test this out, and potentially, if you wanted to do a movie or an episode, you could extract the frames, try different settings to get the best result, and then just rebuild the video file (this would be manual by the user using FFmpeg).
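Roughly what I'm thinking, just as a sketch (the MAPS folder name and the estimate_depth callable are made up here, not iw3's actual API):

import os
from PIL import Image

def load_or_create_depth(frame_path, maps_dir, estimate_depth):
    # Reuse a cached depth map if one exists for this frame,
    # otherwise run the (slow) model and cache the result.
    os.makedirs(maps_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(frame_path))[0] + ".png"
    depth_path = os.path.join(maps_dir, name)
    if os.path.exists(depth_path):
        return Image.open(depth_path)                   # 2nd/3rd run: skip depth estimation
    depth = estimate_depth(Image.open(frame_path))      # placeholder for the model call
    depth.save(depth_path)                              # cache for the next run
    return depth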
Just thinking of ways to make it quicker for users.
If this works, we could then investigate approaches for video. I would be happy to test.
For now, I am planning to develop two CLI commands: export depth, and generate video from depth, frames, and audio.
Also, here is a benchmark. I think depth estimation is much faster than you think; it is faster than saving and loading an image.
import os
import torch
import PIL.Image
import torchvision.transforms.functional as TF
import time


def bench_any():
    # Depth-Anything (Any_B) inference benchmark
    model = torch.hub.load("nagadomi/Depth-Anything_iw3", "DepthAnything",
                           encoder="vitb",  # Any_B
                           trust_repo=True).cuda()
    RES = 392  # default resolution in iw3, 518 is also supported
    BATCH = 4
    N = 25
    x = torch.rand((BATCH, 3, RES, RES)).cuda() - 0.5
    with torch.inference_mode():
        start = time.perf_counter()
        for i in range(N):
            with torch.autocast(device_type="cuda"):
                depth = model(x)
        torch.cuda.synchronize()
        print(f"depth inference time per image: {round((time.perf_counter() - start) / (N * BATCH), 4)}s")


def bench_png():
    # PNG save/load benchmark for frame-sized images
    OUTPUT_DIR = "png_bench"
    FRAME_RES = (1920, 1080)  # consider frame images
    img = torch.rand((3, *FRAME_RES))
    img = TF.to_pil_image(img)
    os.makedirs(OUTPUT_DIR, exist_ok=True)
    start = time.perf_counter()
    N = 100
    for i in range(N):
        file_path = os.path.join(OUTPUT_DIR, f"{i}.png")
        img.save(file_path)
        PIL.Image.open(file_path).load()
    print(f"PNG encode/save/load/decode time per image: {round((time.perf_counter() - start) / N, 4)}s")


if __name__ == '__main__':
    bench_any()
    bench_png()
Result (Linux, RTX 3070 Ti):
depth inference time per image: 0.0148s
PNG encode/save/load/decode time per image: 0.2091s

Result with FRAME_RES=(392, 392):
depth inference time per image: 0.0148s
PNG encode/save/load/decode time per image: 0.0154s
Will check this out tomorrow (night here) thanks for the share :)
I chose to simply add an --export option to the existing iw3.cli. It can also be used in the GUI. The input for video generation is a YAML config file. It will be generated when exporting, and it can also be created manually. The config file can define the folder path for frame images, the folder path for depth images, the audio file path, the FPS, etc.
Sounds like a plan, can't wait to check it out :)
PS: Wouldn't nag for features if we didn't think this is a GREAT piece of software :)
I added --export and --export-disparity options.
--export and --export-disparity are similar, but the difference is that --export-disparity applies --edge-dilation and --mapper (--foreground-scale) to the depth images and skips them when generating the video.
--export is for experimenting with various video generation settings (this request), and --export-disparity is for editing the depth images with an external program.
CLI example for exporting:
python -m iw3.cli -i ./tmp/test_videos/filename.mp4 -o ./tmp/export --yes --depth-model ZoeD_Any_N --export --max-workers 8
python -m iw3.cli -i ./tmp/test_videos/images -o ./tmp/export/images --yes --depth-model ZoeD_Any_N --export --max-workers 8
In the GUI, select Export or Export disparity for Stereo Format.
The --max-workers (Worker Threads) option has a big impact on export performance.
CLI example for generating video/images from exported data:
python -m iw3.cli -i ./tmp/export/filename/iw3_export.yml -o ./tmp/generated_video/ --yes
python -m iw3.cli -i ./tmp/export/images/iw3_export.yml -o ./tmp/generated_images/ --yes
Simply specify the yml file as input. When exporting with the --export option, the settings specified from the CLI/GUI are used, except for the video FPS.
Output directory structure:
tmp/export
└── filename
├── rgb
├── depth
├── audio.m4a
└── iw3_export.yml
rgb: frame images (8-bit RGB PNG)
depth: depth images (16-bit grayscale PNG)
audio.m4a: audio file, if one exists
iw3_export.yml: config file for generating the video
iw3_export.yml is in YAML format. It can be edited with any text editor.
type: video
basename: filename
fps: 30.0
rgb_dir: rgb
depth_dir: depth
audio_file: audio.m4a
mapper: none
skip_mapper: false
skip_edge_dilation: false
updated_at: '2024-03-29T21:02:32.996779'
user_data:
  export_options:
    depth_model: Any_B
    export_disparity: false
    mapper: none
    edge_dilation: 2
    max_fps: 30
    ema_normalize: false
type: "video" or "images". If "video", SBS video will be output. if "images", each file will be output as SBS image.
basename: output filename for video
fps: Video FPS.
rgb_dir, depth_dir: frame and depth images directory path. It can be given as absolute path or relative path from the yaml file.
audio_file: audio file path for video. If not specified or the file does not exist, it will be ignored.
mapper: mapper function used to convert from depth to disparity.
skip_mapper: If true, the mapper function will not be applied and the depth image will be used as disparity.
skip_edge_dilation: If true, --edge-dilation (Ege Fix) will not be applied.
updated_at: datetime in local time zone. Not used.
user_data: Not used. Currently, the options used for export are stored.
rgb and depth files will be used as sequential frames, in ascending order of file name, when type is "video". Edit: when type is "images" the rgb files will refer to the original images and no copy will be output. See https://github.com/nagadomi/nunif/discussions/173#discussioncomment-10036914
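As a small illustration of that ordering, something like this would pair the frames (a rough sketch using PyYAML, not iw3's actual code; the config path is just an example):

import os
import yaml

config_path = "tmp/export/filename/iw3_export.yml"
with open(config_path, encoding="utf-8") as f:
    conf = yaml.safe_load(f)

base_dir = os.path.dirname(config_path)
rgb_dir = os.path.join(base_dir, conf["rgb_dir"])      # relative paths resolve from the yml
depth_dir = os.path.join(base_dir, conf["depth_dir"])

# frames are consumed as pairs, in ascending file-name order
rgb_files = sorted(os.listdir(rgb_dir))
depth_files = sorted(os.listdir(depth_dir))
for rgb_name, depth_name in zip(rgb_files, depth_files):
    print(rgb_name, depth_name)   # each pair becomes one SBS frame at conf["fps"]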
This may be a confusing point. The output depthmap is in a different format/scale depending on the depth estimation model. Stereo generation requires disparity, so if the model output is not disparity, it has to be converted to disparity. iw3 performs this conversion with a mapper function (the --mapper option).
--mapper options used for each depth model / --foreground-scale:

| Model | --foreground-scale | --mapper |
|---|---|---|
| ZoeDepth | 0 | div_6 |
| ZoeDepth | 1 | div_4 |
| ZoeDepth | 2 | div_2 |
| ZoeDepth | 3 | div_1 |
| DepthAnything | 0 | none |
| DepthAnything | 1 | mul_1 |
| DepthAnything | 2 | mul_2 |
| DepthAnything | 3 | mul_3 |
If you use a program other than iw3 to output depthmaps, there may not be a proper conversion function. In that case, convert the images in the depth directory to disparity yourself and specify "none" as the mapper.
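As an example, such an external conversion could look roughly like this (my own sketch, not part of iw3; it assumes larger pixel values mean farther depth and uses disparity = 1/depth, so adapt the formula to whatever convention your depth program uses, then set mapper: none in iw3_export.yml):

import os
import numpy as np
from PIL import Image

depth_dir = "tmp/export/filename/depth"   # exported 16-bit grayscale depth images
for name in sorted(os.listdir(depth_dir)):
    path = os.path.join(depth_dir, name)
    depth = np.asarray(Image.open(path), dtype=np.float64)
    disparity = 1.0 / (depth + 1.0)        # closer pixels -> larger disparity; +1 avoids /0
    # rescale to the full 16-bit range and overwrite the file in place
    disparity = (disparity - disparity.min()) / (disparity.max() - disparity.min())
    # Pillow writes a uint16 array back out as a 16-bit grayscale PNG
    Image.fromarray((disparity * 65535).astype(np.uint16)).save(path)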
EDIT: See https://github.com/nagadomi/nunif/blob/dev/iw3/docs/colorspace.md for Colorspace
I haven't tested this much, so if something strange happens, it's most likely a bug.
This is amazing, thank you - will start testing and playing this weekend
So far:
audio.m4a extracts fine and contains the audio track.
All the images in the RGB folder are fine and are the video frames, as usual, in PNG format.
Export started on frame 2, not 1.
I do see the depth images (in the Depth folder) are all 782-byte files with a PNG tag, and when viewed they are all black images.
Not sure if this is as intended.
*** Will update as the test process runs :)
Related: https://github.com/nagadomi/nunif/discussions/87
Export started on frame 2, not 1
The number in the filename is frame PTS. No problem if filenames are in frame order.
The other problems do not happen in my env. Is the normal Full SBS video generation feature working? I am guessing that the cause is either the video you are using for testing or your Python venv.
Running part 2 after the first part. Command line used: python -m iw3.cli -i e:\dp1\dp1\iw3_export.yml -o e:\dp1\dp1\test.mp4 --yes --max-workers 8
If I use the GUI app it seems to work fine. I'm going to test your changes using the GUI and get back to you.
Also I used "Installer for Windows.bat" and "Open Prompt.bat" to update and get command line access
Firstly I must say thank you for the continued assistance here :)
Using the GUI on a different video file, a 1080p YouTube MP4, I am getting the same result: the images in the DEPTH folder are legit PNGs but empty, while the RGB folder is spot on.
Can you think of anything I might have broken in my environment, as I used the MASTER.ZIP download from here on a new setup? Not sure what else I can do to help you.
Also I used "Installer for Windows.bat" and "Open Prompt.bat" to update and get command line access
On Windows, I have only checked the nunif-windows-package. https://github.com/nagadomi/nunif/blob/master/windows_package/docs/README.md I recommend using this.
If you don't want to use it, let me know your Python environment: Anaconda or official Python? Also try updating the pip packages.
python -m pip install --no-cache-dir --upgrade pip
python -m pip install --no-cache-dir --upgrade -r requirements-torch.txt
python -m pip install --no-cache-dir --upgrade -r requirements.txt
python -m pip install --no-cache-dir --upgrade -r requirements-gui.txt
OverflowError
This is an ffmpeg problem that occurs with floating-point FPS. What is the fps value in iw3_export.yml? Currently, only the special FPS values 29.97, 23.976, and 59.94 are supported.
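For reference, the issue is presumably that a float FPS has to be turned into an exact numerator/denominator pair, and a naive conversion can produce a numerator too large for a C long. A generic workaround looks roughly like this (just my sketch, not the actual iw3/ffmpeg handling):

from fractions import Fraction

# NTSC-style rates have exact rational forms that a naive float conversion misses
KNOWN_RATES = {29.97: (30000, 1001), 23.976: (24000, 1001), 59.94: (60000, 1001)}

def fps_to_rational(fps):
    key = round(fps, 3)
    if key in KNOWN_RATES:
        return KNOWN_RATES[key]
    # fall back to a small rational approximation that fits in a C long
    frac = Fraction(fps).limit_denominator(10000)
    return frac.numerator, frac.denominator

print(fps_to_rational(29.97))   # (30000, 1001)
print(fps_to_rational(30.0))    # (30, 1)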
Seems to be good now :) Will continue testing.
OverflowError: Python int too large to convert to C long is now fixed in https://github.com/nagadomi/nunif/discussions/105
If you find a problem, post a new issue.
I'd like to save the depth map images used to another folder on the first run, so if I do another run I can use the pre-generated maps instead of re-generating them for a 2nd/3rd run.
This makes it easier and faster to test the different options available.