This repository contains training code for the SIGGRAPH 2021 paper "Consistent Depth of Moving Objects in Video".
This is not an officially supported Google product.
We provide both conda and pip installations for dependencies.
conda create --name dynamic-video-depth --file ./dependencies/conda_packages.txt
pip install -r ./dependencies/requirements.txt
We provide two preprocessed video tracks from the DAVIS dataset. To download the pre-trained single-image depth prediction checkpoints, as well as the example data, run:
bash ./scripts/download_data_and_depth_ckpt.sh
This script will automatically download and unzip the checkpoints and data. To download manually, use this link.
To train using the example data, run:
bash ./experiments/davis/train_sequence.sh 0 --track_id dog
The first argument indicates the GPU id for training, and --track_id indicates the name of the track. ('dog' and 'train' are provided.)
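To train both provided tracks in turn, the command line above can be assembled programmatically. The snippet below is only a convenience sketch: `build_train_command` is a hypothetical helper that is not part of this repo, and it assumes the script takes exactly the two arguments described above.

```python
import shlex

# Path taken from the training command shown above.
TRAIN_SCRIPT = "./experiments/davis/train_sequence.sh"


def build_train_command(gpu_id, track_id):
    """Return the argv list for one training run.

    Hypothetical helper (not part of the repo): it only mirrors the
    documented call `bash <script> <gpu_id> --track_id <name>`.
    """
    return ["bash", TRAIN_SCRIPT, str(gpu_id), "--track_id", track_id]


if __name__ == "__main__":
    # Print the commands for the two provided tracks without running them.
    for track in ("dog", "train"):
        print(shlex.join(build_train_command(0, track)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the run; printing first makes it easy to check the command before committing GPU time.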
After training, the results should look like:
Video | Our Depth | Single Image Depth |
---|---|---|
To help with generating custom datasets for training, we provide examples of preparing the dataset from DAVIS and from two sequences from ShutterStock, which are showcased in our paper.
The general workflow for preprocessing the dataset is:
Calibrate the scale of camera translation, transform the camera matrices into camera-to-world convention, and save as individual files.
Calculate flow between pairs of frames, as well as occlusion estimates.
Pack flow and per-frame data into training batches.
More specifically, example code is provided in ./scripts/preprocess.
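As an illustration of the first workflow step, the sketch below converts a pose into the camera-to-world convention and rescales its translation. It assumes the input pose is given as a world-to-camera rotation and translation (as many structure-from-motion tools output); the function name and the `translation_scale` parameter are hypothetical, and the preprocessing scripts in this repo may organize this differently.

```python
import numpy as np


def to_camera_to_world(R_w2c, t_w2c, translation_scale=1.0):
    """Build a 4x4 camera-to-world matrix from a world-to-camera pose.

    Assumes x_cam = R_w2c @ x_world + t_w2c, so that
    x_world = R_w2c.T @ x_cam - R_w2c.T @ t_w2c.
    `translation_scale` (hypothetical parameter) rescales the camera position.
    """
    R_w2c = np.asarray(R_w2c, dtype=np.float64)
    t_w2c = np.asarray(t_w2c, dtype=np.float64).reshape(3)
    c2w = np.eye(4)
    c2w[:3, :3] = R_w2c.T                       # inverse of a rotation is its transpose
    c2w[:3, 3] = translation_scale * (-R_w2c.T @ t_w2c)  # scaled camera center
    return c2w
```

Each resulting matrix could then be written to its own file, for example with `np.save`, to match the "save as individual files" part of the step.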
We provide the triangulation results here and here. You can download them in a single script by running:
bash ./scripts/download_triangulation_files.sh
Download the DAVIS dataset here, and unzip it under ./datafiles.
Run python ./scripts/preprocess/davis/generate_frame_midas.py. This requires trimesh to be installed (pip install trimesh should do the trick). This script projects the triangulated 3D points to calibrate camera translation scales.
Run python ./scripts/preprocess/davis/generate_flows.py to generate optical flows between pairs of images. This stage requires RAFT, which is included as a submodule in this repo.
Run python ./scripts/preprocess/davis/generate_sequence_midas.py to pack camera calibrations and images into training batches.
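To give a rough idea of the scale calibration mentioned in the steps above, the sketch below projects triangulated points into one frame and compares their depths with a predicted depth map. This is not the code in generate_frame_midas.py: the function name, the median-ratio estimator, and the assumption that the single-image prediction is already expressed as depth (rather than disparity) are all illustrative choices.

```python
import numpy as np


def estimate_translation_scale(points_world, w2c, K, pred_depth):
    """Estimate a scale that maps triangulated depths onto predicted depths.

    points_world: (N, 3) triangulated 3D points.
    w2c:          (4, 4) world-to-camera matrix of the frame.
    K:            (3, 3) pinhole intrinsics.
    pred_depth:   (H, W) predicted depth map (assumed to be depth, not disparity).
    Returns the median ratio pred_depth / triangulated_depth.
    """
    pts = np.asarray(points_world, dtype=np.float64)
    cam = pts @ w2c[:3, :3].T + w2c[:3, 3]      # points in camera coordinates
    z = cam[:, 2]
    front = z > 1e-6                            # keep points in front of the camera
    cam, z = cam[front], z[front]

    uv = cam @ K.T                              # homogeneous pixel coordinates
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)

    h, w = pred_depth.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    if not np.any(inside):
        raise ValueError("no triangulated point projects into the image")

    ratios = pred_depth[v[inside], u[inside]] / z[inside]
    return float(np.median(ratios))             # median is robust to outliers
```

Multiplying the camera translations by the returned value would bring them to the scale of the predicted depth under these assumptions.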
Convert the videos to image frames, put them under ./datafiles/shutterstock/images, and rename them to match the file names in ./datafiles/shutterstock/triangulation. Note that not all frames are triangulated; the timestamps of valid frames are recorded in the triangulation file names.
Run python ./scripts/preprocess/shutterstock/generate_frame_midas.py to pack per-frame data.
Run python ./scripts/preprocess/shutterstock/generate_flows.py to generate optical flows between pairs of images.
Run python ./scripts/preprocess/shutterstock/generate_sequence_midas.py to pack flows and per-frame data into training batches.
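The flow step in both pipelines also needs occlusion estimates. One common way to obtain them is a forward-backward consistency check, sketched below. Whether generate_flows.py uses this exact check is an assumption; the function name, the nearest-neighbor lookup, and the 1-pixel threshold are illustrative.

```python
import numpy as np


def occlusion_mask(flow_fw, flow_bw, threshold=1.0):
    """Forward-backward consistency check.

    flow_fw: (H, W, 2) flow from frame A to frame B, as (dx, dy) in pixels.
    flow_bw: (H, W, 2) flow from frame B to frame A.
    Returns a boolean (H, W) mask that is True where a pixel of A is
    likely occluded in B (or leaves the image).
    """
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]

    # Where each pixel of A lands in B (nearest-neighbor lookup for simplicity).
    xi = np.round(xs + flow_fw[..., 0]).astype(int)
    yi = np.round(ys + flow_fw[..., 1]).astype(int)
    inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)

    # Backward flow sampled at the landing position (clipped to stay in bounds).
    bw_at_target = flow_bw[np.clip(yi, 0, h - 1), np.clip(xi, 0, w - 1)]

    # For a consistent match, forward and backward flow cancel out.
    err = np.linalg.norm(flow_fw + bw_at_target, axis=-1)
    return ~inside | (err > threshold)
```

Bilinear sampling of the backward flow and a motion-dependent threshold are common refinements when the flows are large or noisy.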
An example training script is located at ./experiments/shutterstock/train_sequence.sh.