This is not an officially supported Google product.

DynIBaR: Neural Dynamic Image-Based Rendering

Project Page

Implementation for CVPR 2023 paper (best paper honorable mention)

DynIBaR: Neural Dynamic Image-Based Rendering, CVPR 2023

Zhengqi Li¹, Qianqian Wang^1,2, Forrester Cole¹, Richard Tucker¹, Noah Snavely¹

¹Google Research, ²Cornell Tech, Cornell University \

Instructions for installing dependencies

Python Environment

The following codebase was successfully run with Python 3.8 and CUDA 11.3. We suggest installing the library in a virtual environment such as Anaconda.

To install required libraries, run: \ conda env create -f enviornment_dynibar.yml

To install softmax splatting for preprocessing, clone and install the library from here.

To measure LPIPS, copy "models" folder from NSFF, and put it in the code root directory.

Evaluation on Nvidia Dynamic scene dataset.

Downloading data and pretrained checkpoint

We include pretrained checkpoints that can be accessed by running:

wget https://storage.googleapis.com/gresearch/dynibar/nvidia_checkpoints.zip
unzip nvidia_checkpoints.zip

put the unzipped "checkpoints" folder in the code root directory.

Each scene in the Nvidia dataset can be accessed here

The input data directory should similar to the following format: xxx/nvidia_long_release/Balloon1

Run the following command for each scene to obtain reported quantitative results:

  # Usage: In txt file, You need to change "rootdir" to your code root directory,
  # and "folder_path" to input data directory, and make sure "coarse_dir" points to
  # "checkpoints" folder you unzip.
  python eval_nvidia.py --config configs_nvidia/eval_balloon1_long.txt

Note: It will take ~8 hours to evaluate each scene with 4x Nvidia A100 GPUs.

Training/rendering on monocular videos.

Required inputs and corresponding folders or files:

We provide a template input data for the NSFF example video, which can be downloaded here

The input data directory should be in the following format: xxx/release/kid-running/dense/***

For your own video, you need to include the following folders to run training.

disp: disparity maps from dynamic-cvd. Note that you need to run test.py to save the disparity and camera parameters to the disk.
images_wxh: resized images at resolution w x h.

poses_bounds_cvd.npy: camera parameters of input video in LLFF format.

You can generate the above three items with the following script:

  # Usage: data_dir is input video directory path,
  # cvd_dir is saved depth directory resulting from running
  # "test.py" at https://github.com/google/dynamic-video-depth
  python save_monocular_cameras.py \
  --data_dir xxx/release/kid-running \
  --cvd_dir xxx/kid-running_scene_flow_motion_field_epoch_20/epoch0020_test

source_virtual_views_wxh: virtual source views used to improve training stability and rendering quality (used in monocular video only). Running the following script to obtain them:

# Usage: data_dir is input video directory path,
# cvd_dir is saved depth direcotry resulting from running
# "test.py" at https://github.com/google/dynamic-video-depth
python render_source_vv.py \
 --data_dir xxx/release/kid-running \
 --cvd_dir xxx/kid-running_scene_flow_motion_field_epoch_20/epoch0020_test

flow_i1, flow_i2, flow_i3: estimated optical flows within temporal window of length 3. You can follow prior NSFF script to run optical flows between the frame i and its nearby frames i+1, i+2, i+3, and save them in folders "flow_i1", "flow_i2", "flow_i3" respectively. For example, 00000_fwd.npz in folder "flow_i1" stores forward flow and valid mask from frame 0 to frame 1, and 00000_bwd.npz stores backward flow and valid mask from frame 1 to frame 0.
static_masks, dynamic_masks: motion masks indicating which region is stationary or moving. You can perform morphological dilation and erosion operations respectively to ensure static_masks sufficeintly cover the regions of moving objects, and the regions from dynamic_masks are within the true regions of moving objects. (Note: due to dependency reason, we don't release code to generate the masks. Instead you could use script from NSFF to generate coarse masks for your usage)

To train the model:

  # Usage: config is config txt file for training video
  # make sure "rootdir" is your code root directory,
  # "folder_path" is your input data directory path,
  # "train_scenes" is your folder name.
  # For example, if data is in xxx/release/kid-running/dense/, then "train_scenes" is 
  # "xxx/release/", "train_scenes" is "kid-running"
  python train.py \
  --config configs/train_kid-running.txt

Hyperparameters in config txt file you might need to know for training a good model on in-the-wild videos

rootdir: code root directory, should be in format: YOUR_PATH/dynibar
folder_path: data root directory,
N_rand: number of random samples at each iterations. Try to set it as large as possible, typically > 3000 gives good results
init_decay_epoch: number of epochs to linaerly decay the data-driven depth and optical flow losses. Modify this such that num_video_frames * init_decay_epoch = 30~40K
max_range, num_source_views: max_range indicates maximum search frame ranges to select source views for static model. num_source_views*2 is number of source views used for static model.

The tensorboard includes rendering visualization as shown below.

To render the model:

  # Usage: config is config txt file for training video,
  # please make sure expname in txt is the saved folder name in 'out' directory
  python render_monocular_bt.py \
  --config configs/test_kid-running.txt

Contact

For any questions related to our paper and implementation, please send email to zhengqili@google.com.

Citation

@InProceedings{Li_2023_CVPR,
    author    = {Li, Zhengqi and Wang, Qianqian and Cole, Forrester and Tucker, Richard and Snavely, Noah},
    title     = {DynIBaR: Neural Dynamic Image-Based Rendering},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {4273-4284}
}

google / dynibar

readme