
Nerfies: Deformable Neural Radiance Fields

This is the code for Nerfies: Deformable Neural Radiance Fields. For more details, see the project page at https://nerfies.github.io.

This codebase is implemented using JAX, building on JaxNeRF.

This repository has been updated to reflect the version used for our ICCV 2021 submission.

Demo

We provide easy-to-use demos using Google Colab!

These Colabs will allow you to train a basic version of our method using Cloud TPUs (or GPUs) on Google Colab.

Note that, due to the limited compute resources available, these demos do not train the fully featured models. If you would like to train a fully featured Nerfie, please refer to the instructions below on how to train on your own machine.

Description                              Link
Process a video into a Nerfie dataset    Open In Colab
Train a Nerfie                           Open In Colab
Render a Nerfie video                    Open In Colab

Setup

The code can be run under any environment with Python 3.8 and above. (It may run with lower versions, but we have not tested it.)

We recommend using Miniconda and setting up an environment:

conda create --name nerfies python=3.8
conda activate nerfies

Next, install the required packages:

pip install -r requirements.txt

Install the appropriate JAX distribution for your environment by following the official JAX installation instructions. For example:

# For CUDA version 11.1
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
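
To verify the installation, you can check that JAX sees your accelerator from a Python shell (a quick sanity check, not part of the official instructions):

import jax

# Should report 'gpu' (or 'tpu') and list the corresponding devices rather
# than falling back to the CPU backend.
print(jax.default_backend())
print(jax.devices())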

Training

After preparing a dataset, you can train a Nerfie by running:

export DATASET_PATH=/path/to/dataset
export EXPERIMENT_PATH=/path/to/save/experiment/to
python train.py \
    --data_dir $DATASET_PATH \
    --base_folder $EXPERIMENT_PATH \
    --gin_configs configs/test_vrig.gin

To plot telemetry to TensorBoard and render checkpoints on the fly, also launch an evaluation job by running:

python eval.py \
    --data_dir $DATASET_PATH \
    --base_folder $EXPERIMENT_PATH \
    --gin_configs configs/test_vrig.gin

The two jobs should use mutually exclusive sets of GPUs. This division allows the training job to run without having to stop for evaluation.
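
For example, you can control which GPUs each job sees with the CUDA_VISIBLE_DEVICES environment variable, either set in the shell when launching each job or, as sketched below in Python, before JAX initializes (the specific device split is an assumption about your setup):

import os

# Hypothetical split: reserve GPU 0 for the training job and give this
# process (e.g., the evaluation job) only GPU 1. Must run before importing JAX.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import jax
print(jax.devices())  # should now list only a single GPU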

Configuration

Datasets

A dataset is a directory with the following structure:

dataset
    ├── camera
    │   └── ${item_id}.json
    ├── camera-paths
    ├── rgb
    │   ├── ${scale}x
    │   │   └── ${item_id}.png
    ├── metadata.json
    ├── points.npy
    ├── dataset.json
    └── scene.json

At a high level, a dataset is simply a collection of images (e.g., the frames of a video) along with the camera parameters for each image.

We have a unique identifier for each image which we call an item_id, and this is used to match each camera to its image. An item_id can be any string, but typically it is an alphanumeric string such as 000054.

camera

{
  // A 3x3 world-to-camera rotation matrix representing the camera orientation.
  "orientation": [
    [0.9839, -0.0968, 0.1499],
    [-0.0350, -0.9284, -0.3699],
    [0.1749, 0.358, -0.9168]
  ],
  // The 3D position of the camera in world-space.
  "position": [-0.3236, -3.26428, 5.4160],
  // The focal length of the camera.
  "focal_length": 2691,
  // The principal point [u_0, v_0] of the camera.
  "principal_point": [1220, 1652],
  // The skew of the camera.
  "skew": 0.0,
  // The aspect ratio for the camera pixels.
  "pixel_aspect_ratio": 1.0,
  // Parameters for the radial distortion of the camera.
  "radial_distortion": [0.1004, -0.2090, 0.0],
  // Parameters for the tangential distortion of the camera.
  "tangential": [0.001109, -2.5733e-05],
  // The image width and height in pixels.
  "image_size": [2448, 3264]
}
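
To make these conventions concrete, here is a minimal sketch (not the repository's camera class) of projecting a world-space point with these parameters, treating orientation as a world-to-camera rotation and ignoring skew and lens distortion:

import json

import numpy as np

def project(camera_path, point_world):
    # Load the per-item camera JSON described above.
    with open(camera_path) as f:
        cam = json.load(f)
    rotation = np.asarray(cam['orientation'])   # world-to-camera rotation
    position = np.asarray(cam['position'])      # camera center in world space
    focal_x = cam['focal_length']
    focal_y = cam['focal_length'] * cam['pixel_aspect_ratio']
    u0, v0 = cam['principal_point']

    # Transform into the camera frame, then apply the pinhole projection.
    p_cam = rotation @ (np.asarray(point_world) - position)
    u = focal_x * p_cam[0] / p_cam[2] + u0
    v = focal_y * p_cam[1] / p_cam[2] + v0
    return np.array([u, v])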

camera-paths

rgb

metadata.json

{
    "${item_id}": {
        // The embedding ID used to fetch the deformation latent code
        // passed to the deformation field.
        "warp_id": 0,
        // The embedding ID used to fetch the appearance latent code
        // which is passed to the second branch of the template NeRF.
        "appearance_id": 0,
        // For validation rig datasets, we use the camera ID instead
        // of the appearance ID. For example, this would be '0' for the
        // left camera and '1' for the right camera. This can also
        // be used for multi-view setups.
        "camera_id": 0
    },
    ...
}
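
For example, for a single-camera video capture you might generate this file with one warp and appearance code per frame (a sketch; the item IDs and layout here are hypothetical):

import json

# Hypothetical item IDs for a 100-frame, single-camera capture.
item_ids = [f'{i:06d}' for i in range(100)]

metadata = {
    item_id: {
        'warp_id': i,        # one deformation latent code per frame
        'appearance_id': i,  # one appearance latent code per frame
        'camera_id': 0,      # single camera, so always 0
    }
    for i, item_id in enumerate(item_ids)
}

with open('dataset/metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)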

scene.json

{
  // The scale factor we will apply to the pointcloud and cameras. This is
  // important since it controls what scale is used when computing the positional
  // encoding.
  "scale": 0.0387243672920458,
  // Defines the origin of the scene. The scene will be translated such that
  // this point becomes the origin. Defined in unscaled coordinates.
  "center": [
    1.1770838526103944e-08,
    -2.58235339289195,
    -1.29117656263135
  ],
  // The distance of the near plane from the camera center in scaled coordinates.
  "near": 0.02057418950149491,
  // The distance of the far plane from the camera center in scaled coordinates.
  "far": 0.8261601717667288
}
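
Reading the comments above, points and cameras are normalized by first subtracting center and then multiplying by scale; a small sketch of that convention (our reading of the comments, not code from the repository):

import json

import numpy as np

with open('dataset/scene.json') as f:
    scene = json.load(f)

# Assumed to be an (N, 3) array in the same unscaled coordinates as `center`.
points = np.load('dataset/points.npy')

# Translate so that `center` becomes the origin, then apply the scale factor.
points_scaled = (points - np.asarray(scene['center'])) * scene['scale']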

dataset.json

{
  // The total number of images in the dataset.
  "count": 114,
  // The total number of training images (exemplars) in the dataset.
  "num_exemplars": 57,
  // A list containing all item IDs in the dataset.
  "ids": [...],
  // A list containing all training item IDs in the dataset.
  "train_ids": [...],
  // A list containing all validation item IDs in the dataset.
  // This should be mutually exclusive with `train_ids`.
  "val_ids": [...],
}
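
This file could be generated along the following lines for, say, an alternating train/validation split (a sketch with hypothetical item IDs; the right split depends on your capture):

import json

# Hypothetical item IDs; in practice these come from your processed capture.
item_ids = [f'{i:06d}' for i in range(114)]
train_ids = item_ids[0::2]  # e.g. every other frame for training
val_ids = item_ids[1::2]    # the remaining frames for validation

dataset = {
    'count': len(item_ids),
    'num_exemplars': len(train_ids),
    'ids': item_ids,
    'train_ids': train_ids,
    'val_ids': val_ids,
}

with open('dataset/dataset.json', 'w') as f:
    json.dump(dataset, f, indent=2)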

points.npy
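
The point cloud can be inspected with NumPy, and the rest of the layout can be sanity-checked at the same time (a sketch based on the structure above; the (N, 3) shape and the 1x scale directory are assumptions):

import json
from pathlib import Path

import numpy as np

root = Path('dataset')

points = np.load(root / 'points.npy')
print('point cloud shape:', points.shape)  # expected to be (N, 3)

with open(root / 'dataset.json') as f:
    dataset = json.load(f)

for item_id in dataset['ids']:
    # Every item should have a camera and at least one RGB image.
    assert (root / 'camera' / f'{item_id}.json').exists(), item_id
    assert (root / 'rgb' / '1x' / f'{item_id}.png').exists(), item_id

print(f"checked {dataset['count']} items ({dataset['num_exemplars']} exemplars)")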

Citing

If you find our work useful, please consider citing:

@article{park2021nerfies,
  author    = {Park, Keunhong 
               and Sinha, Utkarsh 
               and Barron, Jonathan T. 
               and Bouaziz, Sofien 
               and Goldman, Dan B 
               and Seitz, Steven M. 
               and Martin-Brualla, Ricardo},
  title     = {Nerfies: Deformable Neural Radiance Fields},
  journal   = {ICCV},
  year      = {2021},
}