
DINER: Depth-aware Image-based NEural Radiance fields
Official PyTorch implementation of the CVPR 2023 paper (Project Page, Video)

Malte Prinzler, Otmar Hilliges, Justus Thies



Teaser image



Abstract:
We present Depth-aware Image-based NEural Radiance fields (DINER). Given a sparse set of RGB input views, we predict depth and feature maps to guide the reconstruction of a volumetric scene representation that allows us to render 3D objects under novel views. Specifically, we propose novel techniques to incorporate depth information into feature fusion and efficient scene sampling. In comparison to the previous state of the art, DINER achieves higher synthesis quality and can process input views with greater disparity. This allows us to capture scenes more completely without changing capturing hardware requirements and ultimately enables larger viewpoint changes during novel view synthesis. We evaluate our method by synthesizing novel views, both for human heads and for general objects, and observe significantly improved qualitative results and increased perceptual metrics compared to the previous state of the art.



Download Code & Install Python Environment

Download the code via

git clone https://github.com/malteprinzler/diner.git
cd diner

DINER was developed and tested with Python 3.8, PyTorch 1.11.0, and CUDA 11.3. We recommend setting up a virtual Python environment by running the following commands:

python3.8 -m venv venv
source venv/bin/activate
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
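
To confirm that the pinned PyTorch build and the CUDA runtime are picked up correctly, a quick check along the following lines can be run inside the activated environment (a minimal sketch using only standard PyTorch / torchvision calls):

# Quick sanity check for the environment (run inside the activated venv).
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 1.11.0+cu113
print("torchvision:", torchvision.__version__)  # expected: 0.12.0+cu113
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))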

Downloadable Assets

This repository is accompanied by pretrained model weights and dataset split configurations. Please download the zipped files from here and extract them into the project root. The final directory tree should look like:

diner (repository root)
|- assets
|   |- ckpts
|   |- data_splits
| ...
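
To double-check the extraction, a short sketch like the following (directory names taken from the tree above) can be run from the repository root:

# Verify that the downloaded assets were extracted into the repository root.
from pathlib import Path

for sub in ("assets/ckpts", "assets/data_splits"):
    print(sub, "ok" if Path(sub).is_dir() else "MISSING")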

Quickstart: Evaluate Pretrained DINER on DTU

1) Dataset Download

2) Evaluate Pretrained DINER on DTU
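
Assuming the pretrained weights in assets/ckpts follow standard PyTorch checkpoint conventions, they can be inspected before running the evaluation. The file name below is a placeholder; use the checkpoint shipped with the downloadable assets:

# Inspect a pretrained checkpoint; the file name is a placeholder.
import torch

ckpt = torch.load("assets/ckpts/<checkpoint_name>.ckpt", map_location="cpu")
# Lightning-style checkpoints keep the weights under "state_dict";
# plain checkpoints may already be the state dict itself.
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state_dict), "parameter tensors")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))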


Evaluate DINER on Facescape

1) Download & Preprocess the Facescape Dataset

2) Writing TransMVSNet Depth Predictions to Facescape

3) Evaluate Pretrained DINER on Facescape


Evaluate DINER (trained on Facescape) on MultiFace

1) Download & Preprocess the MultiFace Dataset

2) Writing TransMVSNet Depth Predictions to MultiFace

3) Evaluate Pretrained DINER (trained on Facescape) on MultiFace


Train TransMVSNet (Depth Estimator) from scratch

To train the depth estimator from scratch, run

deps/TransMVSNet/scripts/train_TransMVSNet_dtu.sh  # Training on DTU

or

deps/TransMVSNet/scripts/train_TransMVSNet_facescape.sh  # Training on Facescape

To change the training settings, please adjust the respective *.sh files. Note that the authors of TransMVSNet recommend training with 8 GPUs.


Train DINER from scratch

To train DINER from scratch, run

python python_scripts/train.py configs/train_dtu.yaml  # Training on DTU

or

python python_scripts/train.py configs/train_facescape.yaml  # Training on Facescape

Note that we use one NVIDIA A100-SXM4-80GB GPU for training.
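
If training has to fit on a smaller GPU, the YAML config can be copied and adapted before launching. The option names below are placeholders, not the actual keys in configs/train_dtu.yaml:

# Adapt a training config before launching; the keys below are placeholders,
# check configs/train_dtu.yaml for the real option names.
import yaml

with open("configs/train_dtu.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["<batch_size_key>"] = 1    # placeholder: reduce the batch size
cfg["<ray_batch_key>"] = 2048  # placeholder: fewer rays per iteration

with open("configs/train_dtu_small.yaml", "w") as f:
    yaml.safe_dump(cfg, f)

# Then launch as usual:
#   python python_scripts/train.py configs/train_dtu_small.yaml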


License

The code is available for non-commercial scientific research purposes under the CC BY-NC 3.0 license.

Citation

If you find our work useful, please include the following citation:

@inproceedings{prinzler2023diner,
  title={DINER: (D)epth-aware (I)mage-based (Ne)ural (R)adiance Fields},
  author={Prinzler, Malte and Hilliges, Otmar and Thies, Justus},    
  booktitle = {Computer Vision and Pattern Recognition (CVPR)},
  year = {2023}
}

Parts of our code are heavily inspired by https://github.com/sxyu/pixel-nerf and https://github.com/megvii-research/TransMVSNet, so please consider citing their work as well.

Acknowledgements

Malte Prinzler was supported by the Max Planck ETH Center for Learning Systems (CLS) during this project.