Malte Prinzler, Otmar Hilliges, Justus Thies
Abstract:
We present Depth-aware Image-based NEural Radiance fields (DINER). Given a sparse set of RGB input views, we predict depth and feature maps to guide the reconstruction of a volumetric scene representation that allows us to render 3D objects under novel views. Specifically, we propose novel techniques to incorporate depth information into feature fusion and efficient scene sampling. In comparison to the previous state of the art, DINER achieves higher synthesis quality and can process
input views with greater disparity. This allows us to capture scenes more completely without changing capturing hardware requirements and ultimately enables larger viewpoint changes during novel view synthesis. We evaluate our method by synthesizing novel views, both for human heads and for general objects, and observe significantly improved qualitative results and perceptual metrics compared to the previous state of the art.
Download the code via

```bash
git clone https://github.com/malteprinzler/diner.git
cd diner
```
DINER was developed and tested with Python 3.8, PyTorch 1.11.0, and CUDA 11.3. We recommend setting up a virtual Python environment and installing the dependencies by running the following commands:
```bash
python3.8 -m venv venv
source venv/bin/activate
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
```
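Optionally, you can verify that the environment was set up correctly and that PyTorch sees a GPU:

```bash
# Prints the PyTorch version, its CUDA build, and whether a GPU is visible.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```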
This repository is accompanied by pretrained model weights and dataset split configurations. Please download the zipped files from here and extract them into the project root. The final directory tree should look like:
```
diner (repository root)
|- assets
| |- ckpts
| |- data_splits
| ...
```
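As a quick check that the extraction worked, the checkpoint files referenced by the commands below should now resolve:

```bash
# All three paths are used later in this README; ls should list them without errors.
ls assets/ckpts/dtu/DINER.ckpt assets/ckpts/dtu/TransMVSNet.ckpt assets/ckpts/facescape/DINER.ckpt
```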
Download the DTU dataset and extract it into data/DTU. Use the files from Depths_raw.zip to overwrite the files obtained from dtu_training.rar. The folder structure after unzipping should look like the following:
```
data/DTU (dataset root)
|- Cameras
| |- train
| | |- 00000000_cam.txt
| | |- 00000001_cam.txt
| | |- ...
| |
| |- 00000000_cam.txt
| |- 00000001_cam.txt
| |- ...
|
|- Depths
| |- scan1
| | |- depth_map_0000.pfm
| | |- depth_map_0001.pfm
| | |- ...
| | |- depth_visual_0000.png
| | |- depth_visual_0001.png
| | |- ...
| |
| |- scan1_train
| | |- depth_map_0000.pfm
| | |- depth_map_0001.pfm
| | |- ...
| | |- depth_visual_0000.png
| | |- depth_visual_0001.png
| | |- ...
| |
| |- scan2
| |- scan2_train
| |- ...
|
|- Depths_preprocessed
| |- scan1_train
| | |- depth_map_0000.pfm
| | |- depth_map_0001.pfm
| | |- ...
| | |- depth_visual_0000.png
| | |- depth_visual_0001.png
| |- scan2_train
| |- ...
|
|- Rectified
| |- scan1_train
| | |- rect_001_0_r5000.png
| | |- rect_001_1_r5000.png
| | |- rect_001_2_r5000.png
| | |- ...
| |- scan2_train
| |- ...
```
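As a quick sanity check of the layout, the following paths from the tree above should exist (a minimal spot-check, not exhaustive):

```bash
ls data/DTU/Cameras/train/00000000_cam.txt
ls data/DTU/Depths/scan1/depth_map_0000.pfm
ls data/DTU/Rectified/scan1_train/rect_001_0_r5000.png
```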
To generate the TransMVSNet depth predictions, run

```bash
bash deps/TransMVSNet/scripts/write_to_dtu.sh
```

This will write the depth predictions to files like data/DTU/Depths/scan71/depth_map_0000_TransMVSNet(_conf/_vis).png. To change the configuration, adapt deps/TransMVSNet/scripts/write_to_dtu.sh according to your needs:
```bash
...
DATA_ROOT="data/DTU/"                     # path to dtu dataset
OUTDEPTHNAME="TransMVSNet"                # prefix of the output depth files
LOG_DIR="outputs/dtu/TransMVSNet_writing"
CKPT="assets/ckpts/dtu/TransMVSNet.ckpt"  # path to pretrained checkpoint
NGPUS=1
BATCH_SIZE=1
...
```
To evaluate DINER on DTU, run

```bash
python python_scripts/create_prediction_folder.py --config configs/evaluate_diner_on_dtu.yaml --ckpt assets/ckpts/dtu/DINER.ckpt --out outputs/dtu/diner_full_evaluation
```

The outputs will be stored in outputs/dtu/diner_full_evaluation by default. Since evaluating DINER on the entire DTU validation set can take a long time, you can set the argument --n [NUMBER_OF_SAMPLES] to evaluate DINER on only a subset of the validation set.
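For example, to evaluate on only 10 samples (the sample count and the output directory name outputs/dtu/diner_subset_evaluation are arbitrary choices here):

```bash
python python_scripts/create_prediction_folder.py --config configs/evaluate_diner_on_dtu.yaml --ckpt assets/ckpts/dtu/DINER.ckpt --out outputs/dtu/diner_subset_evaluation --n 10
```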
To change the configuration, adapt configs/evaluate_diner_on_dtu.yaml according to your needs.
Download the FaceScape dataset and extract it into data/FACESCAPE_RAW. After extraction, the directory structure should look like this:
```
data/FACESCAPE_RAW (dataset root)
|- 1
| |- 1_neutral
| | |- 0.jpg
| | |- 1.jpg
| | |- ...
| | |- 54.jpg
| | |- params.json
| |- 1_neutral.ply
| |- 2_smile
| |- 2_smile.ply
| |- ...
| |- dpmap
| | |- 1_neutral.png
| | |- 2_smile.png
| | |- ...
| |- models_reg
| | |- 1_neutral.obj
| | |- 1_neutral.jpg
| | |- 1_neutral.mtl
| | |- 2_smile.obj
| | |- 2_smile.jpg
| | |- 2_smile.mtl
| | |- ...
|- 2
|- ...
```
Preprocess the raw data by running

```bash
ROOT_IN="data/FACESCAPE_RAW"
ROOT_OUT="data/FACESCAPE_PROCESSED"
for i in {1..359}
do
    out_i=$(printf "%03d" $i)
    python deps/facescape_preprocessing/process_dataset.py --dir_in $ROOT_IN/$i --dir_out $ROOT_OUT/$out_i
done
```
Note that while it is possible to perform the data preprocessing sequentially as outlined above, we highly recommend parallelizing the process. Please refer to deps/facescape_preprocessing/process_dataset.sh and deps/facescape_preprocessing/process_dataset.sub for an example implementation.
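For illustration, here is a minimal parallelization sketch using xargs (the worker count -P 8 is an arbitrary choice; the provided .sh/.sub files remain the reference):

```bash
# Run up to 8 preprocessing jobs in parallel; xargs substitutes {} with the subject index.
seq 1 359 | xargs -P 8 -I{} sh -c \
    'python deps/facescape_preprocessing/process_dataset.py --dir_in data/FACESCAPE_RAW/{} --dir_out data/FACESCAPE_PROCESSED/$(printf "%03d" {})'
```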
Make sure the pretrained TransMVSNet weights are located in the assets folder (see instructions above). Then run

```bash
bash deps/TransMVSNet/scripts/write_to_facescape.sh
```

This will write the TransMVSNet depth predictions to files like data/FACESCAPE_PROCESSED/033/01/view_00020/depth_TransMVSNet(_vis/_conf).png. To change the configuration, adapt deps/TransMVSNet/scripts/write_to_facescape.sh according to your needs.

To evaluate DINER on FaceScape, run

```bash
python python_scripts/create_prediction_folder.py --config configs/evaluate_diner_on_facescape.yaml --ckpt assets/ckpts/facescape/DINER.ckpt --out outputs/facescape/diner_full_evaluation
```
The outputs will be stored in outputs/facescape/diner_full_evaluation by default.
To change the configuration, adapt configs/evaluate_diner_on_facescape.yaml according to your needs.

To download the MULTIFACE dataset, run

```bash
python deps/multiface/download_dataset.py
```
This will automatically download a subset of the MULTIFACE dataset as specified in configs/download_multiface.json and store it in data/MULTIFACE. It may take a while.
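Since the download can run for a long time, you can monitor its progress from a second shell, e.g. via the size of the target directory:

```bash
du -sh data/MULTIFACE   # total size on disk grows as the download proceeds
```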
Then preprocess the downloaded data via

```bash
python deps/multiface/process_dataset.py --root data/MULTIFACE
```
Make sure the pretrained TransMVSNet weights are located in the assets folder (see instructions above). Then run

```bash
bash deps/TransMVSNet/scripts/write_to_multiface.sh
```

This will write the TransMVSNet depth predictions to files like data/MULTIFACE/m--20190529--1004--5067077--GHS/depths/SEN_approach_your_interview_with_statuesque_composure/400349/046029_TransMVSNet(_conf/_vis).png.
To change the configuration, adapt deps/TransMVSNet/scripts/write_to_multiface.sh according to your needs.

To evaluate DINER on MULTIFACE, run

```bash
python python_scripts/create_prediction_folder.py --config configs/evaluate_diner_on_multiface.yaml --ckpt assets/ckpts/facescape/DINER.ckpt --out outputs/multiface/diner_full_evaluation
```

The outputs will be stored in outputs/multiface/diner_full_evaluation by default.
To train the depth estimator from scratch, run

```bash
bash deps/TransMVSNet/scripts/train_TransMVSNet_dtu.sh        # Training on DTU
```

or

```bash
bash deps/TransMVSNet/scripts/train_TransMVSNet_facescape.sh  # Training on Facescape
```
To change the training settings, please adjust the respective *.sh files. Note that the authors of TransMVSNet recommend training with 8 GPUs.
To train DINER from scratch, run

```bash
python python_scripts/train.py configs/train_dtu.yaml        # Training on DTU
```

or

```bash
python python_scripts/train.py configs/train_facescape.yaml  # Training on Facescape
```

Note that we use one NVIDIA A100-SXM4-80GB for training.
The code is available for non-commercial scientific research purposes under the CC BY-NC 3.0 license.
If you find our work useful, please include the following citation:
```bibtex
@inproceedings{prinzler2023diner,
    title     = {DINER: (D)epth-aware (I)mage-based (Ne)ural (R)adiance Fields},
    author    = {Prinzler, Malte and Hilliges, Otmar and Thies, Justus},
    booktitle = {Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023}
}
```
Parts of our code are heavily inspired by https://github.com/sxyu/pixel-nerf and https://github.com/megvii-research/TransMVSNet so please consider citing their work as well.
Malte Prinzler was supported by the Max Planck ETH Center for Learning Systems (CLS) during this project.