
A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets

Bernhard Kerbl*, Andreas Meuleman*, Georgios Kopanas, Michael Wimmer, Alexandre Lanvin, George Drettakis (* indicates equal contribution)

Project page | Paper

This repository contains the official authors' implementation associated with the paper "A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets". We explain the different steps required to run our algorithm. We use a "toy example" of 1500 images organized in 2 chunks to illustrate each step of the method and facilitate reproduction. The full datasets presented in the paper will be released as soon as the data protection process is completed (please stay tuned).

Bibliography:

@Article{hierarchicalgaussians24,
      author       = {Kerbl, Bernhard and Meuleman, Andreas and Kopanas, Georgios and Wimmer, Michael and Lanvin, Alexandre and Drettakis, George},
      title        = {A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets},
      journal      = {ACM Transactions on Graphics},
      number       = {4},
      volume       = {43},
      month        = {July},
      year         = {2024},
      url          = {https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/}
}

Roadmap

Please note that the code release is currently in alpha. We intend to provide fixes for issues experienced by users due to setups and/or environments that we did not test on. The steps below were successfully tested on Windows and Ubuntu 22.04. We appreciate the documentation of issues by users and will try to address them. Furthermore, there are several points that we will integrate in the coming weeks.

Setup

Make sure to clone the repo using --recursive:

git clone https://github.com/graphdeco-inria/hierarchical-3d-gaussians.git --recursive
cd hierarchical-3d-gaussians

Prerequisites

We tested on Ubuntu 22.04 and Windows 11 using the following:

Python environment for optimization

conda create -n hierarchical_3d_gaussians python=3.12 -y
conda activate hierarchical_3d_gaussians
# Replace cu121 with cu118 if using CUDA 11.x 
pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu121 
pip install -r requirements.txt
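
As a quick optional check that the install picked up a CUDA-enabled build, the following should print True and your CUDA version:

python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"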

Weights for monocular depth estimation

To enable the depth loss, download the model weights for one of the supported monocular depth estimation methods: Depth Anything V2 (preferred) or DPT (see Monocular depth maps below for how they are used).
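
As an example, the Depth Anything V2 (vitl) checkpoint could be fetched as follows; note that the download URL and the checkpoints/ destination are assumptions based on the Depth-Anything-V2 submodule's defaults, so check the submodule's README if your setup differs:

# Assumed checkpoint location and URL; verify against the Depth-Anything-V2 README
mkdir -p submodules/Depth-Anything-V2/checkpoints
wget -P submodules/Depth-Anything-V2/checkpoints https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth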

Compiling hierarchy generator and merger

cd submodules/gaussianhierarchy
cmake . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --config Release
cd ../..

Compiling the real-time viewer

For Ubuntu 22.04, install dependencies:

sudo apt install -y cmake libglew-dev libassimp-dev libboost-all-dev libgtk-3-dev libopencv-dev libglfw3-dev libavdevice-dev libavcodec-dev libeigen3-dev libxxf86vm-dev libembree-dev

Clone the hierarchy viewer and build:

cd SIBR_viewers
git clone https://github.com/graphdeco-inria/hierarchy-viewer.git src/projects/hierarchyviewer
cmake . -B build -DCMAKE_BUILD_TYPE=Release -DBUILD_IBR_HIERARCHYVIEWER=ON -DBUILD_IBR_ULR=OFF -DBUILD_IBR_DATASET_TOOLS=OFF -DBUILD_IBR_GAUSSIANVIEWER=OFF 
cmake --build build -j --target install --config Release

Running the method

Our method has two main stages: Reconstruction, which takes a (usually large) set of images as input and outputs a "merged hierarchy", and Runtime, which displays the full hierarchy in real time.

Reconstruction has two main steps: 1) preprocessing the input images and 2) optimization. We present these in detail next. For each step, we provide automatic scripts that perform everything required, as well as details about the individual components.

Dataset

To get started, prepare a dataset or download and extract the toy example. The dataset should have sorted images in one folder per camera under ${DATASET_DIR}/inputs/images/ and optional masks (with .png extension) in ${DATASET_DIR}/inputs/masks/. Masks are multiplied with the input images and renderings before computing the loss.
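
As a rough sketch, assuming two cameras with the hypothetical names cam0 and cam1, the inputs would look like:

${DATASET_DIR}
└── inputs
    ├── images
    │   ├── cam0   # sorted images for the first camera
    │   └── cam1
    └── masks      # optional, mirrors images/, .png files
        ├── cam0
        └── cam1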

You can also work from our full scenes. As we provide them calibrated and subdivided, you may skip to Generate monocular depth maps. The datasets:

In the following, replace ${DATASET_DIR} with the path to your dataset or set DATASET_DIR:

# Bash:
DATASET_DIR=<Path to your dataset>

# PowerShell:
${DATASET_DIR} = "<Path to your dataset>"

To skip the reconstruction and only display scenes, download pretrained hierarchies and scaffolds, place them under ${DATASET_DIR}/output/ and follow the viewer instructions. The pretrained hierarchies:

1. Preprocessing

As in 3DGS, we need calibrated cameras and a point cloud to train our hierarchies on.

1.1 Calibrating the cameras

The first step is to generate a "global colmap". The following command runs COLMAP's hierarchical mapper, rectifies images and masks, and aligns and scales the sparse reconstruction to facilitate subdivision.

python preprocess/generate_colmap.py --project_dir ${DATASET_DIR}

Using calibrated images: if your dataset already has COLMAP (with 2D and 3D SfM points) and rectified images, they should be placed under `${DATASET_DIR}/camera_calibration/rectified`. As they still need alignment, run:

python preprocess/auto_reorient.py --input_path ${DATASET_DIR}/camera_calibration/rectified/sparse --output_path ${DATASET_DIR}/camera_calibration/aligned/sparse/0


This step takes ~47 minutes on our example dataset using an RTX A6000; more details on each step of the script here.

1.2 Generate chunks

Once the "global colmap" is generated, it should be split into chunks. We also run a per-chunk bundle adjustment, as COLMAP's hierarchical mapper is faster but less accurate (if your global colmap is accurate, you can skip this time-consuming step with --skip_bundle_adjustment).

python preprocess/generate_chunks.py --project_dir ${DATASET_DIR}

This step takes ~95 minutes on our example dataset using an RTX A6000; more details on each step of the script here.

Note that by using --use_slurm you can refine the chunks in parallel; remember to set your slurm parameters in preprocess/prepare_chunks.slurm (gpu, account, etc.).
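
For example, a sketch of the same command with parallel chunk refinement on a slurm cluster (after the .slurm parameters are set):

python preprocess/generate_chunks.py --project_dir ${DATASET_DIR} --use_slurm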

1.3 Generate monocular depth maps

To use depth regularization when training each chunk, depth maps must be generated for each rectified image. Depth scaling parameters then need to be computed as well. Both steps can be done using:

python preprocess/generate_depth.py --project_dir ${DATASET_DIR}

Project structure

You should now have the following file structure; it is required for the training part:


project
└── camera_calibration
    ├── aligned
    │   └── sparse/0
    │       ├── images.bin
    │       ├── cameras.bin
    │       └── points3D.bin
    ├── chunks
    │   ├── 0_0
    │   ├── 0_1
    │   .
    │   .
    │   .
    │   └── m_n
    │       ├── center.txt
    │       ├── extent.txt
    │       └── sparse/0
    │           ├── cameras.bin
    │           ├── images.bin
    │           ├── points3D.bin
    │           └── depth_params.json
    └── rectified
        ├── images
        ├── depths
        └── masks

2. Optimization

The scene training process is divided into five steps: 1) we first train a global, coarse 3D Gaussian splatting scene ("the scaffold"), then 2) train each chunk independently in parallel, 3) build the hierarchy, 4) optimize the hierarchy in each chunk, and finally 5) consolidate the chunks to create the final hierarchy.

Make sure that you have correctly set up your environment and built the hierarchy merger/creator.

The full_train.py script performs all these steps to train a hierarchy from a preprocessed scene. While training, the progress can be visualized with the original 3DGS remote viewer (build instructions).

python scripts/full_train.py --project_dir ${DATASET_DIR}

Command line arguments:

--colmap_dir: Input aligned colmap.
--images_dir: Path to rectified images.
--depths_dir: Path to rectified depths.
--masks_dir: Path to rectified masks.
--chunks_dir: Path to the input chunks folder.
--env_name: Name of the conda env you created earlier.
--output_dir: Path to the output dir.
--use_slurm: Flag to enable parallel training using slurm (False by default).
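
For example, a sketch that keeps the default preprocessed layout but names the environment and output location explicitly (assuming these flags can be combined with --project_dir):

python scripts/full_train.py --project_dir ${DATASET_DIR} --env_name hierarchical_3d_gaussians --output_dir ${DATASET_DIR}/output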


Note that by using --use_slurm, chunks will be trained in parallel, to exploit e.g. multi-GPU setups. To control the process, remember to set your slurm parameters in coarse_train.slurm, consolidate.slurm and train_chunk.slurm (gpu, account, etc.).

This step takes ~171 minutes on our example dataset using an RTX A6000; more details on each step of the script here.

3. Real-time viewer

The real-time viewer is based on SIBR, similar to the original 3DGS. For setup, please see here.

Running the viewer on a merged hierarchy

The hierarchical real-time viewer is used to visualize our trained hierarchies. It has a top view that displays the structure-from-motion point cloud as well as the input calibrated cameras in green. The hierarchy chunks are also displayed in wireframe mode.


After installing the viewers, you may run the compiled SIBR_gaussianHierarchyViewer_app in <SIBR install dir>/bin/. Controls are described here.

If only limited VRAM is available, add --budget <budget for the scene parameters in MB> (set to 16000 by default, assuming at least 16 GB of VRAM). Note that this only defines the budget for the SCENE representation. Rendering will require some additional VRAM (up to 1.5 GB) for framebuffer structs. Note that the real-time renderer assumes that CUDA/OpenGL Interop is available on your system (see the original 3DGS documentation for more details).

The interface includes a field for tau (size limit), which defines the desired granularity setting. Note that tau = 0 will try to render the complete dataset (all leaf nodes). If the chosen granularity exceeds the available VRAM budget, instead of running out of memory, the viewer will auto-regulate and raise the granularity until the scene fits inside the defined VRAM budget.

SIBR_viewers/install/bin/SIBR_gaussianHierarchyViewer_app --path ${DATASET_DIR}/camera_calibration/aligned --scaffold ${DATASET_DIR}/output/scaffold/point_cloud/iteration_30000 --model-path ${DATASET_DIR}/output/merged.hier --images-path ${DATASET_DIR}/camera_calibration/rectified/images

Command line arguments for the real-time viewer:

--model-path / -m: Path to a trained hierarchy.
--iteration: Specifies which state to load if multiple are available. Defaults to the latest available iteration.
--path / -s: Argument to override the model's path to the source dataset.
--rendering-size: Takes two space-separated numbers to define the resolution at which real-time rendering occurs, 1200 width by default. Note that to enforce an aspect ratio that differs from the input images, you also need --force-aspect-ratio.
--images-path: Path to rectified input images to be viewed in the top view.
--device: Index of the CUDA device to use for rasterization if multiple are available, 0 by default.
--budget: Amount of VRAM that may be used for the hierarchical 3DGS scene representation.
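
For example, a sketch of the same viewer command on a smaller GPU, limiting the scene budget and fixing the rendering resolution with the flags listed above:

SIBR_viewers/install/bin/SIBR_gaussianHierarchyViewer_app --path ${DATASET_DIR}/camera_calibration/aligned --scaffold ${DATASET_DIR}/output/scaffold/point_cloud/iteration_30000 --model-path ${DATASET_DIR}/output/merged.hier --images-path ${DATASET_DIR}/camera_calibration/rectified/images --budget 8000 --rendering-size 1600 900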



Details on the different steps

Generating colmap

Note that in our experiments we used COLMAP 3.9.1 with CUDA support. The parameters of each COLMAP command, as well as our scripts, are the ones we used on the example dataset.
More details on these parameters can be found here.

Generating chunks

The last preprocessing step is to divide the colmap into chunks. Each chunk will have its own colmap that is refined with two rounds of bundle adjustment and triangulation.

Monocular depth maps

Make sure to have the depth estimator weights.

  1. Generate depth maps (run for each subfolder in images/; see the loop sketch after this list)

    • Using Depth Anything V2 (preferred):
      cd submodules/Depth-Anything-V2
      python run.py --encoder vitl --pred-only --grayscale --img-path [path_to_input_images_dir] --outdir [path_to_output_depth_dir]
    • Using DPT:
      cd submodules/DPT
      python run_monodepth.py -t dpt_large -i [path_to_input_images_dir] -o [path_to_output_depth_dir]
  2. Generate the depth_params.json file from the depth maps created in step 1.

    This file will be used for depth regularization during single-chunk training. It needs to be generated for each chunk.

      cd ../../
      python preprocess/make_depth_scale.py --base_dir [path to colmap] --depths_dir [path to output depth dir]
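
Since the estimator is run once per camera folder, a small shell loop can cover all subfolders of the rectified images. This is a sketch for the Depth Anything V2 option, assuming ${DATASET_DIR} is absolute and that depths/ mirrors the per-camera layout of images/:

cd submodules/Depth-Anything-V2
for cam in ${DATASET_DIR}/camera_calibration/rectified/images/*/; do
    # One depth map per rectified image, written to the matching depths/<camera> folder
    python run.py --encoder vitl --pred-only --grayscale --img-path "${cam}" --outdir "${DATASET_DIR}/camera_calibration/rectified/depths/$(basename "${cam}")"
done
cd ../..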

Training steps

Make sure that you correctly set up repositories and environments

Slurm parameters

The beginning of each .slurm script must have the following parameters:

#!/bin/bash

#SBATCH --account=xyz@v100      # your slurm account (ex: xyz@v100)
#SBATCH --constraint=v100-32g   # the gpu you require (ex: v100-32g)
#SBATCH --ntasks=1              # number of process you require
#SBATCH --nodes=1               # number of nodes you require 
#SBATCH --gres=gpu:1            # number of gpus you require
#SBATCH --cpus-per-task=10      # number of cpus per task you require
#SBATCH --time=01:00:00         # maximal allocation time

Note that the slurm scripts have not been thoroughly tested.

Evaluations

We use a test.txt file that is read by the dataloader to split the images into train/test sets when --eval is passed to the training scripts. This file should be present in sparse/0/ for each chunk and for the aligned "global colmap" (if applicable).

Single chunk

The single chunks we used for evaluation:

To run the evaluations on a chunk:

python train_single.py -s ${CHUNK_DIR} --model_path ${OUTPUT_DIR} -d depths --exposure_lr_init 0.0 --eval --skip_scale_big_gauss

# Windows: build/Release/GaussianHierarchyCreator 
submodules/gaussianhierarchy/build/GaussianHierarchyCreator ${OUTPUT_DIR}/point_cloud/iteration_30000/point_cloud.ply ${CHUNK_DIR}  ${OUTPUT_DIR} 

python train_post.py -s ${CHUNK_DIR} --model_path ${OUTPUT_DIR} --hierarchy ${OUTPUT_DIR}/hierarchy.hier --iterations 15000 --feature_lr 0.0005 --opacity_lr 0.01 --scaling_lr 0.001 --eval

python render_hierarchy.py -s ${CHUNK_DIR} --model_path ${OUTPUT_DIR} --hierarchy ${OUTPUT_DIR}/hierarchy.hier_opt --out_dir ${OUTPUT_DIR} --eval

Large scenes

Ensure that test.txt is present in all sparse/0/ folders. preprocess/copy_file_to_chunks.py can help copy it to each chunk. Then, the scene can be optimized with eval:

python scripts/full_train.py --project_dir ${DATASET_DIR} --extra_training_args '--exposure_lr_init 0.0 --eval'

The following renders the test set from the optimized hierarchy. Note that the current implementation loads the full hierarchy in GPU memory.

python render_hierarchy.py -s ${DATASET_DIR} --model_path ${DATASET_DIR}/output --hierarchy ${DATASET_DIR}/output/merged.hier --out_dir ${DATASET_DIR}/output/renders --eval --scaffold_file ${DATASET_DIR}/output/scaffold/point_cloud/iteration_30000

Exposure optimization

We generally disable exposure optimization for evaluations. If you want to use it, you can optimize exposure on the left half of each test image and evaluate on the right half. To achieve this, remove --exposure_lr_init 0.0 from the commands above and add --train_test_exp to all training scripts.
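
For example, a sketch of the large-scene training command from the evaluation section with these flags combined:

python scripts/full_train.py --project_dir ${DATASET_DIR} --extra_training_args '--eval --train_test_exp'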