youmi-zym / GO-SLAM

[ICCV2023] GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction
Apache License 2.0

GO-SLAM
Global Optimization for Consistent 3D Instant Reconstruction
(ICCV 2023)

:rotating_light: This repository contains the code and trained models of our work "GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction", ICCV 2023

by Youmin Zhang, Fabio Tosi, Stefano Mattoccia and Matteo Poggi

Department of Computer Science and Engineering (DISI), University of Bologna

Note: 🚧 This repository is still under development.

Code is available now, enjoy!

[Project Page](https://youmi-zym.github.io/projects/GO-SLAM/) | [Paper & Supplementary](https://arxiv.org/pdf/2309.02436.pdf)

3D Reconstruction Comparison

**3D Reconstruction and Trajectory Error**. From left to right: RGB-D methods ([iMAP](https://arxiv.org/abs/2103.12352), [NICE-SLAM](https://github.com/cvg/nice-slam), [DROID-SLAM](https://github.com/princeton-vl/DROID-SLAM), and ours), ground truth scan, and monocular methods (DROID-SLAM and ours).

## :bookmark_tabs: Table of Contents

1. [Introduction](#clapper-introduction)
2. [Code](#memo-code)
3. [Qualitative Results](#art-qualitative-results)
4. [Contacts](#envelope-contacts)

:clapper: Introduction

We introduce GO-SLAM, a deep-learning-based dense visual SLAM framework that achieves real-time global optimization of poses and 3D reconstruction. By integrating robust pose estimation, efficient loop closing, and continuous surface representation updates, GO-SLAM effectively addresses the error accumulation and distortion challenges associated with neural implicit representations. By exploiting global geometry learned from the input history, GO-SLAM sets new benchmarks in tracking robustness and reconstruction accuracy across synthetic and real-world datasets. Notably, it supports monocular, stereo, and RGB-D inputs.

Contributions:

Architecture Overview

GO-SLAM consists of three parallel threads: front-end tracking, back-end tracking, and instant mapping. It can run with monocular, stereo, and RGB-D input.
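The three-stage layout described above can be pictured with a minimal sketch. This is only an illustration of how three parallel workers can hand data to each other through queues; it is not the repository's implementation. All function names, the "every second frame is a keyframe" rule, and the placeholder optimization step are assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def frontend_tracking(frames, keyframe_q):
    """Stand-in front end: forwards every second frame as a 'keyframe'."""
    for idx, frame in enumerate(frames):
        if idx % 2 == 0:  # hypothetical keyframe selection rule
            keyframe_q.put((idx, frame))
    keyframe_q.put(SENTINEL)


def backend_tracking(keyframe_q, map_q):
    """Stand-in back end: the place where global optimization would run."""
    history = []
    while True:
        item = keyframe_q.get()
        if item is SENTINEL:
            break
        history.append(item)
        # Placeholder: loop closing / full BA over `history` would go here.
        map_q.put(list(history))
    map_q.put(SENTINEL)


def instant_mapping(map_q, results):
    """Stand-in mapper: records which keyframe ids it was asked to integrate."""
    while True:
        item = map_q.get()
        if item is SENTINEL:
            break
        results.append([idx for idx, _ in item])


def run_pipeline(frames):
    """Run the three stand-in stages in parallel and return the mapper's log."""
    keyframe_q, map_q, results = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=frontend_tracking, args=(frames, keyframe_q)),
        threading.Thread(target=backend_tracking, args=(keyframe_q, map_q)),
        threading.Thread(target=instant_mapping, args=(map_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["f0", "f1", "f2", "f3", "f4"]))
```

Each stage only blocks on its own input queue, so in this sketch the mapper can keep consuming updates while the front end is still reading frames.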


:fountain_pen: If you find this code useful in your research, please cite:

@inproceedings{zhang2023goslam,
    author    = {Zhang, Youmin and Tosi, Fabio and Mattoccia, Stefano and Poggi, Matteo},
    title     = {GO-SLAM: Global Optimization for Consistent 3D Instant Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
}

:memo: Code

You can create an Anaconda environment called go-slam. On Linux, install libopenexr-dev before creating the environment.


git clone --recursive https://github.com/youmi-zym/GO-SLAM

sudo apt-get install libopenexr-dev

conda env create -f environment.yaml
conda activate go-slam

pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install evo --upgrade --no-binary evo

python setup.py install
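After the installation steps above, a quick sanity check can confirm that the main Python dependencies are importable. The sketch below is an optional helper, not part of the repository. It assumes the module names `torch`, `tinycudann` (the import name of the tiny-cuda-nn PyTorch bindings), and `evo`; the function name `missing_modules` is made up for the example.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
REQUIRED = ("torch", "tinycudann", "evo")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated go-slam environment. A reported name usually means the corresponding install step failed or ran in a different environment.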

Replica

Download the data from Google Drive, and then you can run:

# first set OUT_DIR in the script and DATA_ROOT in the config file
# MODE is one of [rgbd, mono]; EXP_NAME is an experiment name of your choice

./evaluate_on_replica.sh MODE EXP_NAME

# for example

./evaluate_on_replica.sh rgbd first_try

The mesh and the corresponding evaluation metrics are written to OUT_DIR.

We have also uploaded our predicted meshes to Google Drive. Enjoy!

ScanNet

Please follow the data downloading procedure on the ScanNet website, and extract the color/depth frames from the .sens file using this code.

Directory structure of ScanNet:

DATAROOT is `./Datasets` by default. If a sequence (`sceneXXXX_XX`) is stored in other places, please change the `input_folder` path in the config file or in the command line.

```
DATAROOT
└── ScanNet
    └── scans
        └── scene0000_00
            └── frames
                ├── color
                │   ├── 0.jpg
                │   ├── 1.jpg
                │   ├── ...
                │   └── ...
                ├── depth
                │   ├── 0.png
                │   ├── 1.png
                │   ├── ...
                │   └── ...
                ├── intrinsic
                └── pose
                    ├── 0.txt
                    ├── 1.txt
                    ├── ...
                    └── ...
```
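Before launching a long run, it can help to verify that an extracted scene matches the ScanNet layout described above. The following sketch is a hypothetical helper, not part of the repository. It assumes that color, depth, and pose files share the same numeric stems (`0.jpg`, `0.png`, `0.txt`), and it only checks that `intrinsic` exists without inspecting its contents.

```python
from pathlib import Path


def check_scannet_scene(data_root, scene):
    """Return a list of human-readable problems for one extracted scene."""
    frames = Path(data_root) / "ScanNet" / "scans" / scene / "frames"
    problems = [
        f"missing: {frames / name}"
        for name in ("color", "depth", "intrinsic", "pose")
        if not (frames / name).exists()
    ]
    if problems:
        return problems

    # Frame ids are the file stems: 0.jpg, 0.png and 0.txt belong together.
    color = {p.stem for p in (frames / "color").glob("*.jpg")}
    depth = {p.stem for p in (frames / "depth").glob("*.png")}
    pose = {p.stem for p in (frames / "pose").glob("*.txt")}
    if not color:
        problems.append("no color frames found")
    if color != depth:
        problems.append(f"{len(color ^ depth)} frame ids lack a color/depth partner")
    if color != pose:
        problems.append(f"{len(color ^ pose)} frame ids lack a color/pose partner")
    return problems


if __name__ == "__main__":
    for problem in check_scannet_scene("./Datasets", "scene0000_00"):
        print(problem)
```

An empty result means the folder looks consistent with the expected layout; it does not validate image contents or pose values.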

Once the data is downloaded and set up properly, you can run:

# first set OUT_DIR in the script and DATA_ROOT in the config file
# MODE is one of [rgbd, mono]; EXP_NAME is an experiment name of your choice

./evaluate_on_scannet.sh MODE EXP_NAME

# for example

./evaluate_on_scannet.sh rgbd first_try

# you can also generate a video like the one on our project page with:

./generate_video_on_scannet.sh rgbd first_try_on_video

We have also uploaded our predicted meshes to Google Drive. Enjoy!

EuRoC

Please use the following script to download the EuRoC dataset. The GT trajectory can be downloaded from Google Drive.

Please put the GT trajectory of each scene in the corresponding folder, as shown below:

Directory structure of EuRoC:

DATAROOT is `./Datasets` by default. If a sequence (e.g., `MH_01_easy`) is stored in other places, please change the `input_folder` path in the config file or in the command line.

```
DATAROOT
└── EuRoC
    └── MH_01_easy
        ├── mav0
        │   ├── cam0
        │   ├── cam1
        │   ├── imu0
        │   ├── leica0
        │   ├── state_groundtruth_estimate0
        │   └── body.yaml
        └── MH_01_easy.txt
```
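Forgetting to copy a GT trajectory is an easy mistake to make when preparing several sequences. As a rough aid, the sketch below lists the sequence folders that lack a trajectory file. It is a hypothetical helper, not part of the repository, and it assumes the GT file is named `<sequence>.txt` and sits directly inside the sequence folder, next to `mav0`.

```python
from pathlib import Path


def sequences_missing_gt(data_root):
    """List EuRoC sequence folders that contain no `<sequence>.txt` file."""
    euroc = Path(data_root) / "EuRoC"
    if not euroc.is_dir():
        return []
    return sorted(
        seq.name
        for seq in euroc.iterdir()
        if seq.is_dir() and not (seq / f"{seq.name}.txt").is_file()
    )


if __name__ == "__main__":
    for name in sequences_missing_gt("./Datasets"):
        print(f"GT trajectory not found for {name}")
```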

Then you can run:

# for data downloading:

DATA_ROOT=path/to/folder
mkdir -p "$DATA_ROOT"
./scripts/download_euroc.sh "$DATA_ROOT"

# first set OUT_DIR in the script and DATA_ROOT in the config file
# MODE is one of [stereo, mono]; EXP_NAME is an experiment name of your choice

./evaluate_on_euroc.sh MODE EXP_NAME

# for example

./evaluate_on_euroc.sh stereo first_try
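The three evaluation scripts share the same `MODE EXP_NAME` interface but accept different modes. If you prefer to launch them from Python, a small wrapper can reject an unsupported combination before anything runs. This is a sketch, not part of the repository: `build_command` and `run_evaluation` are hypothetical names, and the mode table simply mirrors the comments above.

```python
import subprocess

# Modes accepted by each evaluation script, as listed in the comments above.
SUPPORTED_MODES = {
    "replica": ("rgbd", "mono"),
    "scannet": ("rgbd", "mono"),
    "euroc": ("stereo", "mono"),
}


def build_command(dataset, mode, exp_name):
    """Build the argument list for one evaluation script, validating MODE."""
    if dataset not in SUPPORTED_MODES:
        raise ValueError(f"unknown dataset: {dataset}")
    if mode not in SUPPORTED_MODES[dataset]:
        allowed = ", ".join(SUPPORTED_MODES[dataset])
        raise ValueError(f"{dataset} supports only: {allowed}")
    return [f"./evaluate_on_{dataset}.sh", mode, exp_name]


def run_evaluation(dataset, mode, exp_name, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    cmd = build_command(dataset, mode, exp_name)
    print(" ".join(cmd))
    if not dry_run:
        # Must be run from the repository root, where the scripts live.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    run_evaluation("euroc", "stereo", "first_try")
```

The default `dry_run=True` only prints the command, which makes it easy to review a batch of runs before starting them.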

:art: Qualitative Results

In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.

Qualitative results on ScanNet dataset. We evaluate our RGB-D mode SLAM on the ScanNet dataset and benchmark it against state-of-the-art techniques. Our method shows improved global consistency in the reconstruction results.

Qualitative results on Replica dataset. GO-SLAM supports both monocular and RGB-D modes and is evaluated on the Replica dataset. It achieves real-time, high-quality 3D reconstruction from monocular or RGB-D input. This contrasts with NICE-SLAM, which is designed solely for depth input, runs at less than 1 frame per second, and requires hours to achieve comparable results.

Qualitative examples of loop closing (LC) and full bundle adjustment (BA) on scene0054_00 (ScanNet), with a total of 6629 frames. In (a), a significant error accumulates when no global optimization is available. With loop closing (b), the system is able to eliminate the trajectory error using global geometry. Additionally, online full BA (c) optimizes the poses of all existing keyframes. The final model (d), which integrates both loop closing and full BA, achieves a more complete and accurate 3D model prediction.
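The effect described in (a) and (b) can be illustrated with a toy 2D example. The sketch below is not GO-SLAM's optimizer. It assumes a constant per-step odometry bias and a deliberately crude correction that spreads the end-to-start gap linearly along the trajectory; all function names and numbers are made up for illustration.

```python
import math


def integrate(steps, bias=(0.0, 0.0)):
    """Dead-reckon 2D positions from per-step motions plus a constant bias."""
    x, y = 0.0, 0.0
    trajectory = [(x, y)]
    for dx, dy in steps:
        x += dx + bias[0]
        y += dy + bias[1]
        trajectory.append((x, y))
    return trajectory


def close_loop(trajectory):
    """Toy correction: the path should end where it started, so spread the
    end-to-start gap linearly over all poses."""
    n = len(trajectory) - 1
    gap_x = trajectory[-1][0] - trajectory[0][0]
    gap_y = trajectory[-1][1] - trajectory[0][1]
    return [
        (x - gap_x * i / n, y - gap_y * i / n)
        for i, (x, y) in enumerate(trajectory)
    ]


def rmse(estimate, reference):
    """Root-mean-square position error between two equally long trajectories."""
    total = sum(
        (ex - rx) ** 2 + (ey - ry) ** 2
        for (ex, ey), (rx, ry) in zip(estimate, reference)
    )
    return math.sqrt(total / len(reference))


def demo():
    """Walk a 10 x 10 square loop and compare errors before and after closing."""
    steps = [(1, 0)] * 10 + [(0, 1)] * 10 + [(-1, 0)] * 10 + [(0, -1)] * 10
    reference = integrate(steps)
    drifted = integrate(steps, bias=(0.02, -0.01))  # hypothetical odometry bias
    corrected = close_loop(drifted)
    return rmse(drifted, reference), rmse(corrected, reference)


if __name__ == "__main__":
    before, after = demo()
    print(f"RMSE without loop closing: {before:.3f}")
    print(f"RMSE with toy loop closing: {after:.3e}")
```

Because the bias here is constant, the linear correction removes it almost exactly. Real drift is not that regular, which is one reason a global optimization over all keyframes is useful in addition to loop closing.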

:envelope: Contacts

For questions, please send an email to youmin.zhang2@unibo.it, fabio.tosi5@unibo.it or m.poggi@unibo.it

:pray: Acknowledgements

We sincerely thank the China Scholarship Council (CSC) for its scholarship support.

We adapted code from several awesome repositories, including NICE-SLAM, NeuS, and DROID-SLAM.