GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Paper | Project Page

Yuanhui Huang, Wenzhao Zheng$\dagger$, Yunpeng Zhang, Jie Zhou, Jiwen Lu$\ddagger$

$\dagger$ Project leader $\ddagger$ Corresponding author

💥A pioneering step towards building an object-centric autonomous driving system. 💥

GaussianFormer proposes 3D semantic Gaussians as an object-centric representation for driving scenes that is more efficient than dense 3D occupancy.

[Figure: teaser]

News.

[2024/09/30] Occupancy and Gaussian visualization code release.
[2024/09/12] Training code release.
[2024/09/05] An updated version of GaussianFormer modeling only the occupied area.
[2024/09/05] Model weights and evaluation code release.
[2024/07/01] GaussianFormer is accepted to ECCV24!
[2024/05/28] Paper released on arXiv.
[2024/05/28] Demo release.

Demo

[Figure: demo]

[Figure: legend]

Overview

[Figure: comparisons]

Because Gaussian mixtures are universal approximators, we propose an object-centric 3D semantic Gaussian representation that describes the fine-grained structure of 3D scenes without dense grids. We propose the GaussianFormer model, which combines sparse convolution and cross-attention to efficiently transform 2D images into 3D Gaussian representations. To generate dense 3D occupancy, we design a Gaussian-to-voxel splatting module that can be implemented efficiently in CUDA. With comparable performance, GaussianFormer reduces the memory consumption of existing 3D occupancy prediction methods by 75.2% to 82.2%.
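The Gaussian-to-voxel splatting step can be illustrated with a minimal, unoptimized sketch. It assumes each Gaussian carries a mean, per-axis scales, a rotation matrix, and a semantic vector, and that a voxel's prediction is the sum of the semantic vectors weighted by exp(-d²/2), where d is the Mahalanobis distance to each Gaussian. The function name `splat_to_voxels` and its argument layout are hypothetical and are not taken from this repository, whose splatting module is written in CUDA and is far more efficient than this dense loop.

```python
import math

def splat_to_voxels(means, scales, rotations, semantics, voxel_centers):
    """Sum Gaussian contributions at each voxel center (dense O(V*N) sketch).

    means:         N points [x, y, z]
    scales:        N per-axis standard deviations [sx, sy, sz]
    rotations:     N 3x3 rotation matrices (local axes stored as columns)
    semantics:     N vectors of C class scores
    voxel_centers: V points [x, y, z]
    Returns V vectors of C accumulated class scores.
    """
    num_classes = len(semantics[0])
    out = []
    for p in voxel_centers:
        acc = [0.0] * num_classes
        for m, s, R, c in zip(means, scales, rotations, semantics):
            d = [p[k] - m[k] for k in range(3)]
            # Express the offset in the Gaussian's local frame: local = R^T d.
            local = [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
            # Squared Mahalanobis distance for the covariance R S S^T R^T.
            maha = sum((local[i] / s[i]) ** 2 for i in range(3))
            w = math.exp(-0.5 * maha)
            for k in range(num_classes):
                acc[k] += w * c[k]
        out.append(acc)
    return out

# Toy example: one isotropic Gaussian at the origin with two classes.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
scores = splat_to_voxels(
    means=[[0.0, 0.0, 0.0]],
    scales=[[1.0, 1.0, 1.0]],
    rotations=[identity],
    semantics=[[1.0, 0.0]],
    voxel_centers=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
)
print(scores)
```

The voxel at the Gaussian's mean receives the full semantic vector, and the contribution decays with distance, so voxels far from every Gaussian stay close to zero.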

[Figure: overview]

Getting Started

Installation

Follow the instructions HERE to prepare the environment.

Data Preparation

Download the nuScenes V1.0 full dataset HERE.
Download the occupancy annotations from SurroundOcc HERE and unzip them.
Download the pkl files HERE.

Folder structure

GaussianFormer
├── ...
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   ├── nuscenes_cam/
│   │   ├── nuscenes_infos_train_sweeps_occ.pkl
│   │   ├── nuscenes_infos_val_sweeps_occ.pkl
│   ├── surroundocc/
│   │   ├── samples/
│   │   │   ├── xxxxxxxx.pcd.bin.npy
│   │   │   ├── ...
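
Before running inference or training, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the list of paths is simply copied from the folder structure shown here.

```python
from pathlib import Path

# Relative paths copied from the folder structure above.
EXPECTED = [
    "data/nuscenes/maps",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/v1.0-test",
    "data/nuscenes/v1.0-trainval",
    "data/nuscenes_cam/nuscenes_infos_train_sweeps_occ.pkl",
    "data/nuscenes_cam/nuscenes_infos_val_sweeps_occ.pkl",
    "data/surroundocc/samples",
]

def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    # Run from the GaussianFormer repository root.
    missing = missing_paths(".")
    if missing:
        print("Missing paths:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data layout looks complete.")
```

The check only tests for existence; it does not verify that the downloaded files are complete or uncorrupted.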

Inference

We provide two checkpoints trained on the SurroundOcc dataset:

The checkpoint that reproduces the result in Table 1 of our paper.
🔥🔥An updated version of GaussianFormer that assigns semantic Gaussians only to the occupied area and represents empty space with one fixed, infinitely large Gaussian. This modification significantly reduces the number of Gaussians needed for similar model capacity (144000 -> 25600), making the model even more efficient. Check our GaussianHead for more details.

python eval.py --py-config config/nuscenes_gs144000.py --work-dir out/nuscenes_gs144000/ --resume-from out/nuscenes_gs144000/state_dict.pth

python eval.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid/ --resume-from out/nuscenes_gs25600_solid/state_dict.pth
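
Both commands follow the same pattern: the config lives at config/<name>.py and the checkpoint at out/<name>/state_dict.pth. As a convenience sketch, the helper below builds the evaluation command for either checkpoint from its config name. The function `build_eval_command` is hypothetical and not part of the repository; it only uses the flags that appear in the commands above.

```python
import shlex

def build_eval_command(config_name):
    """Build the eval.py argument list for a given config name."""
    work_dir = f"out/{config_name}"
    return [
        "python", "eval.py",
        "--py-config", f"config/{config_name}.py",
        "--work-dir", f"{work_dir}/",
        "--resume-from", f"{work_dir}/state_dict.pth",
    ]

# Print the command for each of the two released checkpoints.
for name in ("nuscenes_gs144000", "nuscenes_gs25600_solid"):
    print(shlex.join(build_eval_command(name)))
```

The returned list can be passed directly to `subprocess.run` from the repository root if you prefer to launch evaluation from a script.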

Train

First download the pretrained weights for the image backbone HERE and put them inside ckpts.

Then run the following command to launch training. Note that the setting with 144000 Gaussians requires ~40 GB of GPU memory during training, so we recommend trying the 25600 version, which achieves even better performance!🚀

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid

Config	mIoU	Log	Weight
nuscenes_gs25600_solid	19.31	log	weight

Stay tuned for more exciting work and models!🤗

Visualize

Install the packages for visualization according to the documentation. Here is an example command; you can change --num-samples and --vis-index.

CUDA_VISIBLE_DEVICES=0 python visualize.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid --resume-from out/nuscenes_gs25600_solid/state_dict.pth --vis-occ --vis-gaussian --num-samples 3

Related Projects

Our work is inspired by these excellent open-source repos: TPVFormer, PointOcc, SelfOcc, SurroundOcc, OccFormer, and BEVFormer.

Our code was originally based on Sparse4D and later migrated to the general framework of SelfOcc.

Citation

If you find this project helpful, please consider citing the following paper:

@article{huang2024gaussian,
    title={GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction},
    author={Huang, Yuanhui and Zheng, Wenzhao and Zhang, Yunpeng and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:2405.17429},
    year={2024}
}
