LiDAR Diffusion Models [CVPR 2024]

[**Haoxi Ran**](https://hancyran.github.io/) · [**Vitor Guizilini**](https://scholar.google.com.br/citations?user=UH9tP6QAAAAJ&hl=en) · [**Yue Wang**](https://yuewang.xyz/)

Project page: https://lidar-diffusion.github.io/

:tada: News :tada:

Requirements

We provide a conda environment named lidar_diffusion; create and activate it with:

sh init/create_env.sh
conda activate lidar_diffusion
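
As a quick sanity check (assuming the environment created by init/create_env.sh includes PyTorch with CUDA support, which the sampling and training commands below rely on), you can run:

# sanity_check.py -- hypothetical helper, not part of this repo
import torch  # assumes the conda environment provides PyTorch
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())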

Evaluation Toolbox

Overview of evaluation metrics:

- Perceptual metrics (generation & reconstruction): FRID, FSVD, FPVD
- Statistical metrics (generation only): JSD, MMD
- Distance metrics (reconstruction only): CD, EMD


To standardize the evaluation of LiDAR generative models, we provide a self-contained, largely CUDA-accelerated evaluation toolbox in the directory ./lidm/eval/. It implements and integrates the metrics listed above (FRID, FSVD, FPVD, JSD, MMD, CD, and EMD).

For more details about setup and usage, please refer to the Evaluation Toolbox README.
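
For intuition only, the statistical metrics compare aggregate spatial statistics of generated and real point clouds. Below is a minimal, self-contained sketch of a JSD between bird's-eye-view occupancy histograms; this is an illustration, not the toolbox's actual API, and the grid resolution and range are assumptions:

# illustrative JSD between BEV occupancy histograms of two point sets (not the ./lidm/eval/ implementation)
import numpy as np
from scipy.spatial.distance import jensenshannon
def bev_histogram(points, bins=100, extent=50.0):
    # normalized 2D occupancy histogram over x/y within [-extent, extent] meters (assumed grid)
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins,
                                range=[[-extent, extent], [-extent, extent]])
    return hist.ravel() / max(hist.sum(), 1.0)
def bev_jsd(points_a, points_b):
    # scipy returns the JS distance (square root of the divergence), so square it
    return jensenshannon(bev_histogram(points_a), bev_histogram(points_b)) ** 2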

Model Zoo

To test the different tasks below, please download the pretrained LiDM and its corresponding autoencoder:

Pretrained Autoencoders

64-beam (evaluated on KITTI-360 val):

| Encoder | rFRID (↓) | rFSVD (↓) | rFPVD (↓) | CD (↓) | EMD (↓) | Checkpoint | Rec. Results on val (Point Cloud) | Comment |
|---|---|---|---|---|---|---|---|---|
| f_c2_p4 | 2.15 | 20.2 | 16.2 | 0.160 | 0.203 | [Google Drive] (205MB) | [Video] | |
| f_c2_p4* | 2.06 | 20.3 | 15.7 | 0.092 | 0.176 | [Google Drive] (205MB) | [Video] | *: w/o logarithm scaling |
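
The f_c2_p4* variant is trained without logarithmic scaling of the range channel. As a rough illustration of what such a scaling and its inverse can look like (the constants below, e.g. the maximum depth, are assumptions and not the repo's exact configuration):

# hypothetical log scaling of range values into roughly [-1, 1]; the repo's actual normalization lives in its configs
import numpy as np
def log_scale(depth, max_depth=56.0):
    # compress large ranges so nearby structure gets more resolution
    return 2.0 * np.log2(np.clip(depth, 1e-3, None) + 1.0) / np.log2(max_depth + 1.0) - 1.0
def inverse_log_scale(x, max_depth=56.0):
    # invert log_scale to recover metric depth
    return np.exp2((x + 1.0) / 2.0 * np.log2(max_depth + 1.0)) - 1.0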

Benchmark for Unconditional LiDAR Generation

64-beam (2k samples):

| Method | Encoder | FRID (↓) | FSVD (↓) | FPVD (↓) | JSD (↓) | MMD (10^-4, ↓) | Checkpoint | Output LiDAR Point Clouds |
|---|---|---|---|---|---|---|---|---|
| LiDAR-GAN | - | 1222 | 183.4 | 168.1 | 0.272 | 4.74 | - | [2k samples] |
| LiDAR-VAE | - | 199.1 | 129.9 | 105.8 | 0.237 | 7.07 | - | [2k samples] |
| ProjectedGAN | - | 149.7 | 44.7 | 33.4 | 0.188 | 2.88 | - | [2k samples] |
| UltraLiDAR§ | - | 370.0 | 72.1 | 66.6 | 0.747 | 17.12 | - | [2k samples] |
| LiDARGen (1160s)† | - | 129.0 | 39.2 | 33.4 | 0.188 | 2.88 | - | [2k samples] |
| LiDARGen (50s)† | - | 2051 | 480.6 | 400.7 | 0.506 | 9.91 | - | [2k samples] |
| LiDM (50s) | f_c2_p4 | 135.8 | 37.9 | 28.7 | 0.211 | 3.87 | [Google Drive] (3.9GB) | [2k samples] |
| LiDM (50s) | f_c2_p4* | 125.1 | 38.8 | 29.0 | 0.211 | 3.84 | [Google Drive] (3.9GB) | [2k samples] |

NOTE:

  1. Each method is evaluated with 2,000 randomly generated samples.
  2. †: samples generated with the officially released pretrained model from the LiDARGen GitHub repo.
  3. §: samples borrowed from the UltraLiDAR implementation.
  4. All results above are computed with our evaluation toolbox. For more details, please refer to the Evaluation Toolbox README.
  5. Each .pcd file is a list of point clouds stored with the joblib package; load these files with joblib.load(path), as shown in the snippet below.
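
A minimal loading sketch (replace [method] with an actual method directory, as in the commands below):

# load a downloaded collection of generated point clouds with joblib
import joblib
point_clouds = joblib.load("models/baseline/kitti/[method]/samples.pcd")  # a list of point clouds (see note 5)
print(len(point_clouds))      # number of samples, e.g. 2000
print(type(point_clouds[0]))  # inspect one sample's point cloud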

To evaluate the above methods (except LiDM) yourself, download the provided .pcd files from the Output column to the directory ./models/baseline/kitti/[method]/:

CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/baseline/kitti/[method]/samples.pcd --baseline --eval

To evaluate LiDM using the provided .pcd files:

CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/lidm/kitti/[method]/samples.pcd --eval

Pretrained LiDMs for Other Tasks

| Task | Encoder | Dataset | FRID (↓) | FSVD (↓) | Checkpoint | Output |
|---|---|---|---|---|---|---|
| Semantic Map to LiDAR | f_c2_p4* | SemanticKITTI | 11.8 | 19.1 | [Google Drive] (3.9GB) | [log.tar.gz] (2.1GB) |
| Camera to LiDAR | f_c2_p4* | KITTI-360 | 38.9 | 32.1 | [Google Drive] (7.5GB) | [log.tar.gz] (5.4GB) |
| Text to LiDAR | f_c2_p4* | zero-shot | - | - | From Camera-to-LiDAR | - |

NOTE:

  1. The output log.tar.gz contains input conditions (.png), generated range images (.png), generated point clouds (.txt), and a collection of all output point clouds (.pcd).
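
A small sketch for unpacking and inspecting such an archive (the local file name and output directory are placeholders):

# unpack a downloaded log.tar.gz and count files per extension (illustrative only)
import tarfile, os, collections
with tarfile.open("log.tar.gz") as tar:
    tar.extractall("log_output")
counts = collections.Counter(os.path.splitext(f)[1] for _, _, files in os.walk("log_output") for f in files)
print(counts)  # expect .png (conditions and range images), .txt (point clouds), and .pcd (collection)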

Study on Design of LiDAR Compression

For full details of our studies on the design of LiDAR compression, please refer to the LiDAR Compression Design README.

Tip: Downloading the video instead of watching it with Google Drive's built-in video player provides a better visualization.

Autoencoders (trained for 40k steps, evaluated on reconstruction):

| Curvewise Factor | Patchwise Factor | Output Size | rFRID (↓) | rFSVD (↓) | #Params (M) | Visualization of Reconstruction (val) |
|---|---|---|---|---|---|---|
| N/A | N/A | Ground Truth | - | - | - | [Range Image][Point Cloud] |
| 4 | 1 | 64x256x2 | 0.2 | 12.9 | 9.52 | [Range Image][Point Cloud] |
| 8 | 1 | 64x128x3 | 0.9 | 21.2 | 10.76 | [Range Image][Point Cloud] |
| 16 | 1 | 64x64x4 | 2.8 | 31.1 | 12.43 | [Range Image][Point Cloud] |
| 32 | 1 | 64x32x8 | 16.4 | 49.0 | 13.72 | [Range Image][Point Cloud] |
| 1 | 2 | 32x512x2 | 1.5 | 25.0 | 2.87 | [Range Image][Point Cloud] |
| 1 | 4 | 16x256x4 | 0.6 | 15.4 | 12.45 | [Range Image][Point Cloud] |
| 1 | 8 | 8x128x16 | 17.7 | 35.7 | 15.78 | [Range Image][Point Cloud] |
| 1 | 16 | 4x64x64 | 37.1 | 68.7 | 16.25 | [Range Image][Point Cloud] |
| 2 | 2 | 32x256x3 | 0.4 | 11.2 | 13.09 | [Range Image][Point Cloud] |
| 4 | 2 | 32x128x4 | 3.9 | 19.6 | 14.35 | [Range Image][Point Cloud] |
| 8 | 2 | 32x64x8 | 8.0 | 25.3 | 16.06 | [Range Image][Point Cloud] |
| 16 | 2 | 32x32x16 | 21.5 | 54.2 | 17.44 | [Range Image][Point Cloud] |
| 2 | 4 | 16x128x8 | 2.5 | 16.9 | 15.07 | [Range Image][Point Cloud] |
| 4 | 4 | 16x128x16 | 13.8 | 29.5 | 16.86 | [Range Image][Point Cloud] |
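
As a rule of thumb for how the two factors determine the latent size (assuming a 64x1024 input range image; channel counts are set per config): the patchwise factor p downsamples both dimensions, while the curvewise factor c additionally downsamples the horizontal (curve) dimension. A small sketch:

# rough rule of thumb for the latent spatial size, assuming a 64x1024 input range image (channels are set per config)
def latent_size(curve_factor, patch_factor, height=64, width=1024):
    return height // patch_factor, width // (curve_factor * patch_factor)
print(latent_size(2, 4))  # f_c2_p4 -> (16, 128), i.e. the 16x128x8 row above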

Unconditional LiDAR Generation

To run sampling on pretrained models (and to evaluate your results with the flag "--eval"), first download our provided pretrained autoencoders to the directory ./models/first_stage_models/kitti/[model_name] and the pretrained LiDMs to the directory ./models/lidm/kitti/[model_name]:

CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -r models/lidm/kitti/[model_name]/model.ckpt -n 2000 --eval

Semantic-Map-to-LiDAR

To check the conditional results on a full sequence of semantic maps (sequence '08'), please refer to this video.

Before running this task, set up the SemanticKITTI dataset to provide semantic labels as input.

To run sampling on pretrained models (and to evaluate your results with flag "--eval"):

CUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/sem2lidar/model.ckpt -d kitti [--eval]

Camera-to-LiDAR

Before running this task, set up the KITTI-360 dataset to provide camera images as input.

To run sampling on pretrained models:

CUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti [--eval]

Text-to-LiDAR

To run sampling on pretrained models:

CUDA_VISIBLE_DEVICES=0 python scripts/text2lidar.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti -p "an empty road with no object"

Training

To train your own LiDAR Diffusion Models, run the following commands (this example trains both the autoencoder and the LiDM on four GPUs):

# train an autoencoder
python main.py -b configs/autoencoder/kitti/autoencoder_c2_p4.yaml -t --gpus 0,1,2,3

# train a LiDM
python main.py -b configs/lidar_diffusion/kitti/uncond_c2_p4.yaml -t --gpus 0,1,2,3

To debug the training process, just add the flag -d:

python main.py -b path/to/your/config.yaml -t --gpus 0, -d

To resume your training from an existing log directory or an existing checkpoint file, use the flag -r:

# using a log directory
python main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/log

# or, using a checkpoint 
python main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/ckpt/file

Acknowledgement

Citation

If you find this project useful in your research, please consider citing:

@inproceedings{ran2024towards,
    title={Towards Realistic Scene Generation with LiDAR Diffusion Models},
    author={Ran, Haoxi and Guizilini, Vitor and Wang, Yue},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2024}
}