We provide a conda environment named `lidar_diffusion`:

```bash
sh init/create_env.sh
conda activate lidar_diffusion
```
To standardize the evaluation of LiDAR generative models, we provide a self-contained and mostly CUDA-accelerated evaluation toolbox in the directory `./lidm/eval/`. It implements and integrates the following evaluation metrics:

| Category | Metrics |
|---|---|
| Perceptual metrics (generation & reconstruction) | FRID, FSVD, FPVD |
| Statistical metrics (generation only) | JSD, MMD |
| Distance metrics (reconstruction only) | CD, EMD |

For more details about setup and usage, please refer to the Evaluation Toolbox README.
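Of these, CD and EMD are standard point-set distances. As a rough reference for what the reconstruction distance metrics measure, below is a minimal NumPy sketch of the symmetric Chamfer Distance; it only illustrates the metric's definition and is not the toolbox's CUDA implementation (normalization conventions may differ).

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point sets p (N, 3) and q (M, 3).

    Illustrative NumPy version only; the toolbox in ./lidm/eval/ uses its own
    (CUDA-accelerated) implementation, which may normalize differently.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(1024, 3)).astype(np.float32)
    b = a + 0.01 * rng.normal(size=(1024, 3)).astype(np.float32)
    print(chamfer_distance(a, b))  # small value for near-identical clouds
```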
To test the different tasks below, please first download the pretrained LiDM and its corresponding autoencoder:
| Encoder | rFRID (↓) | rFSVD (↓) | rFPVD (↓) | CD (↓) | EMD (↓) | Checkpoint | Rec. Results on val (Point Cloud) | Comment |
|---|---|---|---|---|---|---|---|---|
| f_c2_p4 | 2.15 | 20.2 | 16.2 | 0.160 | 0.203 | [Google Drive] (205MB) | [Video] | |
| f_c2_p4* | 2.06 | 20.3 | 15.7 | 0.092 | 0.176 | [Google Drive] (205MB) | [Video] | *: w/o logarithm scaling |
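For context, the two encoder variants differ only in whether range values are logarithmically scaled before encoding. The sketch below shows what such a scaling might look like; the depth limit and constants are illustrative assumptions, not the exact values used by the released encoders.

```python
import numpy as np

# Hypothetical maximum LiDAR range in meters; the value used by the released
# models is an assumption here, chosen only for illustration.
MAX_DEPTH = 80.0

def log_scale(depth):
    """Map raw range values to [0, 1] on a logarithmic scale (sketch)."""
    depth = np.clip(depth, 0.0, MAX_DEPTH)
    return np.log2(depth + 1.0) / np.log2(MAX_DEPTH + 1.0)

def linear_scale(depth):
    """Plain linear normalization, i.e. the 'w/o logarithm scaling' variant."""
    return np.clip(depth, 0.0, MAX_DEPTH) / MAX_DEPTH
```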
| Method | Encoder | FRID (↓) | FSVD (↓) | FPVD (↓) | JSD (↓) | MMD (10^-4, ↓) | Checkpoint | Output LiDAR Point Clouds |
|---|---|---|---|---|---|---|---|---|
| LiDAR-GAN | - | 1222 | 183.4 | 168.1 | 0.272 | 4.74 | - | [2k samples] |
| LiDAR-VAE | - | 199.1 | 129.9 | 105.8 | 0.237 | 7.07 | - | [2k samples] |
| ProjectedGAN | - | 149.7 | 44.7 | 33.4 | 0.188 | 2.88 | - | [2k samples] |
| UltraLiDAR§ | - | 370.0 | 72.1 | 66.6 | 0.747 | 17.12 | - | [2k samples] |
| LiDARGen (1160s)† | - | 129.0 | 39.2 | 33.4 | 0.188 | 2.88 | - | [2k samples] |
| LiDARGen (50s)† | - | 2051 | 480.6 | 400.7 | 0.506 | 9.91 | - | [2k samples] |
| LiDM (50s) | f_c2_p4 | 135.8 | 37.9 | 28.7 | 0.211 | 3.87 | [Google Drive] (3.9GB) | [2k samples] |
| LiDM (50s) | f_c2_p4* | 125.1 | 38.8 | 29.0 | 0.211 | 3.84 | [Google Drive] (3.9GB) | [2k samples] |
NOTE: The `.pcd` files linked in the Output column are serialized with the `joblib` package. To load those files, use `joblib.load(path)`.
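A short loading sketch, assuming only what the note states (the `.pcd` files are joblib archives rather than PCL point cloud files); the structure of the loaded object is not documented here, so the snippet simply inspects it.

```python
import joblib

# Hypothetical path; substitute the method directory you downloaded into.
samples = joblib.load("models/baseline/kitti/[method]/samples.pcd")

# The exact container layout is not specified here, so inspect it first.
print(type(samples))
```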
To evaluate the above methods (except LiDM) yourself, download the provided `.pcd` files from the Output column to the directory `./models/baseline/kitti/[method]/`, then run:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/baseline/kitti/[method]/samples.pcd --baseline --eval
```
To evaluate LiDM using the provided `.pcd` files:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/lidm/kitti/[method]/samples.pcd --eval
```
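The `--eval` runs above report, among others, the statistical metrics (JSD, MMD) from the table. For intuition, such metrics compare distributions of generated and real point clouds rather than individual pairs. The sketch below computes a Jensen-Shannon divergence between bird's-eye-view occupancy histograms; the grid extent, bin count, and aggregation are assumptions for illustration and may differ from the implementation in `./lidm/eval/`.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Assumed BEV grid: 100m x 100m around the sensor, 0.5m bins (illustrative values).
BINS, EXTENT = 200, 50.0

def bev_histogram(clouds):
    """Accumulate (x, y) points from a list of (N, 3) clouds into one normalized histogram."""
    hist = np.zeros((BINS, BINS), dtype=np.float64)
    for pts in clouds:
        h, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=BINS,
                                 range=[[-EXTENT, EXTENT], [-EXTENT, EXTENT]])
        hist += h
    return hist.ravel() / hist.sum()

def bev_jsd(generated, real):
    """JS divergence between aggregated BEV occupancy distributions.

    scipy's jensenshannon returns the JS *distance* (the square root), so we
    square it to obtain the divergence.
    """
    return jensenshannon(bev_histogram(generated), bev_histogram(real)) ** 2
```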
| Task | Encoder | Dataset | FRID (↓) | FSVD (↓) | Checkpoint | Output |
|---|---|---|---|---|---|---|
| Semantic Map to LiDAR | f_c2_p4* | SemanticKITTI | 11.8 | 19.1 | [Google Drive] (3.9GB) | [log.tar.gz] (2.1GB) |
| Camera to LiDAR | f_c2_p4* | KITTI-360 | 38.9 | 32.1 | [Google Drive] (7.5GB) | [log.tar.gz] (5.4GB) |
| Text to LiDAR | f_c2_p4* | zero-shot | - | - | From Camera-to-LiDAR | - |
NOTE: Each `log.tar.gz` contains input conditions (`.png`), generated range images (`.png`), generated point clouds (`.txt`), and a collection of all output point clouds (`.pcd`).
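A small sketch for browsing such an archive, assuming only the file types listed in the note; the directory layout inside the archive and the filenames below are illustrative, not documented here.

```python
import tarfile
import numpy as np

# Hypothetical path; adjust to wherever the archive was downloaded.
with tarfile.open("log.tar.gz", "r:gz") as tar:
    tar.extractall("log")  # conditions, range images, and point clouds

# Generated point clouds are stored as plain-text .txt files; the exact
# per-row format is an assumption here, so inspect the shape first.
pts = np.loadtxt("log/some_sample.txt")  # illustrative filename
print(pts.shape)
```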
For full details of our studies on the design of LiDAR Compression, please refer to the LiDAR Compression Design README.
Tip: Downloading the video, instead of watching it with Google Drive's built-in video player, provides better visualization quality.
| Curvewise Factor | Patchwise Factor | Output Size | rFRID (↓) | rFSVD (↓) | #Params (M) | Visualization of Reconstruction (val) |
|---|---|---|---|---|---|---|
| N/A | N/A | Ground Truth | - | - | - | [Range Image], [Point Cloud] |
| 4 | 1 | 64x256x2 | 0.2 | 12.9 | 9.52 | [Range Image], [Point Cloud] |
| 8 | 1 | 64x128x3 | 0.9 | 21.2 | 10.76 | [Range Image], [Point Cloud] |
| 16 | 1 | 64x64x4 | 2.8 | 31.1 | 12.43 | [Range Image], [Point Cloud] |
| 32 | 1 | 64x32x8 | 16.4 | 49.0 | 13.72 | [Range Image], [Point Cloud] |
| 1 | 2 | 32x512x2 | 1.5 | 25.0 | 2.87 | [Range Image], [Point Cloud] |
| 1 | 4 | 16x256x4 | 0.6 | 15.4 | 12.45 | [Range Image], [Point Cloud] |
| 1 | 8 | 8x128x16 | 17.7 | 35.7 | 15.78 | [Range Image], [Point Cloud] |
| 1 | 16 | 4x64x64 | 37.1 | 68.7 | 16.25 | [Range Image], [Point Cloud] |
| 2 | 2 | 32x256x3 | 0.4 | 11.2 | 13.09 | [Range Image], [Point Cloud] |
| 4 | 2 | 32x128x4 | 3.9 | 19.6 | 14.35 | [Range Image], [Point Cloud] |
| 8 | 2 | 32x64x8 | 8.0 | 25.3 | 16.06 | [Range Image], [Point Cloud] |
| 16 | 2 | 32x32x16 | 21.5 | 54.2 | 17.44 | [Range Image], [Point Cloud] |
| 2 | 4 | 16x128x8 | 2.5 | 16.9 | 15.07 | [Range Image], [Point Cloud] |
| 4 | 4 | 16x128x16 | 13.8 | 29.5 | 16.86 | [Range Image], [Point Cloud] |
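As a rough reading of the table, and assuming a 64x1024 input range image (an assumption; the input resolution is not stated in this section), the output spatial size of most rows matches dividing the height by the patchwise factor and the width by the product of the two factors. A small sketch of that rule of thumb:

```python
def latent_size(curve_factor, patch_factor, in_h=64, in_w=1024):
    """Rule of thumb read off the table above, not the code's actual logic:
    the patchwise factor downsamples both axes, the curvewise factor only the width."""
    return in_h // patch_factor, in_w // (curve_factor * patch_factor)

# e.g. the f_c2_p4 encoder: curvewise 2, patchwise 4 -> (16, 128), matching 16x128x8.
print(latent_size(2, 4))
```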
To run sampling on pretrained models (and to evaluate your results with the flag `--eval`), first download our provided pretrained autoencoders to the directory `./models/first_stage_models/kitti/[model_name]` and the pretrained LiDMs to the directory `./models/lidm/kitti/[model_name]`, then run:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -r models/lidm/kitti/[model_name]/model.ckpt -n 2000 --eval
```
To check the conditional results on a full sequence of semantic maps (sequence '08'), please refer to this video.

Before running this task, first set up the SemanticKITTI dataset to provide semantic labels as input.

To run sampling on pretrained models (and to evaluate your results with the flag `--eval`):

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/sem2lidar/model.ckpt -d kitti [--eval]
```
Before running this task, first set up the KITTI-360 dataset to provide camera images as input.

To run sampling on pretrained models:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti [--eval]
```
To run sampling on pretrained models:

```bash
CUDA_VISIBLE_DEVICES=0 python scripts/text2lidar.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti -p "an empty road with no object"
```
To train your own LiDAR Diffusion Models, run the following commands (this example trains both the autoencoder and the LiDM on four GPUs):

```bash
# train an autoencoder
python main.py -b configs/autoencoder/kitti/autoencoder_c2_p4.yaml -t --gpus 0,1,2,3

# train a LiDM
python main.py -b configs/lidar_diffusion/kitti/uncond_c2_p4.yaml -t --gpus 0,1,2,3
```
To debug the training process, add the flag `-d`:

```bash
python main.py -b path/to/your/config.yaml -t --gpus 0, -d
```
To resume training from an existing log directory or an existing checkpoint file, use the flag `-r`:

```bash
# using a log directory
python main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/log

# or, using a checkpoint
python main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/ckpt/file
```
If you find this project useful in your research, please consider citing:

```bibtex
@inproceedings{ran2024towards,
  title={Towards Realistic Scene Generation with LiDAR Diffusion Models},
  author={Ran, Haoxi and Guizilini, Vitor and Wang, Yue},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
```