hailanyi / VirConv

Virtual Sparse Convolution for Multimodal 3D Object Detection
https://arxiv.org/abs/2303.02314
Apache License 2.0


This is the official code release of VirConv (Virtual Sparse Convolution for Multimodal 3D Object Detection). The code is mainly based on OpenPCDet, with some code adapted from TED, CasA, PENet and SFD.

Detection Framework

The detection frameworks are shown below.

[Framework figure: VirConv-L, VirConv-T and VirConv-S]

Model Zoo

We release three models: VirConv-L, VirConv-T and VirConv-S.

Important notes:

Each model was trained multiple times on 8x V100 GPUs, and the best checkpoint is released.

Models trained with Spconv 1.2:

Environment   Detector    GPU mem. (train)   Easy    Mod.    Hard    Download
Spconv 1.2    VirConv-L   ~7 GB              93.08   88.51   86.69   google / baidu(05u2) / 51M
Spconv 1.2    VirConv-T   ~13 GB             94.58   89.87   87.78   google / baidu(or81) / 55M
Spconv 1.2    VirConv-S   ~13 GB             95.67   91.09   89.09   google / baidu(ak74) / 62M

Models trained with Spconv 2.1:

Environment   Detector    GPU mem. (train)   Easy    Mod.    Hard    Download
Spconv 2.1    VirConv-L   ~7 GB              93.18   88.23   85.48   google / baidu(k2dp) / 51M
Spconv 2.1    VirConv-T   ~13 GB             94.91   90.36   88.10   google / baidu(a4r4) / 56M
Spconv 2.1    VirConv-S   ~13 GB             95.76   90.91   88.61   google / baidu(j3mi) / 56M
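
To sanity-check a downloaded checkpoint before running it, you can inspect it directly. This is a minimal sketch; it assumes the files follow the usual OpenPCDet checkpoint layout with a model_state dict, which is not documented above:

import torch

# Load a downloaded checkpoint on the CPU just to inspect its contents.
# 'VirConv-L.pth' is a placeholder for any checkpoint from the tables above.
ckpt = torch.load('VirConv-L.pth', map_location='cpu')
print(ckpt.keys())

# OpenPCDet-derived repos usually store weights under 'model_state'
# (an assumption about these files, not a documented guarantee).
if 'model_state' in ckpt:
    print(len(ckpt['model_state']), 'parameter tensors')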

Getting Started

conda create -n spconv2 python=3.9
conda activate spconv2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator
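
After installation, a quick import check (a minimal sketch, nothing repo-specific) verifies that PyTorch sees the GPU and that the spconv wheel matches your CUDA toolkit:

import torch
import spconv  # fails here if the spconv wheel does not match the installed CUDA toolkit

print('torch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())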

Dependency

Our released implementation is tested with Python 3.9, PyTorch 1.8.1 (CUDA 11.1) and Spconv 1.2. We also tested it with Spconv 2.1; see the Model Zoo above for the results under each environment.

Prepare dataset

You must create the additional semi dataset and the velodyne_depth data to run our multimodal and semi-supervised detectors.

Please download the official KITTI 3D object detection dataset and the KITTI odometry dataset, and organize the downloaded files as follows (the road planes, which are optional for data augmentation during training, can be downloaded from [road plane]):

VirConv
├── data
│   ├── odometry
│   │   │── 00
│   │   │── 01
│   │   │   │── image_2
│   │   │   │── velodyne
│   │   │   │── calib.txt
│   │   │── ...
│   │   │── 21
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2
├── pcdet
├── tools

(1) Create the semi dataset from the odometry dataset:

cd tools
python3 creat_semi_dataset.py ../data/odometry ../data/kitti/semi

(2) Download the pseudo labels generated by VirConv-T from here (detections from the last 10 checkpoints are fused by WBF, and low-quality detections are filtered out with a 0.9 score threshold) and put them into kitti/semi.
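
For reference, the score-threshold part of that filtering looks roughly like this (a minimal sketch, assuming KITTI-format label files whose 16th column is the detection confidence; filter_pseudo_labels and the paths are illustrative, not part of the release):

import glob
import os

SCORE_THRESHOLD = 0.9  # detections below this confidence are discarded

def filter_pseudo_labels(label_dir, out_dir):
    # Detector outputs in KITTI label format carry a trailing confidence
    # score (16 fields); keep only lines at or above the threshold.
    os.makedirs(out_dir, exist_ok=True)
    for path in glob.glob(os.path.join(label_dir, '*.txt')):
        with open(path) as f:
            kept = [line for line in f
                    if len(line.split()) >= 16
                    and float(line.split()[15]) >= SCORE_THRESHOLD]
        with open(os.path.join(out_dir, os.path.basename(path)), 'w') as f:
            f.writelines(kept)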

(3) Download the PENet depth completion model from google (500M) or baidu (gp68), and put it into tools/PENet.

(4) Then run the following commands to generate RGB virtual points:

cd tools/PENet
python3 main.py --detpath ../../data/kitti/training
python3 main.py --detpath ../../data/kitti/testing
python3 main.py --detpath ../../data/kitti/semi

(5) After that, run the following commands to create the dataset infos:

python3 -m pcdet.datasets.kitti.kitti_dataset_mm create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
python3 -m pcdet.datasets.kitti.kitti_datasetsemi create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml

After these steps, the data structure should be:

VirConv
├── data
│   ├── kitti
│   │   │── ImageSets
│   │   │── training
│   │   │   ├──calib & velodyne & label_2 & image_2 & (optional: planes) & velodyne_depth
│   │   │── testing
│   │   │   ├──calib & velodyne & image_2 & velodyne_depth
│   │   │── semi (optional)
│   │   │   ├──calib & velodyne & label_2(pseudo label) & image_2 & velodyne_depth
│   │   │── gt_database_mm
│   │   │── gt_databasesemi
│   │   │── kitti_dbinfos_trainsemi.pkl
│   │   │── kitti_dbinfos_train_mm.pkl
│   │   │── kitti_infos_test.pkl
│   │   │── kitti_infos_train.pkl
│   │   │── kitti_infos_trainsemi.pkl
│   │   │── kitti_infos_trainval.pkl
│   │   │── kitti_infos_val.pkl
├── pcdet
├── tools
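
Before training, a small check like the following (an illustrative sketch that only mirrors the tree above; adjust the list to the splits you prepared) can confirm the generated data is in place:

import os

KITTI_ROOT = 'data/kitti'  # run from the VirConv repository root

# Folders and info files produced by the preparation steps above.
expected = [
    'training/velodyne_depth',
    'testing/velodyne_depth',
    'gt_database_mm',
    'kitti_infos_train.pkl',
    'kitti_infos_val.pkl',
]
for rel in expected:
    path = os.path.join(KITTI_ROOT, rel)
    print(path, '->', 'OK' if os.path.exists(path) else 'MISSING')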

Setup

cd VirConv
python setup.py develop

Training

To train VirConv-L or VirConv-T:

Single GPU train:

cd tools
python3 train.py --cfg_file ${CONFIG_FILE}

For example, to train the VirConv-L model:

cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-L.yaml

Multiple GPU train:

You can modify the GPU number in dist_train.sh and run:

cd tools
sh dist_train.sh

The training logs are saved to log.txt; you can run cat log.txt to view the training progress.

For training VirConv-S:

You should first train a VirConv-T:

cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-T.yaml

Then train the VirConv-S:

cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --pretrained_model ../output/models/kitti/VirConv-T/default/ckpt/checkpoint_epoch_40.pth

Evaluation

cd tools
python3 test.py --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}

For example, to test the VirConv-S model:

cd tools
python3 test.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --ckpt VirConv-S.pth

Multiple GPU test: modify the GPU number in dist_test.sh and run:

sh dist_test.sh 

The test logs are saved to log-test.txt; you can run cat log-test.txt to view the test results.

License

This code is released under the Apache 2.0 license.

Acknowledgement

TED

CasA

OpenPCDet

PENet

SFD

Citation

@inproceedings{VirConv,
    title={Virtual Sparse Convolution for Multimodal 3D Object Detection},
    author={Wu, Hai and Wen, Chenglu and Shi, Shaoshuai and Wang, Cheng},
    booktitle={CVPR},
    year={2023}
}