This is the official code release of VirConv (Virtual Sparse Convolution for 3D Object Detection). The code is mainly based on OpenPCDet; some code comes from TED, CasA, PENet, and SFD.
The detection frameworks are shown below.
We release three models: VirConv-L, VirConv-T and VirConv-S.
VirConv-L and VirConv-T are trained on the train split (3712 samples) of the KITTI dataset.
VirConv-S is trained on the train split (3712 samples) plus the unlabeled odometry split (semi split, 10888 samples) of the KITTI dataset.
The results are the 3D AP (R40) of the Car class on the KITTI val set.
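For reference, R40 means the average precision is computed from interpolated precision at 40 equally spaced recall positions (1/40, 2/40, ..., 40/40). A minimal sketch of that metric, assuming a precomputed precision/recall curve (this is not the repository's evaluation code):

```python
import numpy as np

def ap_r40(precisions, recalls):
    """Approximate KITTI AP(R40): mean interpolated precision at
    40 equally spaced recall thresholds (1/40, 2/40, ..., 1.0)."""
    ap = 0.0
    for r in np.linspace(1.0 / 40, 1.0, 40):
        mask = recalls >= r
        # interpolated precision: best precision at recall >= r
        ap += precisions[mask].max() / 40 if mask.any() else 0.0
    return ap
```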
Important notes:
Trained multiple times on 8×V100 GPUs; the best run is reported:
| Environment | Detector | GPU (train) | Easy | Mod. | Hard | Download |
|---|---|---|---|---|---|---|
| Spconv 1.2 | VirConv-L | ~7 GB | 93.08 | 88.51 | 86.69 | google / baidu (05u2) / 51M |
| Spconv 1.2 | VirConv-T | ~13 GB | 94.58 | 89.87 | 87.78 | google / baidu (or81) / 55M |
| Spconv 1.2 | VirConv-S | ~13 GB | 95.67 | 91.09 | 89.09 | google / baidu (ak74) / 62M |
Trained multiple times on 8×V100 GPUs; the best run is reported:
| Environment | Detector | GPU (train) | Easy | Mod. | Hard | Download |
|---|---|---|---|---|---|---|
| Spconv 2.1 | VirConv-L | ~7 GB | 93.18 | 88.23 | 85.48 | google / baidu (k2dp) / 51M |
| Spconv 2.1 | VirConv-T | ~13 GB | 94.91 | 90.36 | 88.10 | google / baidu (a4r4) / 56M |
| Spconv 2.1 | VirConv-S | ~13 GB | 95.76 | 90.91 | 88.61 | google / baidu (j3mi) / 56M |
conda create -n spconv2 python=3.9
conda activate spconv2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
pip install numpy==1.19.5 protobuf==3.19.4 scikit-image==0.19.2 waymo-open-dataset-tf-2-5-0 nuscenes-devkit==1.0.5 spconv-cu111 numba scipy pyyaml easydict fire tqdm shapely matplotlib opencv-python addict pyquaternion awscli open3d pandas future pybind11 tensorboardX tensorboard Cython prefetch-generator
Our released implementation is tested with the environment created above (Python 3.9, PyTorch 1.8.1 + CUDA 11.1, spconv-cu111).
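A quick sanity check of the environment (a minimal sketch; the version attributes are assumptions):

```python
import torch

print(torch.__version__)          # expect 1.8.1+cu111
print(torch.cuda.is_available())  # expect True on a CUDA 11.1 machine

import spconv                     # from spconv-cu111
print(getattr(spconv, "__version__", "unknown"))
```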
You must create an additional semi dataset and a velodyne_depth dataset to run our multimodal and semi-supervised detectors. You can download all the preprocessed data from baidu (japc) [74GB], or partial data (not including semi, due to disk space limits) from google (13GB). Or you can generate the dataset yourself as follows:
Please download the official KITTI 3D object detection dataset and the KITTI odometry dataset, and organize the downloaded files as follows (the road planes, optional for data augmentation during training, can be downloaded from [road plane]):
VirConv
├── data
│ ├── odometry
│ │ │── 00
│ │ │── 01
│ │ │ │── image_2
│ │ │ │── velodyne
│ │ │ │── calib.txt
│ │ │── ...
│ │ │── 21
│ ├── kitti
│ │ │── ImageSets
│ │ │── training
│ │ │ ├──calib & velodyne & label_2 & image_2 & (optional: planes)
│ │ │── testing
│ │ │ ├──calib & velodyne & image_2
├── pcdet
├── tools
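Before generating any data, you can verify that the layout matches the tree above (a minimal sketch, run from the VirConv root; the checked paths are a subset of the tree):

```python
import os

expected = [
    "data/odometry/00/image_2",
    "data/odometry/00/velodyne",
    "data/odometry/00/calib.txt",
    "data/kitti/ImageSets",
    "data/kitti/training/velodyne",
    "data/kitti/testing/velodyne",
]
for p in expected:
    print(("OK   " if os.path.exists(p) else "MISS ") + p)
```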
(1) Create the semi dataset from the odometry dataset:
cd tools
python3 creat_semi_dataset.py ../data/odometry ../data/kitti/semi
(2) Download the pseudo labels generated by VirConv-T from here (detections from the last 10 checkpoints are fused by WBF, and low-quality detections are filtered out with a 0.9 score threshold) and put them into kitti/semi.
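For intuition, the pseudo labels come from fusing the box predictions of the last 10 checkpoints with Weighted Box Fusion (WBF) and keeping only boxes scoring at least 0.9. A minimal sketch of the score-filtering step (the box layout is an assumption, not the repository's label format):

```python
import numpy as np

def filter_pseudo_labels(boxes, scores, thresh=0.9):
    """Keep only high-confidence WBF-fused boxes as pseudo labels.

    boxes:  (N, 7) array of [x, y, z, dx, dy, dz, heading]  (assumed layout)
    scores: (N,) fused confidence scores
    """
    keep = scores >= thresh
    return boxes[keep], scores[keep]
```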
(3) Download the PENet depth completion model from google (500M) or baidu (gp68), and put it into tools/PENet.
(4) Then run the following commands to generate the RGB virtual points (a conceptual sketch follows the commands):
cd tools/PENet
python3 main.py --detpath ../../data/kitti/training
python3 main.py --detpath ../../data/kitti/testing
python3 main.py --detpath ../../data/kitti/semi
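Conceptually, each command runs PENet depth completion on the RGB images and back-projects the completed depth maps into dense, colored (virtual) points stored in velodyne_depth. A simplified pinhole back-projection sketch (the intrinsics fx, fy, cx, cy are placeholders; the actual pipeline uses the KITTI calibration files):

```python
import numpy as np

def depth_to_virtual_points(depth, fx, fy, cx, cy):
    """Back-project a completed depth map (H, W) into camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[depth.reshape(-1) > 0]  # one virtual point per valid pixel
```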
(5) After that, run the following commands to create the dataset infos (a verification sketch follows):
python3 -m pcdet.datasets.kitti.kitti_dataset_mm create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
python3 -m pcdet.datasets.kitti.kitti_datasetsemi create_kitti_infos tools/cfgs/dataset_configs/kitti_dataset.yaml
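You can confirm the step succeeded by checking for the pickle files listed in the data-structure tree below (a minimal sketch, run from the VirConv root):

```python
import os

infos = [
    "data/kitti/kitti_infos_train.pkl",
    "data/kitti/kitti_infos_val.pkl",
    "data/kitti/kitti_infos_trainval.pkl",
    "data/kitti/kitti_infos_test.pkl",
    "data/kitti/kitti_infos_trainsemi.pkl",
    "data/kitti/kitti_dbinfos_train_mm.pkl",
    "data/kitti/kitti_dbinfos_trainsemi.pkl",
]
for p in infos:
    print(("OK   " if os.path.exists(p) else "MISS ") + p)
```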
After these steps, the data structure should be:
VirConv
├── data
│ ├── kitti
│ │ │── ImageSets
│ │ │── training
│ │ │ ├──calib & velodyne & label_2 & image_2 & (optional: planes) & velodyne_depth
│ │ │── testing
│ │ │ ├──calib & velodyne & image_2 & velodyne_depth
│ │ │── semi (optional)
│ │ │ ├──calib & velodyne & label_2(pseudo label) & image_2 & velodyne_depth
│ │ │── gt_database_mm
│ │ │── gt_databasesemi
│ │ │── kitti_dbinfos_trainsemi.pkl
│ │ │── kitti_dbinfos_train_mm.pkl
│ │ │── kitti_infos_test.pkl
│ │ │── kitti_infos_train.pkl
│ │ │── kitti_infos_trainsemi.pkl
│ │ │── kitti_infos_trainval.pkl
│ │ │── kitti_infos_val.pkl
├── pcdet
├── tools
cd VirConv
python setup.py develop
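A quick check that the package is importable after installation (a minimal sketch):

```python
import pcdet
print(getattr(pcdet, "__version__", "installed"))
```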
For training VirConv-L and VirConv-T:
Single-GPU training:
cd tools
python3 train.py --cfg_file ${CONFIG_FILE}
For example, to train the VirConv-L model:
cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-L.yaml
Multi-GPU training:
You can modify the GPU number in dist_train.sh and run:
cd tools
sh dist_train.sh
The training logs are saved to log.txt. You can run cat log.txt to view the training progress.
For training VirConv-S:
You should first train a VirConv-T:
cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-T.yaml
Then train the VirConv-S:
cd tools
python3 train.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --pretrained_model ../output/models/kitti/VirConv-T/default/ckpt/checkpoint_epoch_40.pth
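If you want to confirm the pretrained checkpoint before fine-tuning, you can inspect it (a minimal sketch; the 'epoch' and 'model_state' keys follow the usual OpenPCDet checkpoint layout, assumed here):

```python
import torch

ckpt = torch.load(
    "../output/models/kitti/VirConv-T/default/ckpt/checkpoint_epoch_40.pth",
    map_location="cpu",
)
print(ckpt.get("epoch"))                 # expected: 40
print(len(ckpt.get("model_state", {})))  # number of saved parameter tensors
```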
For testing with a single GPU:
cd tools
python3 test.py --cfg_file ${CONFIG_FILE} --batch_size ${BATCH_SIZE} --ckpt ${CKPT}
For example, to test the VirConv-S model:
cd tools
python3 test.py --cfg_file cfgs/models/kitti/VirConv-S.yaml --ckpt VirConv-S.pth
Multi-GPU test: modify the GPU number in dist_test.sh and run:
sh dist_test.sh
The test logs are saved to log-test.txt. You can run cat log-test.txt to view the test results.
This code is released under the Apache 2.0 license.
@inproceedings{VirConv,
title={Virtual Sparse Convolution for Multimodal 3D Object Detection},
author={Wu, Hai and Wen, Chenglu and Shi, Shaoshuai and Wang, Cheng},
booktitle={CVPR},
year={2023}
}