The official code release of M3SOT.
3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked objects, adding complexity to the task. In this research, we unveil M3SOT, a novel 3D SOT framework that synergizes multiple input frames (template sets), multiple receptive fields (continuous contexts), and multiple solution spaces (distinct tasks) in ONE model. Remarkably, M3SOT pioneers modeling temporality, contexts, and tasks directly from point clouds, revisiting a perspective on the key factors influencing SOT. To this end, we design a transformer-based network centered on point cloud targets in the search area, aggregating diverse contextual representations and propagating target cues by employing historical frames. As M3SOT spans varied processing perspectives, we streamline the network by trimming its depth and optimizing its structure, ensuring a lightweight and efficient deployment for SOT applications. We posit that, backed by practical construction, M3SOT sidesteps the need for complex frameworks and auxiliary components to deliver sterling results. Extensive experiments on benchmarks such as KITTI, nuScenes, and Waymo Open Dataset demonstrate that M3SOT achieves state-of-the-art performance at 38 FPS.
Following CXTrack, we list the most important dependencies below.
| Dependency | Version |
|---|---|
| open3d | 0.15.2 |
| python | 3.8.0 |
| pytorch | 1.8.0 |
| pytorch-lightning | 1.5.10 |
| pytorch3d | 0.6.2 |
| shapely | 1.8.1 |
| torchvision | 0.9.0 |
Others can be seen in Open3DSOT.
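As a quick sanity check before running anything, here is a minimal sketch that prints the installed versions against the table above. The Python module names (`open3d`, `torch`, `pytorch_lightning`, `pytorch3d`, `shapely`, `torchvision`) are the conventional ones and are an assumption here, not something this repository prescribes.

```python
# Minimal environment check: compare installed versions with the table above.
# Module names are the conventional import names and may differ from pip package names.
import importlib

expected = {
    "open3d": "0.15.2",
    "torch": "1.8.0",            # listed as "pytorch" in the table
    "pytorch_lightning": "1.5.10",
    "pytorch3d": "0.6.2",
    "shapely": "1.8.1",
    "torchvision": "0.9.0",
}

for module, want in expected.items():
    got = importlib.import_module(module).__version__
    flag = "" if got.startswith(want) else "  <-- differs"
    print(f"{module:20s} installed={got:12s} expected={want}{flag}")
```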
Download the velodyne, calib, and label_02 data from KITTI Tracking.
Unzip the downloaded files.
Put the unzipped files under the same parent folder, organized as follows.
```
[Parent Folder]
--> [calib]
    --> {0000-0020}.txt
--> [label_02]
    --> {0000-0020}.txt
--> [velodyne]
    --> [0000-0020] folders with velodyne .bin files
```
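If you want to confirm the layout before training, here is a minimal sketch. The root path and the sequence/frame ids are placeholders, and the only format fact it relies on is that KITTI velodyne scans are flat float32 `(x, y, z, intensity)` records.

```python
# Sketch: check the KITTI tracking layout above and peek at one velodyne scan.
import os
import numpy as np

kitti_root = "/path/to/kitti_tracking"  # the [Parent Folder] above

for sub in ("calib", "label_02", "velodyne"):
    assert os.path.isdir(os.path.join(kitti_root, sub)), f"missing {sub}/"

# KITTI velodyne scans store float32 (x, y, z, intensity) records.
scan_path = os.path.join(kitti_root, "velodyne", "0000", "000000.bin")
points = np.fromfile(scan_path, dtype=np.float32).reshape(-1, 4)
print("loaded", points.shape[0], "points from", scan_path)
```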
We follow the benchmark created by LiDAR-SOT based on the Waymo Open Dataset. You can download and process the Waymo data as guided by LiDAR_SOT, and use our code to test model performance on this benchmark.
The following processing results are necessary.
```
[waymo_sot]
    [benchmark]
        [validation]
            [vehicle]
                bench_list.json
                easy.json
                medium.json
                hard.json
            [pedestrian]
                bench_list.json
                easy.json
                medium.json
                hard.json
    [pc]
        [raw_pc]
            Here are some segment.npz files containing raw point cloud data
    [gt_info]
        Here are some segment.npz files containing tracklet and bbox data
```
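To verify the benchmark files are in place, here is a minimal sketch that follows the layout above; it does not assume anything about the contents of the split JSONs or the `.npz` archives beyond listing their keys.

```python
# Sketch: inspect the LiDAR-SOT benchmark layout described above.
import json
import os
import numpy as np

waymo_root = "/path/to/waymo_sot"

# Difficulty splits for a category are plain JSON files.
vehicle_dir = os.path.join(waymo_root, "benchmark", "validation", "vehicle")
with open(os.path.join(vehicle_dir, "easy.json")) as f:
    easy_split = json.load(f)
print("easy split entries:", len(easy_split))

# Raw point clouds and ground-truth info are stored per segment as .npz archives.
raw_pc_dir = os.path.join(waymo_root, "pc", "raw_pc")
first_segment = sorted(os.listdir(raw_pc_dir))[0]
with np.load(os.path.join(raw_pc_dir, first_segment), allow_pickle=True) as data:
    print(first_segment, "->", list(data.keys()))
```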
We follow V2B to prepare the nuScenes dataset (v1.0), and we also quote the following advice from STNet.
Since both KITTI and Waymo are built from 64-beam LiDAR while nuScenes uses a 32-beam LiDAR, we recommend that you train your model on KITTI and verify its generalization ability on Waymo, and either train on nuScenes or simply skip this dataset. We do not recommend verifying the generalization ability of your model on nuScenes.
Tip: if you use the wrong version of the nuScenes dependencies, you will most likely not reproduce our results.
To train a model, you must specify a `.yaml` config file, which contains all the configurations of the dataset and the model. We provide `.yaml` files under the `configs` directory. For example:
```
python main.py configs/3dtrack_kitti_car_cfg_multi_input2_perception_space.yaml --gpus 0 1
```
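If you want to see what a run will use before launching it, here is a minimal sketch for inspecting a config; it only assumes PyYAML is available and makes no claims about the key names inside the file.

```python
# Sketch: print the top-level sections of a training config.
import yaml

cfg_path = "configs/3dtrack_kitti_car_cfg_multi_input2_perception_space.yaml"
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

for key, value in cfg.items():
    print(f"{key}: {type(value).__name__}")
```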
To test a trained model, specify the checkpoint with the `--resume_from` argument and set the `--phase` argument to `test`. For example:
```
python main.py configs/3dtrack_kitti_car_cfg_multi_input2_perception_space.yaml --phase test --resume_from pretrained/m3sot_kitti_car_test_multi_input2_perception_space/checkpoints/best_epoch_precesion=87.4_success=75.9.ckpt
```
Please download the trained models from Google Drive into the project directory.
| Model | Category | Success | Precision | Checkpoint |
|---|---|---|---|---|
| M3SOT-KITTI | Car | 75.9 | 87.4 | path |
| M3SOT-KITTI | Pedestrian | 66.6 | 92.5 | path |
| M3SOT-KITTI | Van | 59.4 | 74.7 | path |
| M3SOT-KITTI | Cyclist | 70.3 | 93.4 | path |
To reproduce the results, simply run the code with the corresponding `.yaml` file and checkpoint.
The reported results of the M3SOT checkpoints were produced on RTX 3090 GPUs. Due to precision issues, there may be minor differences if you test them with other GPUs.
You can get M3SOT's tracking visualization results based on the `.json` files under the `results` directory.
```
python visualize.py --result_dir results/kitti_car_m3sot_test/m3sot_result.json
```
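If you prefer to post-process the results yourself, here is a minimal sketch for loading the result file; the schema of the JSON is defined by the repository's result writer and is not assumed here.

```python
# Sketch: load a tracking result file and report its top-level structure.
import json

result_path = "results/kitti_car_m3sot_test/m3sot_result.json"
with open(result_path) as f:
    results = json.load(f)

if isinstance(results, dict):
    print("top-level keys:", list(results.keys()))
else:
    print("number of entries:", len(results))
```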
If you find our work valuable, please consider giving it a star ⭐ and a citation.
```
@inproceedings{liu2024m3sot,
  title={M3SOT: Multi-Frame, Multi-Field, Multi-Space 3D Single Object Tracking},
  author={Liu, Jiaming and Wu, Yue and Gong, Maoguo and Miao, Qiguang and Ma, Wenping and Xu, Cai and Qin, Can},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={4},
  pages={3630--3638},
  year={2024}
}
```