Closed LHY-HongyangLi closed 1 year ago
Hi, thanks for your interest. Both checkpoints are obtained by merging the pretrained weights of the LiDAR and camera parts separately.
For both checkpoints, the LiDAR part is a pretrained LiDAR-only detector TransFusion-L, provided by the authors of DeepInteraction (https://drive.google.com/file/d/1IaLMcRu4SYTqcD6K1HF5UjfnRICB_IQM/view?usp=sharing). The backbone, FPN, and head are all incorporated into our initial weights.
For "sparsefusion_voxel0075_R50_initial", the ResNet-50 backbone and FPN are from Mask R-CNN pretrained on nuImages (https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/mask_rcnn_r50_fpn_coco-2x_1x_nuim/mask_rcnn_r50_fpn_coco-2x_1x_nuim_20201008_195238-b1742a60.pth). For "sparsefusion_voxel0075_SwinT_initial", the Swin-T backbone and FPN were further finetuned by us on nuImages from the COCO pretrained weights (https://download.openmmlab.com/mmdetection/v2.0/swin/mask_rcnn_swin-t-p4-w7_fpn_1x_coco/mask_rcnn_swin-t-p4-w7_fpn_1x_coco_20210902_120937-9d6b7cfa.pth).
You may need to change some parameter names when merging the LiDAR and camera parts to get our initial weights.
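The renaming-and-merging step could look something like the sketch below. It merges two state_dicts (parameter-name → tensor mappings, represented here as plain dicts) into one, remapping the camera keys under new prefixes. The specific prefix mapping (`backbone.` → `img_backbone.`, `neck.` → `img_neck.`) is an assumption for illustration; check the parameter names actually expected by the SparseFusion config before reusing it.

```python
def merge_state_dicts(lidar_sd, camera_sd, camera_prefix_map=None):
    """Combine LiDAR and camera state_dicts into one initial checkpoint.

    LiDAR keys are kept as-is; camera keys are renamed according to
    camera_prefix_map (a hypothetical mapping -- verify against the config).
    """
    if camera_prefix_map is None:
        camera_prefix_map = {"backbone.": "img_backbone.", "neck.": "img_neck."}
    merged = dict(lidar_sd)  # LiDAR backbone/FPN/head parameters, unchanged
    for key, value in camera_sd.items():
        for old, new in camera_prefix_map.items():
            if key.startswith(old):
                # e.g. "backbone.conv1.weight" -> "img_backbone.conv1.weight"
                merged[new + key[len(old):]] = value
                break
    return merged
```

With real checkpoints you would load each file first, e.g. `sd = torch.load(path, map_location="cpu")["state_dict"]`, merge the two dicts, then save the result with `torch.save({"state_dict": merged}, out_path)`.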
Hi @yichen928, SparseFusion is really nice work, but I wonder how you obtained the initial weights sparsefusion_voxel0075_SwinT_initial and sparsefusion_voxel0075_R50_initial?