mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0
2.26k stars 409 forks

About the init_cfg.checkpoint of the backbone #176

Closed sunnyHelen closed 1 year ago

sunnyHelen commented 1 year ago

Hi, thanks a lot for sharing this great work. I noticed that a good pre-trained model for the camera backbone helps get better results. How did you obtain the pre-trained model "swint-nuimages-pretrained.pth"? And how much data was it trained on? Thanks a lot~

kentang-mit commented 1 year ago

The Swin-T model is pretrained on the nuImages 2D object detection task, following the convention of almost all camera-LiDAR fusion papers that use nuScenes. You can refer to the official mmdet3d repo for details of model training on nuImages.

sunnyHelen commented 1 year ago

Ok, got it. Many thanks.

sunnyHelen commented 1 year ago

Hi, I checked the official mmdet3d repo. It seems there are no configs for training Swin-T. Could you release your pretraining configs? Otherwise, how can I find the proper way to train a Swin-T model on the nuImages 2D object detection task?

kentang-mit commented 1 year ago

Sorry, I don't have the bandwidth to check whether my previous experiment config works with the latest codebase, so the release will be delayed. However, I think it should be quite easy to get started from the mmdetection3d codebase. I remember that they provide some examples for ResNet101-based detectors. You can just change the image backbone to Swin-T.
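A minimal sketch of that backbone swap, written as a plain mmdetection-style Python config dict. This is not the config used for the released checkpoint: `base_model` is a toy stand-in for a ResNet101-based detector config, `swap_backbone_to_swin_t` is a hypothetical helper, and the checkpoint path is a placeholder. The Swin-T hyperparameters (`embed_dims=96`, `depths`, `num_heads`, `window_size=7`) follow the standard Swin-T definition, which is an assumption about what was used here.

```python
import copy

# Toy stand-in for an mmdetection-style detector config with a ResNet-101
# backbone. The real nuImages configs are not reproduced here.
base_model = dict(
    type="CascadeRCNN",
    backbone=dict(type="ResNet", depth=101, out_indices=(0, 1, 2, 3)),
    neck=dict(type="FPN", in_channels=[256, 512, 1024, 2048],
              out_channels=256, num_outs=5),
)


def swap_backbone_to_swin_t(model, checkpoint):
    """Return a copy of `model` with a Swin-T backbone and a matching FPN neck.

    Hypothetical helper for illustration; `checkpoint` is whatever pretrained
    Swin-T weights you want to initialise from.
    """
    embed_dims = 96  # Swin-T base channel width
    new_model = copy.deepcopy(model)
    new_model["backbone"] = dict(
        type="SwinTransformer",
        embed_dims=embed_dims,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        out_indices=(0, 1, 2, 3),
        init_cfg=dict(type="Pretrained", checkpoint=checkpoint),
    )
    # Swin doubles the channel width at every stage, so the neck must expect
    # 96, 192, 384, 768 channels instead of the ResNet widths.
    new_model["neck"]["in_channels"] = [embed_dims * 2**i for i in range(4)]
    return new_model


# Placeholder path, not a real file.
swin_model = swap_backbone_to_swin_t(base_model, "path/to/swin_t_pretrained.pth")
print(swin_model["neck"]["in_channels"])
```

The part that is easy to miss is the neck: a ResNet-based config usually hard-codes the FPN input channels, so changing only the backbone `type` would fail with a shape mismatch.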

sunnyHelen commented 1 year ago

Yes, they provide different settings. I'm confused about whether you pretrained on COCO and which model you used for the 2D detection task. Could you suggest how I can find the same setting as yours?

kentang-mit commented 1 year ago

I remember that I did pretraining on COCO, and the pretrained checkpoint can be found under the official Swin-T repo. For pretraining on nuImages I used CascadeRCNN. However, I don't think the impact of pretraining on nuImages is that large for fusion models (it is large for camera-only models, though), so if you want to choose another, faster detector for 2D pretraining, that should also be OK.
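If you pretrain a full detector yourself, the saved weights cover the whole detector, while `init_cfg.checkpoint` here is set on the camera backbone. A rough sketch of keeping only the backbone weights follows. It assumes the detector's parameters are keyed with a `backbone.` prefix, and `extract_backbone_weights` is a hypothetical helper; the toy dict below stands in for a real state dict, which you would normally load with `torch.load`. Whether "swint-nuimages-pretrained.pth" was produced this way is not stated in this thread.

```python
def extract_backbone_weights(state_dict, prefix="backbone."):
    """Keep only entries under `prefix` and strip the prefix from their keys."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy stand-in for a detector state dict; values would be tensors in practice.
detector_state = {
    "backbone.patch_embed.projection.weight": "w0",
    "backbone.stages.0.blocks.0.norm1.weight": "w1",
    "neck.lateral_convs.0.conv.weight": "w2",
    "roi_head.bbox_head.0.fc_cls.weight": "w3",
}

backbone_state = extract_backbone_weights(detector_state)
print(sorted(backbone_state))
```

Only the two backbone entries survive, with the `backbone.` prefix removed, so the keys line up with a standalone backbone module.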

sunnyHelen commented 1 year ago

That's helpful. Thanks a lot.