yichen928 / SparseFusion

[ICCV 2023] SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
Apache License 2.0

Some questions about pretraining the Swin Transformer. #26

liyih closed this issue 9 months ago

liyih commented 10 months ago

Hi yichen, thanks for your awesome work. I want to know how you pretrained the Swin-T-based camera branch. Could you give me some detailed information, such as which dataset it uses (ImageNet or nuImages)? Also, I want to know the input scale of the images during pretraining. If you could share your pretraining code, that would be even better. Thanks! Best

yichen928 commented 9 months ago

Thanks for your interest.

The Swin-T backbone and FPN are further finetuned by us on nuImages, starting from the COCO-pretrained weights (https://download.openmmlab.com/mmdetection/v2.0/swin/mask_rcnn_swin-t-p4-w7_fpn_1x_coco/mask_rcnn_swin-t-p4-w7_fpn_1x_coco_20210902_120937-9d6b7cfa.pth). All the finetuning settings (input size, lr, epochs) are the same as in https://github.com/open-mmlab/mmdetection3d/blob/main/configs/nuimages/mask-rcnn_r50_fpn_1x_nuim.py; you just need to change the backbone from R50 to Swin-T.
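
For reference, here is a minimal, untested sketch of what that backbone swap might look like as an mmdetection-style config. The Swin-T hyperparameters and neck channels mirror the public mask_rcnn_swin-t-p4-w7_fpn_1x_coco config; the `_base_` path and the overall file are assumptions for illustration, not the authors' actual config.

```python
# Hypothetical config sketch: finetune a Swin-T Mask R-CNN on nuImages,
# starting from the COCO-pretrained checkpoint linked above.
# Not the authors' actual file; keys follow standard mmdetection conventions.
_base_ = './mask-rcnn_r50_fpn_1x_nuim.py'  # assumed relative path to the base config

# Initialize the whole detector from the COCO-pretrained Swin-T Mask R-CNN.
load_from = (
    'https://download.openmmlab.com/mmdetection/v2.0/swin/'
    'mask_rcnn_swin-t-p4-w7_fpn_1x_coco/'
    'mask_rcnn_swin-t-p4-w7_fpn_1x_coco_20210902_120937-9d6b7cfa.pth')

model = dict(
    backbone=dict(
        _delete_=True,  # drop the ResNet-50 settings inherited from the base config
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        drop_path_rate=0.2,
        patch_norm=True,
        out_indices=(0, 1, 2, 3)),
    # Swin-T stages output 96/192/384/768 channels instead of ResNet-50's 256-2048.
    neck=dict(in_channels=[96, 192, 384, 768]))
```

Everything else (input size, lr schedule, epochs) is inherited unchanged from the base nuImages config, matching the reply above.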

liyih commented 9 months ago

Thanks for your reply!

liyih commented 9 months ago

In the Swin-T-based SparseFusion, the input image size is 448×800, but in mask-rcnn_r50_fpn_1x_nuim.py the size is different. Do I need to use 448×800 for pretraining?

yichen928 commented 9 months ago

That does not matter. You do not need to resize the images for pretraining.