As far as I understand, the final fusion model can be end-to-end trained by:
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
Looking forward to your reply, thanks!
Hi @yinjunbo,
As I have mentioned in other issues, we take a two-stage training pipeline. The LiDAR-only model is trained first and then we load the weights and finetune the camera+LiDAR BEVFusion model. I believe the current code release has provided enough details for researchers in this field to easily reproduce our camera+LiDAR BEVFusion results.
Best, Haotian
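Concretely, the two-stage pipeline described above could look roughly like the sketch below. The LiDAR-only config path, the --run-dir and --load_from arguments, and the checkpoint file names are assumptions for illustration and may need to be adapted to the released configs.
# Stage 1 (assumed config path): train the LiDAR-only detector
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/lidar/voxelnet_0p075.yaml --run-dir runs/lidar-only
# Stage 2: finetune the camera+LiDAR BEVFusion model, initializing the camera backbone from the
# nuImages-pretrained weights and loading the stage-1 LiDAR-only weights (checkpoint name assumed)
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from runs/lidar-only/latest.pth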
What I understand is: for the camera+LiDAR fusion object detection model, the LiDAR-only detection model is trained first, and its weights are then finetuned in the camera+LiDAR BEVFusion model; for the camera+LiDAR fusion semantic segmentation model, the camera-only segmentation model is trained first, and its weights are then finetuned in the camera+LiDAR BEVFusion model. Is that so? Thank you.
Thanks for your quick reply. When finetuning the camera+LiDAR BEVFusion model, do we need to pre-train the camera-only model for detection first, or is directly using the nuImages-pretrained model sufficient for the fusion model?
Hi @yinjunbo,
No, you do not need to pretrain on the camera-only task. At least, my previous experiments show that using the nuImages-pretrained model that we released gives better performance.
Best, Haotian
Following your advice, I trained the fusion model initialized from your pre-trained LiDAR-only and camera-only models and got the following results, which are about 2 points lower than the README (68.85 mAP and 71.38 NDS). Is there something I missed? Looking forward to your reply. @kentang-mit
mAP: 0.6628
mATE: 0.2782
mASE: 0.2567
mAOE: 0.2984
mAVE: 0.2367
mAAE: 0.1877
NDS: 0.7056
Hi @yinjunbo,
There are several things you can try.
First, try out using tools/test.py to evaluate the model after training instead of directly reading out the results from the training log. This will give you better (and actually correct) numbers. I'm currently not sure why the numbers during training and the results given by tools/test.py are different after I refactored the code recently.
Second, you can tune your learning rate schedule and the data augmentations. I think we have released enough details about it. Please make sure you do not turn on GT augmentation during BEVFusion (C+L) training; that will hurt the performance because the augmentations for LiDAR and camera are not synchronized.
Third, I'm not sure whether I understood your comment correctly, but I suggest you not initialize from the pretrained camera-only 3D detection model but instead from the pretrained camera-only 2D detection model on nuImages.
Best, Haotian
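For reference, evaluating a trained fusion checkpoint with tools/test.py might look like the sketch below; the checkpoint path and the --eval bbox option are assumptions based on the usual usage of the test script in this repository.
# Assumed example: evaluate the trained fusion model with the nuScenes detection metrics
torchpack dist-run -np 8 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml runs/fusion/latest.pth --eval bbox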
Hey @kentang-mit, can you clarify which augmentation you are referring to that should be turned off during fusion training?
Hi @kentang-mit, thanks for your kind reply.
I've tried evaluating with tools/test.py, and it really improves the results (by less than 0.5 points). gt_paste_stop_epoch is set as -1. Btw, I notice the default learning rate 1e-4 is set according to 8 GPUs and batch size 4. Do we need to tune it further? And is there any other augmentation strategy that needs to be turned off besides GT-AUG? Could you please share your training logs (both the LiDAR-only and fusion models), so I can see the reason for my lower performance?
Hi @yinjunbo,
Learning rate and the finetuning schedule should be tuned, and GT augmentation is not used during the training of camera+LiDAR fusion models, as I stated in other issues. As for the training log, I'm afraid I cannot share it publicly at the moment; I would appreciate it if you could understand that I have some concerns about it right now. However, I will make the training configurations public at the end of November.
Best, Haotian
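As an illustration of tuning the schedule, one common heuristic is to scale the learning rate linearly with the total batch size and pass the override on the command line in the same dotted style used elsewhere in this thread. The optimizer.lr override key, the lidar-only checkpoint name, and the linear-scaling rule itself are assumptions here, not settings confirmed by the authors.
# Assumed example: default lr 1e-4 corresponds to 8 GPUs x batch size 4 (total 32);
# training on 4 GPUs with batch size 4 (total 16) would then use lr 5e-5 under linear scaling
torchpack dist-run -np 4 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth --optimizer.lr 5.0e-5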
Got it. Thanks for your time!
No problem. I'm closing it temporarily. Feel free to reopen if you have further questions.
Thanks for your nice work. But which config or command should be used to train the fusion model (configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml)?