I am currently working on training the C+L BEVFusion model and have encountered some confusion regarding the checkpoints being used during the process.
It appears that the training procedure involves using a combination of a lidar-only model and a pretrained camera model. Specifically, the checkpoints utilized are:
- Lidar-only model (`lidar-only-det.pth`)
- Pretrained camera model (`swint-nuimages-pretrained.pth`)
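For clarity, here is a minimal sketch of how such a checkpoint combination is typically assembled: the lidar-only weights initialize the fusion model, and the pretrained camera backbone's weights are remapped under the fusion model's camera-branch prefix. The key prefixes and the merge logic below are illustrative assumptions, not the repository's actual implementation; the toy dictionaries stand in for `torch.load(...)` results.

```python
# Hypothetical sketch: assemble a fusion checkpoint from a lidar-only
# checkpoint and a camera-backbone checkpoint. Prefixes are illustrative.

def merge_checkpoints(lidar_state, camera_state,
                      camera_prefix="encoders.camera.backbone."):
    """Start from the lidar-only weights, then overlay the camera-backbone
    weights under the fusion model's (assumed) camera-branch prefix."""
    fused = dict(lidar_state)  # lidar branch and heads seed the fusion model
    for key, value in camera_state.items():
        # Remap each camera key into the fusion model's namespace
        fused[camera_prefix + key] = value
    return fused

# Toy stand-ins for the loaded .pth state dicts
lidar_sd = {"encoders.lidar.voxelize.w": 1, "decoder.head.w": 2}
camera_sd = {"stage1.conv.w": 3}

fused_sd = merge_checkpoints(lidar_sd, camera_sd)
# fused_sd now holds both branches' weights under one namespace
```

In practice the framework's own checkpoint-loading config would handle this remapping, but the sketch shows why a camera *backbone* checkpoint (rather than a full camera-only detector) is sufficient to initialize the fusion model's camera branch.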
However, I noticed that this combination does not pair a camera-only detection model with the lidar-only model, which would seem the more natural choice for such a fusion model.
Could you please provide details on how `swint-nuimages-pretrained.pth` was trained? Understanding the training methodology behind this pretrained camera model would greatly help me understand how it is integrated into the C+L BEVFusion model.
Thank you for your interest in our project. This repository is no longer actively maintained, so we will be closing this issue. Please refer to the amazing implementation at MMDetection3D. Thank you again!