mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0

Clarification on Training of 'swint-nuimages-pretrained.pth' #625

Closed Ruvennsiow closed 2 months ago

Ruvennsiow commented 3 months ago

I am currently working on training the C+L BEVFusion model and am confused about which checkpoints are used during the process.

It appears that the training procedure involves using a combination of a lidar-only model and a pretrained camera model. Specifically, the checkpoints utilized are:

  1. Lidar-only model (lidar-only-det.pth)
  2. Pretrained camera model (swint-nuimages-pretrained.pth)

However, I noticed that the training does not combine the camera-only model with the lidar-only model, which would seem to be the logical choice for such a fusion model.
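For context, the repository's README combines these two checkpoints in a single training command, roughly as sketched below. This is a hedged sketch, not a verbatim command: the config path, GPU count, and flag spellings should be checked against the repo's README before use.

```shell
# Sketch of the C+L fine-tuning invocation (per the repo's README conventions):
# - lidar-only-det.pth initializes the whole fusion model via --load_from
# - swint-nuimages-pretrained.pth initializes only the camera (SwinT) backbone
torchpack dist-run -np 8 python tools/train.py \
  configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml \
  --model.encoders.camera.backbone.init_cfg.checkpoint \
      pretrained/swint-nuimages-pretrained.pth \
  --load_from pretrained/lidar-only-det.pth
```

Note that in this setup the camera branch is initialized from a 2D-pretrained backbone rather than a full camera-only 3D detector, which is what prompts the question above.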

Could you please provide details on how swint-nuimages-pretrained.pth was trained? Understanding the training methodology behind this pretrained camera model would greatly help me understand its role within the C+L BEVFusion model.

Thanks!

zhijian-liu commented 2 months ago

Thank you for your interest in our project. This repository is no longer actively maintained, so we will be closing this issue. Please refer to the amazing implementation at MMDetection3D. Thank you again!