mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0
2.26k stars 409 forks

Training problem #162

Closed jiapeng789 closed 1 year ago

jiapeng789 commented 1 year ago

Hi, I ran the following command to train the camera-only BEV segmentation model, with the image input resolution set to 128x352: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/camera-bev256d2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth. I then used the trained weights to evaluate the model on the test set and got mIoU=0.312, which is quite different from the result of the pretrained weights (camera-only-seg.pth) in code-master. I don't know what is wrong with my training setup, or whether the model needs to be trained in two stages. Could you provide some training details? I look forward to hearing from you.
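
For readers unfamiliar with the metric quoted above, here is a minimal sketch of a mean IoU over per-class binary BEV masks. It is an illustration only, not the repository's evaluation code: the function names `class_iou` and `mean_iou` are hypothetical, the masks are assumed to be already thresholded to 0/1, and the actual evaluation script may differ (for example in how prediction thresholds are chosen).

```python
def class_iou(pred, target):
    """IoU between two binary masks given as equally shaped nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # A class that is absent from both masks has no defined IoU.
    return intersection / union if union else float("nan")


def mean_iou(preds, targets):
    """Average the per-class IoU over all classes with a defined IoU.

    `preds` and `targets` map a class name to its binary mask
    (hypothetical layout chosen for this sketch).
    """
    ious = [class_iou(preds[name], targets[name]) for name in targets]
    valid = [value for value in ious if value == value]  # NaN != NaN, so NaNs are dropped
    return sum(valid) / len(valid) if valid else float("nan")
```

Computing the metric this way on a handful of samples can help confirm that a low score comes from the model rather than from the evaluation step.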

kentang-mit commented 1 year ago

We use 256x704 inputs, and I think that should be the reason why you cannot reproduce our results, but mIoU=0.312 still looks too low to me. There must be other problems in your setup.

I would suggest trying to reproduce our results using the exact configurations we released (which we have tested).
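
One way to check a local setup against the released configuration is to diff the two configs key by key. The sketch below assumes both configs have already been loaded into nested Python dicts (for example from the YAML files); the helper names `flatten` and `diff_configs` and the keys in the usage example are hypothetical, and only the two resolutions are taken from this thread.

```python
MISSING = "<missing>"  # marker for a key present in only one config


def flatten(cfg, prefix=""):
    """Turn a nested dict into a flat {"a.b.c": value} mapping."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(reference, mine):
    """Return {path: (reference_value, my_value)} for every differing key."""
    ref_flat = flatten(reference)
    my_flat = flatten(mine)
    diffs = {}
    for path in sorted(set(ref_flat) | set(my_flat)):
        ref_value = ref_flat.get(path, MISSING)
        my_value = my_flat.get(path, MISSING)
        if ref_value != my_value:
            diffs[path] = (ref_value, my_value)
    return diffs


if __name__ == "__main__":
    # Hypothetical key names; the real config layout may differ.
    released = {"image_size": [256, 704], "model": {"name": "placeholder"}}
    local = {"image_size": [128, 352], "model": {"name": "placeholder"}}
    for path, (ref_value, my_value) in diff_configs(released, local).items():
        print(f"{path}: released={ref_value} local={my_value}")
```

Any key reported here is a candidate explanation for a gap in results and is worth reverting before retraining.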

jiapeng789 commented 1 year ago

OK, thank you for your reply, I will try again.