Closed jiapeng789 closed 1 year ago
We use 256x704 inputs, which is probably why you cannot reproduce our results. Still, mIoU=0.312 looks too low to me, so there must be other problems in your setup.
I would suggest that you first try reproducing our results using the exact configurations we released, which we have tested.
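For a sense of scale, here is a minimal sketch in plain Python (not part of the repository; the helper name `pixel_count` is hypothetical) that compares the two input resolutions mentioned in this thread. It only illustrates how much less image detail the smaller setting provides and says nothing about the resulting mIoU.

```python
# Input resolutions as (height, width), taken from this thread.
released = (256, 704)    # resolution used for the released results
reproduced = (128, 352)  # resolution used in the failed reproduction


def pixel_count(resolution):
    """Return the number of pixels for a (height, width) pair."""
    height, width = resolution
    return height * width


# Ratio of pixels per image between the two settings.
ratio = pixel_count(released) / pixel_count(reproduced)
print(f"The released setting has {ratio:.0f}x the pixels per image.")
```

Halving both height and width leaves a quarter of the pixels per camera image, so the two runs are not directly comparable.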
OK, thank you for your reply. I will try again.
Hi, I ran the following command to train the camera-only BEV segmentation model (image input resolution set to 128x352):
torchpack dist-run -np 8 python tools/train.py configs/nuscenes/seg/camera-bev256d2.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth
I then used the trained weights to evaluate the model on the test set and got mIoU=0.312. This is quite different from the result obtained with the pretrained weights (camera-only-seg.pth) in code-master. I don't know what is wrong with my training setup, or whether the model needs to be trained in two stages. Could you provide some training details? I look forward to hearing from you.