mit-han-lab / bevfusion

[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
https://bevfusion.mit.edu
Apache License 2.0

Error when evaluating pretrained models, about bevfusion-det.pth #596

Closed fdy61 closed 4 months ago

fdy61 commented 7 months ago

When I run the command "torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth", loading lidar-only-det.pth reports "The model and loaded state dict do not match exactly", and then there is a RuntimeError: Given groups=1, weight of size [8, 1, 1, 1], expected input[24, 6, 256, 704] to have 1 channels, but got 6 channels instead. How can I solve it?
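A quick way to see what the "do not match exactly" message refers to is to inspect which modules each checkpoint actually contains. This is only a minimal diagnostic sketch, not part of the original report; it assumes PyTorch is installed, the checkpoints sit under pretrained/, and the usual mmcv-style "state_dict" wrapper key:

```python
import torch

# Inspect the lidar-only checkpoint to see which modules it provides.
# mmcv-style checkpoints usually wrap the weights under "state_dict";
# fall back to the raw dict if that key is absent.
ckpt = torch.load("pretrained/lidar-only-det.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)

top_level = sorted({key.split(".")[0] for key in state_dict})
print("top-level modules:", top_level)

# Camera-branch weights are expected to be absent here; they are supplied
# separately via swint-nuimages-pretrained.pth in the training command.
camera_keys = [k for k in state_dict if "camera" in k]
print("camera-related keys:", len(camera_keys))
```

Keys that exist in the fusion model but not in this file show up as the "missing keys" part of the warning when --load_from is used.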

fdy61 commented 7 months ago

I am using the command given by the author in the GitHub README.

GZF123 commented 7 months ago

Have you solved it? What was the solution?

yuanjiechen commented 6 months ago

I used bevfusion-det.pth and had no problem.

fdy61 commented 6 months ago

No problem. I have solved it.

wyy032 commented 5 months ago

Have you successfully reproduced the results? I am reproducing the code, but my training accuracy falls far short of the paper's. How should I handle this? Can you give me some suggestions? The training is for the BEVFusion det (L+C) model, which can output both modalities through visualization, but the training accuracy is too low (NDS=0.4665). I earnestly request your reply and help! The system is Ubuntu 20.04 and training runs on a single A100. Although there are four A100s, parallel training is not working, so for now I can only use a single GPU.

fdy61 commented 5 months ago

It seems that you should train the lidar model first to get lidar-only-det.pth, and then use the lidar-only-det.pth and swint-nuimages-pretrained.pth to train the final model

wyy032 commented 5 months ago

Thank you for the suggestion! I will try this method. May I ask whether your reproduction is consistent with the results of the paper?

fdy61 commented 5 months ago

Yes, but I only did detection, not segmentation. For the BEVFusion detection model, you should train it from your own trained lidar model (or simply use the pretrained/lidar-only-det.pth the author has provided) together with the image backbone, pretrained/swint-nuimages-pretrained.pth. You can see the training command in README.md: torchpack dist-run -np 8 python tools/train.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml --model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth --load_from pretrained/lidar-only-det.pth. Whether you train on a single GPU or on multiple GPUs doesn't affect the final results.

wyy032 commented 5 months ago

[screenshot: Weixin Image_20240426173549] I am running this instruction as-is, with no changes to the code. Reading the terminal output carefully, I noticed a lot of messages about missing keys. Do you get this during training, and what can I do to fix it?

fdy61 commented 5 months ago

This is just a warning, not an error. The full BEVFusion detection model has a camera branch, but lidar-only-det.pth doesn't; the camera branch is loaded from swint-nuimages-pretrained.pth instead. So just run the command as given and it's OK; you will find that the model trains successfully.

wyy032 commented 5 months ago

Thank you for your patience. I was able to train it successfully, but the main problem is still the same as at the beginning: the accuracy is low and very different from the paper's results. On the full nuScenes dataset, I first ran three epochs on a single A100 and the NDS stayed at 0.46, while the original paper reports NDS=0.7288, which is a huge gap. I have not been able to solve that.

fdy61 commented 5 months ago

3 epochs? That is far from enough. You can learn more details from the training strategy.

fdy61 commented 5 months ago

@wyy032 The author seemed to train for about 6-7 epochs; I don't remember exactly.

wyy032 commented 5 months ago

I haven't run the full 6 epochs yet, as that will probably take about three days, but over the first three epochs the accuracy stayed at 0.46, and I'm not sure if that's normal. I'm worried it won't improve later on.

fdy61 commented 5 months ago

Training for only three epochs suggests that the model has not yet fit the data well. I'm also not sure whether you can resume from epoch_3.pth and continue training until epoch 6, or whether you have to restart from 1 to 6. I suggest you follow the author's training strategy.
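Whether resuming is even possible mostly depends on whether the saved checkpoint still carries the runner state. A small check, assuming the usual mmcv-style checkpoint layout ("meta" with an epoch counter plus "optimizer" state); the run-directory path below is only a hypothetical example:

```python
import torch

# Check whether a work-dir checkpoint carries what a runner needs to resume:
# an epoch counter under "meta" and the optimizer state. The path below is
# only an example; point it at your actual run directory.
ckpt = torch.load("runs/my-run/epoch_3.pth", map_location="cpu")

print("checkpoint keys:", list(ckpt.keys()))
print("epoch recorded in meta:", ckpt.get("meta", {}).get("epoch"))
print("optimizer state present:", "optimizer" in ckpt)
```

If both are present, resuming from epoch 3 is at least mechanically possible; whether it matches the intended learning-rate schedule is a separate question.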

wyy032 commented 5 months ago

Ok, thank you very much for your advice, I'll try again.

wyy032 commented 5 months ago

Hi, I've run six complete epochs over the last few days, but the accuracy still doesn't go up. I suspect it's a CUDA version issue or the lack of multi-GPU training. Can I ask which CUDA version you are using?

fdy61 commented 5 months ago

11.3 or 11.1 @wyy032
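To rule out environment mismatches, a quick way to confirm which CUDA build PyTorch was compiled against and how many GPUs it actually sees (a generic check, not specific to this repo):

```python
import torch

# Report the PyTorch build, the CUDA toolkit it was compiled against,
# and the GPUs that are actually visible to this process.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("GPUs visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} ->", torch.cuda.get_device_name(i))
```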

wyy032 commented 5 months ago

OK, thank you. I'll try again.

Ange1ika commented 4 months ago

How did you solve this error? I am getting the same one.

zhujiagang commented 3 months ago

How did you solve it? Waiting for your reply. Thank you. @fdy61

zyqww commented 2 months ago

@fdy61 Hi, when you test the fusion model using the official command, can you reach the paper's accuracy? torchpack dist-run -np 1 python tools/test.py configs/nuscenes/det/transfusion/secfpn/camera+lidar/swint_v0p075/convfuser.yaml pretrained/bevfusion-det.pth --eval bbox. When I test the lidar-only and camera-only models, I get mAP=0.6468, NDS=0.6924 and mAP=0.3554, NDS=0.4121, which is very close to the paper's results, but when I test the fusion model the results are only mAP=0.6728, NDS=0.7069. What could be the reason?