Closed jiapeng789 closed 1 year ago
I haven't tried training with fewer than 8 GPUs, but you can try reducing the learning rate. By the way, gradient norm = nan in the starting phase is normal; you can wait several thousand iterations and see whether the gradient norm stabilizes. This is because we use a gradient scaler that can adaptively change the scaling ratio.
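For anyone wondering why early NaNs are expected here: dynamic loss scaling shrinks the scale on overflow and grows it back after a run of clean steps, so the first iterations often overflow by design. A pure-Python toy sketch of that mechanism (the class name and the growth/backoff constants are illustrative, not the repo's actual code, which uses an AMP-style gradient scaler):

```python
import math

# Toy sketch of dynamic loss scaling: shrink the scale when gradients
# overflow (inf/nan), grow it back after `interval` consecutive clean steps.
class ToyGradScaler:
    def __init__(self, init_scale=2.0**16, growth=2.0, backoff=0.5, interval=2000):
        self.scale = init_scale
        self.growth = growth      # factor applied after `interval` clean steps
        self.backoff = backoff    # factor applied on an overflow step
        self.interval = interval
        self.good_steps = 0

    def step(self, grads):
        """Return False (skip the optimizer step) on overflow, True otherwise."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale *= self.backoff   # overflow: back off and reset counter
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.interval:
            self.scale *= self.growth    # long clean run: grow the scale again
            self.good_steps = 0
        return True

scaler = ToyGradScaler(init_scale=8.0, interval=2)
print(scaler.step([float("inf")]), scaler.scale)  # overflow: False, scale -> 4.0
print(scaler.step([0.1]), scaler.scale)           # clean step: True, scale stays 4.0
print(scaler.step([0.2]), scaler.scale)           # 2nd clean step: True, scale -> 8.0
```

The point is that the overflow/skip cycle at the start of training is part of finding a workable scale, not a sign of divergence by itself.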
Thank you for your reply! I set the initial learning rate of the optimizer in the LiDAR-only object detection model to 1/2 of the original (from 1.0e-4 to 5.0e-5), epochs = 20, with everything else unchanged, and there is no gradient explosion during training. I have now trained for 12 epochs, and the model's results on the validation set have stabilized at mAP ≈ 0.5042, NDS ≈ 0.5962, which is still far from the results published in the paper (mAP = 64.68). I don't know whether this result is caused by the small learning rate. Do you have any suggestions for model training?
Actually my schedule is almost the same as the official TransFusion paper (the difference is probably that they manually restart at epoch 15), and the schedule itself is widely used by almost all 3D object detection papers. It would be a bit hard for me to quickly provide an alternative schedule that works equally well with a small number of GPUs. What I can say is that the GT-paste fading strategy (gt_paste_stop_epoch: 15 in our configuration) brings about a ~5 mAP improvement over the baseline, so some of the gap can be explained by your shorter schedule. This strategy does not matter that much for the fusion models, though.
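For readers who want to adjust the fading strategy for a shorter schedule, the key quoted above is a single config entry; its exact placement in the config tree may differ between repo versions, so treat this as a sketch:

```yaml
# Sketch of the fading strategy mentioned above. Only the key name and the
# value 15 come from this thread; its position in the config tree is a guess.
# GT-paste (copy-paste of ground-truth objects) is disabled after this epoch,
# letting the model see undistorted scenes for the final epochs.
gt_paste_stop_epoch: 15
```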
Thank you for your reply. The training of the LiDAR-only object detection model was done with two GPUs: mAP = 0.6344, NDS = 0.6855. Although there is a gap with your published results (mAP = 0.6468, NDS = 0.6828), I think it can be improved by adjusting the learning rate. Now I am training the LiDAR + camera fusion object detection model on 3090 GPUs, but I encounter the following problems:
Looking forward to hearing from you, thank you!
Hi @jiajiaen,
Seems that the LiDAR-only results are much better now. I guess if you cannot close the gap, it might be related to the number of GPUs.
Best, Haotian
Hi @kentang-mit
When setting the image input resolution to [128, 352], the results after fine-tuning were very poor, so I have reset the image input resolution to [256, 704]. However, I accidentally found a puzzling problem: when using "swin_tiny_patch4_window7_224.pth" to initialize the camera backbone, the weight values in the checkpoint file match the weights of the camera backbone after initialization. But when using "--model.encoders.camera.backbone.init_cfg.checkpoint pretrained/swint-nuimages-pretrained.pth" to initialize the camera backbone, the weights after model initialization are not the same as those in "swint-nuimages-pretrained.pth", which means the pretrained weights did not successfully initialize the camera backbone. I don't know the cause of this problem; have you encountered anything similar?
Best, Jiapeng
That's interesting. Is it possible for you to share the warning returned by mmcv? You probably also need to check whether your mmcv version is the same as ours (1.4.0).
I checked the created conda environment: mmcv-full == 1.4.0, and the versions of the other dependencies also meet the requirements. The camera backbone did not report any warnings or errors while loading the pretrained weights. It can successfully load the weights in "swin_tiny_patch4_window7_224.pth" but cannot load the weights in "swint-nuimages-pretrained.pth"; when loading the latter, the camera backbone is randomly initialized instead of using the checkpoint weights. In addition, I did the same verification in the Docker environment with the same result: "swin_tiny_patch4_window7_224.pth" loads successfully, while "swint-nuimages-pretrained.pth" does not. I'm not sure how to fix this; is it possible that the "swint-nuimages-pretrained.pth" weights file is wrong?
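A quick way to diagnose this kind of silent failure: mmcv-style loaders typically load with strict=False, so checkpoint keys that don't match the model's keys are simply ignored and the affected modules stay randomly initialized. Comparing the two key sets usually reveals the cause (a common one is a full-detector checkpoint storing weights under a prefix such as "backbone." while the model expects bare Swin keys). A minimal sketch using plain dicts as stand-ins; in practice you would get the key lists from torch.load(path, map_location="cpu") (possibly its "state_dict" entry) and model.state_dict() — the prefix shown is hypothetical, not confirmed for this checkpoint:

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """Return (missing_from_ckpt, unexpected_in_ckpt), mirroring the report a
    strict=False load would produce."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)

def strip_prefix(keys, prefix):
    """Drop a common prefix (e.g. 'backbone.') so the key sets line up again."""
    return [k[len(prefix):] if k.startswith(prefix) else k for k in keys]

# Hypothetical example: every model key counts as "missing", so nothing
# loads and the backbone silently stays randomly initialized.
model = ["patch_embed.proj.weight", "layers.0.blocks.0.attn.qkv.weight"]
ckpt = ["backbone.patch_embed.proj.weight",
        "backbone.layers.0.blocks.0.attn.qkv.weight"]

missing, unexpected = diff_state_dicts(model, ckpt)
print(len(missing), len(unexpected))   # complete mismatch: 2 2

missing, unexpected = diff_state_dicts(model, strip_prefix(ckpt, "backbone."))
print(len(missing), len(unexpected))   # keys align after stripping: 0 0
```

If the mismatch is indeed a prefix, mmcv's init_cfg supports a prefix option on pretrained checkpoints in some versions; otherwise re-saving the checkpoint with stripped keys is a workable fix.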
Best, Jiapeng
I will investigate it. Please stay tuned.
Hi @kentang-mit When training the fusion detection model, I tried a variety of methods and still couldn't use the pretrained "swint-nuimages-pretrained.pth" to initialize the camera backbone. If "swin_tiny_patch4_window7_224.pth" is used as the pretraining weights for the camera backbone instead, how much difference will there be in the final training accuracy?
Best, Jiapeng
The accuracy difference will be very small. I expect that difference to be within 0.2% in mAP.
@kentang-mit Could you clarify whether you scale the LR with the batch size? Or are you using just LR = 1e-4 with AdamW for a batch size of 32 (8 GPUs × 4)?
Thank you!
I did not try scaling the LR with the batch size in my experiments, but I did try several starting LRs. It seems that the results are relatively stable w.r.t. the starting LR.
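For anyone who still wants to try scaling the LR when training with fewer GPUs: the common linear scaling rule (from Goyal et al., "Accurate, Large Minibatch SGD") multiplies the base LR by the ratio of effective batch sizes. This is a generic heuristic, not something the authors used here, and per the reply above the results were fairly stable w.r.t. the starting LR, so treat it as a starting point:

```python
# Linear LR scaling rule: scale the reference LR by the ratio of your
# effective batch size to the reference effective batch size.
def scaled_lr(base_lr, base_batch, new_batch):
    return base_lr * new_batch / base_batch

# Reference schedule from this thread: lr = 1e-4 at batch 32 (8 GPUs x 4).
# Two 3090s with 4 samples each give an effective batch of 8:
print(scaled_lr(1e-4, 32, 8))   # 2.5e-05
```

Note that the rule was derived for SGD; with AdamW the appropriate scaling is less clear-cut, which is consistent with the "relatively stable" observation above.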
Closed due to inactivity.
Hello, looking at your previous reply, you used 8 GPUs to train the model. When I used two 3090 GPUs to train the model, the gradients exploded; I suspect the learning rate was too large. When training the model with different numbers of GPUs, how should the learning rate and epochs be adjusted? In my case, is it necessary to reduce the learning rate to 1/4 of the original and increase the epochs to 30? Looking forward to hearing from you.