Hi, you can set "samples_per_gpu=4, workers_per_gpu=4", so the total batch size is equal to the default setting; then you don't need to use "--autoscale-lr". When you use a different batch size, we suggest manually modifying the learning rate instead of using "--autoscale-lr". For example, you can set "samples_per_gpu=8, workers_per_gpu=4" and "lr=4e-4".
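For reference, a minimal sketch of how these overrides might look in the config, using the standard mmdetection3d config keys. The default values assumed below (total batch size 8, lr=2e-4) are inferred from this thread, not copied from the released file, so double-check them against "petr_r50dcn_gridmask_p4.py":

```python
# Sketch only, not the released config: relevant overrides for a 2-GPU machine.

# Option A: keep the released total batch size (2 GPUs x 4 = 8),
# so the default learning rate can stay and "--autoscale-lr" is not needed.
data = dict(
    samples_per_gpu=4,   # images per GPU
    workers_per_gpu=4,   # dataloader workers per GPU
    # train/val/test settings stay as in the released config
)

# Option B: a larger per-GPU batch (2 GPUs x 8 = 16, i.e. 2x the assumed default),
# with the learning rate scaled manually instead of via "--autoscale-lr":
# data = dict(samples_per_gpu=8, workers_per_gpu=4, ...)
# optimizer = dict(type='AdamW', lr=4e-4)  # other optimizer fields unchanged
```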
Thanks for your quick reply! So the main reason for the performance drop is the learning rate and batch size?
Yes, I think it is the learning rate and batch size.
OK, I will try more experiments. Thanks again.
Thanks for sharing such wonderful and interesting work! I'm trying to reproduce the result of "petr_r50dcn_gridmask_p4.py". At the end of this config file, the result is listed as follows:
mAP: 0.3174
mATE: 0.8397
mASE: 0.2796
mAOE: 0.6158
mAVE: 0.9543
mAAE: 0.2326
NDS: 0.3665
I trained with this config file. Because I only have 2 V100 cards, I changed the batch size to "samples_per_gpu=2, workers_per_gpu=2", and tried both with and without "--autoscale-lr". But my result is roughly:
mAP: 0.2103
mATE: 1.0048
mASE: 0.3099
mAOE: 0.8165
mAVE: 1.1984
mAAE: 0.4087
NDS: 0.2516
I also checked the training log you provided (20220606_223059.log): at the end of 24 epochs, your loss is 5.6355, but mine is about 7.xx. I tested the model you provided, and its result is the same as that in "petr_r50dcn_gridmask_p4.py".
Is the cause the learning rate, the batch size, or other parameters? Any advice? Thanks!
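For reference, a quick back-of-the-envelope check of the linear scaling rule being discussed, assuming (as implied in the replies above, not verified against the repo) that the released schedule targets a total batch size of 8 with lr=2e-4:

```python
# Sketch of the linear LR scaling rule; base_batch=8 and base_lr=2e-4 are
# assumptions inferred from this thread.

def scaled_lr(base_lr, base_batch, num_gpus, samples_per_gpu):
    """Scale the learning rate linearly with the total batch size."""
    total_batch = num_gpus * samples_per_gpu
    return base_lr * total_batch / base_batch

# Setup described in this issue: 2 V100s with samples_per_gpu=2 -> total batch 4.
print(scaled_lr(2e-4, 8, num_gpus=2, samples_per_gpu=2))  # 1e-4

# Keeping the default total batch size (samples_per_gpu=4 on 2 GPUs) lets you
# keep the default learning rate unchanged.
print(scaled_lr(2e-4, 8, num_gpus=2, samples_per_gpu=4))  # 2e-4
```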