megvii-research / PETR

[ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection & [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

Reproduce PETR result #38

Closed: huazhenliu closed this issue 2 years ago

huazhenliu commented 2 years ago

Thanks for sharing such wonderful and interesting work! I'm trying to reproduce the result of "petr_r50dcn_gridmask_p4.py". The results reported at the end of that config file are as follows:

mAP: 0.3174
mATE: 0.8397
mASE: 0.2796
mAOE: 0.6158
mAVE: 0.9543
mAAE: 0.2326
NDS: 0.3665

I trained with this config file, but since I only have 2 V100 cards, I changed the batch size to "samples_per_gpu=2, workers_per_gpu=2" and ran it both with and without "--autoscale-lr". My results come out roughly like this:

mAP: 0.2103
mATE: 1.0048
mASE: 0.3099
mAOE: 0.8165
mAVE: 1.1984
mAAE: 0.4087
NDS: 0.2516

I also checked the training log you provided (20220606_223059.log): at the end of 24 epochs your loss is 5.6355, while mine is around 7.xx. When I test the checkpoint you provided, the result matches the one in "petr_r50dcn_gridmask_p4.py".

Could the gap come from the lr, the batch size, or other parameters? Any advice? Thanks!

yingfei1016 commented 2 years ago

Hi, you can set "samples_per_gpu=4, workers_per_gpu=4", so the total batch size matches the default setting; then you don't need "--autoscale-lr". When you use a different batch size, we suggest modifying the learning rate manually instead of relying on "--autoscale-lr". For example, you can set "samples_per_gpu=8, workers_per_gpu=4" and "lr=4e-4".
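For concreteness, here is a minimal sketch of the fields to edit in "petr_r50dcn_gridmask_p4.py" for a 2-GPU run, assuming (as the reply above implies) that the default schedule uses an effective batch size of 8 at lr=2e-4. Field names follow the usual mmdetection3d config style; edit the existing values in place rather than appending new dicts, and treat the exact optimizer settings as assumptions:

```python
# Option A: keep the default effective batch size (2 GPUs x 4 samples = 8),
# so the default learning rate stays valid and --autoscale-lr is unnecessary.
data = dict(
    samples_per_gpu=4,   # was 2 in the 2-GPU run described above
    workers_per_gpu=4,
)
# optimizer lr is left at its default (assumed 2e-4 here).

# Option B: double the effective batch size (2 GPUs x 8 samples = 16) and
# scale the learning rate linearly, as suggested in the reply.
# data = dict(samples_per_gpu=8, workers_per_gpu=4)
# optimizer = dict(type='AdamW', lr=4e-4, weight_decay=0.01)  # weight_decay assumed
```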

huazhenliu commented 2 years ago

Thanks for your quick reply! So the main reason for the performance drop is the lr and batch size?

yingfei1016 commented 2 years ago

Yes, I think it is the lr and batch size.
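For anyone hitting the same issue, this is the linear scaling rule implied above, sketched as a small helper under the assumed defaults (total batch size 8 at lr=2e-4; these numbers are inferred from the reply, not stated in the repo docs):

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Scale the learning rate linearly with the total (effective) batch size."""
    return base_lr * new_batch / base_batch

# 2 GPUs x 8 samples = batch 16 -> 4e-4, matching the suggested setting.
print(scale_lr(2e-4, 8, 16))
# 2 GPUs x 2 samples = batch 4 -> 1e-4; running this setup at the default
# lr (or with --autoscale-lr, which only accounts for GPU count) mismatches
# the schedule and can explain the performance drop reported above.
print(scale_lr(2e-4, 8, 4))
```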

huazhenliu commented 2 years ago

OK, I will try more experiments. Thanks again.