zhangyp15 / MonoFlex

Released code for *Objects are Different: Flexible Monocular 3D Object Detection* (CVPR 2021)
MIT License

Validation performance gap between 'evaluation when training' and 'evaluation after training' #15

Open · henniekim opened this issue 3 years ago

henniekim commented 3 years ago

I have trained MonoFlex, but I got a weird result, shown below.

[Validation performance during training] @ iter 46400 (final model)
bbox AP:98.2362, 90.4009, 82.8753
bev  AP:30.7541, 22.5483, 19.4824
3d   AP:22.2204, 16.2115, 14.2746
aos  AP:98.07, 89.99, 82.10

[Validation performance when evaluating separately, after training completed]
Car AP@0.70, 0.70, 0.70:
bbox AP:97.7390, 90.2411, 82.6754
bev  AP:27.6806, 20.9849, 18.2696
3d   AP:19.7306, 14.9945, 13.3213
aos  AP:97.58, 89.87, 81.93

Although the two runs should give the same result, there is an obvious performance gap between them. Is there any difference between 'validation when training' and 'validation after training'?
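A minimal sketch for double-checking that the offline evaluation loads the same iteration-46400 checkpoint and runs in eval mode (generic PyTorch; `evaluate_fn` and the checkpoint layout are placeholders, not MonoFlex's actual entry points):

```python
import torch

def evaluate_checkpoint(model, ckpt_path, val_loader, evaluate_fn):
    """Re-run validation on one specific checkpoint (generic sketch).

    `evaluate_fn` stands in for whatever routine produced the in-training
    AP numbers; the point is to load the same weights, freeze
    BatchNorm/dropout, and disable gradient tracking.
    """
    state = torch.load(ckpt_path, map_location="cpu")
    # Training checkpoints often wrap the weights under a 'model' key.
    model.load_state_dict(state["model"] if "model" in state else state)
    model.eval()               # freeze BatchNorm statistics and dropout
    with torch.no_grad():      # inference only
        return evaluate_fn(model, val_loader)
```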

Thanks.

zhangyp15 commented 3 years ago

We also observed similar performance gaps, though a gap as large as yours is rarely seen. I think the problem can be mitigated with a larger weight decay, which reduces the severe over-fitting that occurs once the learning rate is decayed. Currently, we directly use the best model found by the evaluations during training.
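If anyone wants to try the larger weight decay, it is just the `weight_decay` argument of the optimizer. A minimal PyTorch sketch with illustrative values (not the repo's actual config keys or defaults):

```python
import torch

def build_optimizer(model, lr=3e-4, weight_decay=1e-4):
    # A larger weight_decay penalizes large weights more strongly, which can
    # reduce over-fitting after the learning rate has been decayed.
    # Both values here are illustrative, not MonoFlex's defaults.
    return torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
```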

Also, I think the uncertainties can actually aggravate the over-fitting, since the model can learn the difficulty of each sample in order to obtain smaller losses.
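To illustrate the mechanism: with an uncertainty-weighted regression loss of the usual Laplacian form, predicting a larger uncertainty on a hard sample shrinks its loss, so the network can learn which samples are difficult instead of fitting them. A generic sketch, not necessarily the exact form used in this repo:

```python
import torch

def uncertainty_l1_loss(pred, target, log_sigma):
    """Uncertainty-weighted L1 regression loss (generic Laplacian-style sketch).

    The exp(-log_sigma) factor down-weights the error on samples where the
    network predicts high uncertainty, while the +log_sigma term keeps the
    predicted uncertainty from growing without bound.
    """
    l1 = torch.abs(pred - target)
    return (l1 * torch.exp(-log_sigma) + log_sigma).mean()
```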

953038395 commented 3 years ago

> Is there any difference between 'validation when training' and 'validation after training'?

Hello, do you use any tricks in training? I use the same method as the GitHub repo, but my result is bad.

mrsempress commented 3 years ago

> Is there any difference between 'validation when training' and 'validation after training'?

> Hello, do you use any tricks in training? I use the same method as the GitHub repo, but my result is bad.

I get bad results too. My validation performance (last epoch) is shown below [though I don't have @henniekim's problem]:

Car AP@0.70, 0.70, 0.70:
bbox AP:89.5837, 86.1133, 79.8404
bev  AP:22.9169, 19.4140, 17.1499
3d   AP:16.7855, 14.0667, 12.6753  <-- in paper: 23.64 | 17.51 | 14.83
aos  AP:89.04, 85.48, 78.74

My validation performance (best checkpoint, output/exp/model_moderate_best_soft.pth):

Car AP@0.70, 0.70, 0.70:
bbox AP:97.5570, 91.1835, 82.3261
bev  AP:27.5798, 21.1585, 18.2660
3d   AP:19.3165, 15.2950, 13.0708  <-- in paper: 23.64 | 17.51 | 14.83
aos  AP:97.11, 90.49, 81.31