Student model overfits really early training

yuanyuanli85 / Fast_Human_Pose_Estimation_Pytorch

Pytorch Code for CVPR2019 paper "Fast Human Pose Estimation" https://arxiv.org/abs/1811.05419

Apache License 2.0


Student model overfits really early training #15

Open yiwang454 opened 4 years ago

yiwang454 commented 4 years ago

When I trained the student model under the supervision of both the teacher model (downloaded from the link in the README) and the labelled data, the validation accuracy dropped quickly at the 8th epoch, and the best validation accuracy was only about 60%, far below the result in the paper. Why does this happen, and how can I solve it? Will the result improve if I train for more epochs?

yuanyuanli85 commented 4 years ago

The default number of epochs for training on MPII is 90. The result at epoch 8 may not be very stable. Did you use the default parameters in this repo?

yiwang454 commented 4 years ago

Yes, I copied and pasted the parameters from your repo. Here is the command I used:

Pytorch$ python example/mpii_kd.py -a hg --stacks 2 --blocks 1 --checkpoint checkpoint/hg_s2_b1/ --mobile=True --teacher_stack 8 --teacher_checkpoint checkpoint/hg_s8_b1/model_best.pth.tar

I have also now trained for 90 epochs (your default setting), but the validation loss kept exploding, and the training accuracy converged at about 79%.

yiwang454 commented 4 years ago

By the way, I kept the learning rate at 2.5*10^-4 (your default value and also the setting in the paper). Should I keep it fixed, or should I change the learning rate during training?
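For readers unsure what "changing the learning rate during training" would look like in practice, here is a minimal sketch of step decay in plain Python. The function name `step_lr`, the milestone epochs (60 and 90), and the decay factor (0.1) are illustrative assumptions. They are not values confirmed anywhere in this thread, and this is not necessarily how the repo schedules its learning rate.

```python
def step_lr(base_lr, epoch, milestones, gamma):
    """Return the learning rate for `epoch` under step decay.

    The rate is multiplied by `gamma` once for every milestone
    that has already been reached.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Hypothetical schedule: start at 2.5e-4 (the value quoted in this thread),
# then shrink by 10x at epochs 60 and 90 (assumed milestones).
for epoch in (0, 59, 60, 90):
    print(epoch, step_lr(2.5e-4, epoch, milestones=[60, 90], gamma=0.1))
```

With a fixed rate, `milestones` would simply be an empty list and the function would always return `base_lr`.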

yuanyuanli85 commented 4 years ago

Are you using PyTorch 0.4.x? If so, did you disable cuDNN for the batchnorm layers? There is a known issue in PyTorch that causes training instability.
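One simple way to try this is sketched below. It turns cuDNN off for the whole process through the `torch.backends.cudnn.enabled` flag, which is broader than disabling it for batchnorm only and may slow training down. Treat it as a quick check of whether cuDNN is involved, under that assumption, rather than as the fix the repo itself prescribes.

```python
import torch
import torch.nn as nn

# Assumption: disabling cuDNN globally is an acceptable stand-in for
# disabling it only for batchnorm. Every layer then falls back to
# PyTorch's native kernels, including BatchNorm.
torch.backends.cudnn.enabled = False

# Sanity check: a BatchNorm layer still runs with the flag off.
bn = nn.BatchNorm2d(16)
x = torch.randn(4, 16, 8, 8)  # dummy batch: (N, C, H, W)
y = bn(x)
print(torch.backends.cudnn.enabled, tuple(y.shape))
```

The flag has to be set before the training loop starts, for example at the top of the training script.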