xingyizhou / pytorch-pose-hg-3d

PyTorch implementation for 3D human pose estimation
GNU General Public License v3.0

Terrible result during training #49

Closed. Colinsnow1 closed this issue 5 years ago.

Colinsnow1 commented 5 years ago

Hi Xingyi, while training stages 2 and 3, I found that both the accuracy and the MPJPE were very poor. I also noticed that the accuracy dropped from 0.83 to 0.02 during the first epoch of stage 1. Could that be the reason for these results?
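For context on the ACC figure: the sketch below assumes it is a PCK-style heatmap accuracy like the one used in the original stacked-hourglass evaluation, where a joint counts as correct if its predicted 2D location lies within `thr` times a normalizing distance of the ground truth. The normalizer (`heatmap_res / 10`), the threshold `0.5`, and the function name `pck_accuracy` are assumptions for illustration, not this repository's code. Under a metric like this, a value of 0.02 would mean that almost no predicted joints land near their targets.

```python
import math
from typing import Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def pck_accuracy(
    pred: Sequence[Point2D],
    gt: Sequence[Optional[Point2D]],
    heatmap_res: int = 64,
    thr: float = 0.5,
) -> float:
    """Hypothetical PCK-style accuracy on heatmap coordinates.

    A joint is correct when its distance to the ground truth, divided by the
    assumed normalizer heatmap_res / 10, is below thr. Joints whose ground
    truth is None are treated as unannotated and skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of joints")
    norm = heatmap_res / 10.0  # assumed normalizing distance, in heatmap pixels
    correct = 0
    counted = 0
    for p, g in zip(pred, gt):
        if g is None:
            continue  # no annotation for this joint
        counted += 1
        if math.dist(p, g) / norm < thr:
            correct += 1
    if counted == 0:
        return float("nan")  # nothing to evaluate
    return correct / counted
```

With a 64-pixel heatmap, the assumed cutoff works out to 3.2 pixels, so a prediction 1 pixel off counts as correct and one 5 pixels off does not.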

Here is the log (PyTorch GPU build, version 0.3.1):

[Screenshot: training log]

xingyizhou commented 5 years ago

Hi, thanks for reporting. This is a known issue with the PyTorch cuDNN BN implementation (see https://github.com/xingyizhou/pytorch-pose-hg-3d/issues/16). If your PyTorch version is newer than 0.1.12, you will need to disable cuDNN BN by following the instructions here.
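A minimal sketch of the version check described above, assuming the global flag `torch.backends.cudnn.enabled` is used as the switch. That flag turns cuDNN off for every layer, not only batch normalization, and as the follow-up in this thread shows, setting it alone did not fix the problem on 0.3.1. The helper names `parse_version` and `needs_cudnn_bn_workaround` are hypothetical.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Hypothetical helper: keep the leading numeric components, e.g. '0.1.12_2' -> (0, 1, 12)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_cudnn_bn_workaround(version: str) -> bool:
    """Hypothetical helper: True when the PyTorch version is newer than 0.1.12."""
    return parse_version(version) > (0, 1, 12)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch not installed; nothing to configure
    if torch is not None and needs_cudnn_bn_workaround(torch.__version__):
        # Global switch: disables cuDNN for all layers, not just BatchNorm.
        torch.backends.cudnn.enabled = False
```

Tuple comparison keeps the check simple: `(0, 3, 1) > (0, 1, 12)` is true, while `0.1.12` itself (including builds such as `0.1.12_2`) is left alone.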

Colinsnow1 commented 5 years ago

Hi Xingyi, thanks for the reply. I had actually noticed that known issue already and set torch.backends.cudnn.enabled = False to disable cuDNN BN, but it didn't help. The log I submitted also looks abnormal. Could you share part of your training log so I can debug? Thanks again!

xingyizhou commented 5 years ago

Hi, I don't have the log on my current machine. As I remember, the training MPJPE goes down very fast, while the validation MPJPE goes down more slowly but drops a lot after the learning rate is decreased. ACC should always be above 0.9. I would suggest switching to PyTorch 0.1.12 as a safe option for reproducing the results.
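To make the MPJPE numbers concrete, here is a minimal sketch of mean per-joint position error: the average Euclidean distance between predicted and ground-truth 3D joints, in the units of the input (millimetres for Human3.6M-style annotations). The optional root alignment, which translates both poses so a chosen root joint sits at the origin, follows a common evaluation convention and is an assumption here, as is the function name `mpjpe`; this repository's own evaluation may differ.

```python
import math
from typing import Optional, Sequence, Tuple

Joint3D = Tuple[float, float, float]


def mpjpe(
    pred: Sequence[Joint3D],
    gt: Sequence[Joint3D],
    root: Optional[int] = None,
) -> float:
    """Mean Euclidean distance between corresponding predicted and ground-truth joints.

    If root is given, both poses are first translated so that joint `root`
    is at the origin (root-relative evaluation, an assumed convention).
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and have the same length")
    if root is not None:
        pred_root, gt_root = pred[root], gt[root]
        pred = [tuple(a - b for a, b in zip(j, pred_root)) for j in pred]
        gt = [tuple(a - b for a, b in zip(j, gt_root)) for j in gt]
    # Average of per-joint Euclidean distances.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)
```

For example, a two-joint pose with errors of 0 and 5 units gives an MPJPE of 2.5. `math.dist` requires Python 3.8 or later.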

Colinsnow1 commented 5 years ago

Hi Xingyi, it worked after I downgraded PyTorch to version 0.1.12 and replaced the Upsample module with UpsamplingBilinear2d. Thanks for the help.