open-mmlab / mmhuman3d

OpenMMLab 3D Human Parametric Model Toolbox and Benchmark
https://mmhuman3d.readthedocs.io/
Apache License 2.0
1.25k stars 137 forks

Getting exponentially large losses for CLIFF after the second epoch. #349

Open ShreelekhaR opened 1 year ago

ShreelekhaR commented 1 year ago

I followed the dataset conversion code for CLIFF, but when I train on the converted datasets I get exponentially large MPJPE and PA-MPJPE losses. Any help would be really appreciated!

Example from the 9th epoch:

2023-05-15 11:38:16,965 - mmhuman3d - INFO - Epoch [9][2000/4878] lr_backbone: 3.000e-04 lr_head: 3.000e-04, eta: 0:00:00, time: 1.417, data_time: 0.030, memory: 3198, keypoints3d_loss: 1079724.9350, keypoints2d_loss: 130367.2165, vertex_loss: 18867.5181, smpl_pose_loss: 2.0817, smpl_betas_loss: 9921199272.9600, loss: 9922428395.5200
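
To see which term drives the total, the log line above can be broken down with a short script. This is only a minimal sketch and not part of mmhuman3d: the regex and the helper name `rank_loss_terms` are assumptions made for illustration, and the script reads nothing but the text of the log line itself.

```python
import re

# The loss portion of the epoch-9 log line quoted above.
LOG_LINE = (
    "Epoch [9][2000/4878] lr_backbone: 3.000e-04 lr_head: 3.000e-04, "
    "eta: 0:00:00, time: 1.417, data_time: 0.030, memory: 3198, "
    "keypoints3d_loss: 1079724.9350, keypoints2d_loss: 130367.2165, "
    "vertex_loss: 18867.5181, smpl_pose_loss: 2.0817, "
    "smpl_betas_loss: 9921199272.9600, loss: 9922428395.5200"
)


def rank_loss_terms(line):
    """Return (name, value, share) for each named *_loss term, largest first.

    Hypothetical helper: `share` is the term's fraction of the sum of the
    named terms, not of the logged total.
    """
    # Matches entries such as "keypoints3d_loss: 1079724.9350". The bare
    # "loss:" total has no "<name>_" prefix, so it is not captured.
    terms = {
        name: float(value)
        for name, value in re.findall(r"(\w+_loss): ([0-9.eE+-]+)", line)
    }
    total = sum(terms.values())
    ranked = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, value, value / total) for name, value in ranked]


for name, value, share in rank_loss_terms(LOG_LINE):
    print(f"{name:20s} {value:18.4f} {share:9.4%}")
```

On this log line, `smpl_betas_loss` accounts for more than 99.9% of the summed terms, while `smpl_pose_loss` stays small, so the shape parameters (betas) look like the first place to check.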

qinb commented 1 year ago

@ShreelekhaR I have the same problem. Did you manage to solve it?

ShamLich commented 1 year ago

Same problem. How did you solve it? I tried training CLIFF in mmhuman3d and also rebuilding it on top of SPIN; in both cases the losses end up very large.