luhavefun closed this issue 2 years ago.
We trained our model with 4 GPUs.
So the effective batch size is 64 (16 per GPU * 4 GPUs).
I think you trained the model with a single GPU and left the learning rate at 1e-4.
If you'd like to train with a single GPU, please set the learning rate to 1e-4 * 1/4 (i.e. 2.5e-5), since the effective batch size drops from 64 to 16.
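For anyone adapting this to a different GPU count, the adjustment above is just the usual linear scaling heuristic: scale the learning rate in proportion to the effective batch size. Here is a minimal sketch of that arithmetic. The function and variable names are hypothetical and not taken from this repo's config, and it assumes the per-GPU batch size stays at 16.

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling heuristic: lr proportional to the effective batch size."""
    return base_lr * new_batch_size / base_batch_size


# Reference setup from this thread: 16 samples per GPU on 4 GPUs with lr 1e-4.
per_gpu_batch = 16
reference_gpus = 4
reference_lr = 1e-4
reference_batch = per_gpu_batch * reference_gpus  # effective batch size 64

# Hypothetical single-GPU run with the same per-GPU batch size.
num_gpus = 1
effective_batch = per_gpu_batch * num_gpus  # effective batch size 16

single_gpu_lr = scale_lr(reference_lr, reference_batch, effective_batch)
print(f"learning rate for {num_gpus} GPU(s): {single_gpu_lr:.2e}")
```

This is only a rule of thumb. If training is still unstable after scaling, a lower learning rate may be worth trying.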
Thank you for your reply! After changing the lr, training is now stable (20 epochs so far).
Thanks for your great work. I ran into some trouble training on HO3D-v2. I trained the model following the given steps, but it did not converge properly. Here are the logs from the last epoch:

08-09 04:21:59 Epoch 69/70 itr 4123/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 5.1809 loss_mano_joints: 5.4888 loss_mano_pose: 0.3954 loss_mano_shape: 0.2697 loss_joints_img: 3.1225
08-09 04:22:00 Epoch 69/70 itr 4124/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 11.5330 loss_mano_joints: 12.5328 loss_mano_pose: 0.6882 loss_mano_shape: 0.3608 loss_joints_img: 4.0549
08-09 04:22:00 Epoch 69/70 itr 4125/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 4.1236 loss_mano_joints: 4.4260 loss_mano_pose: 0.3778 loss_mano_shape: 0.5172 loss_joints_img: 3.0202
08-09 04:22:00 Epoch 69/70 itr 4126/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 23.6163 loss_mano_joints: 25.9149 loss_mano_pose: 0.5916 loss_mano_shape: 0.2481 loss_joints_img: 3.1539
08-09 04:22:00 Epoch 69/70 itr 4127/4128: lr: 1.17649e-05 speed: 0.19(0.19s r0.00)s/itr 0.22h/epoch loss_mano_verts: 10.1710 loss_mano_joints: 10.8513 loss_mano_pose: 0.5306 loss_mano_shape: 0.2882 loss_joints_img: 4.7335