open-mmlab / mmskeleton

An OpenMMLAB toolbox for human pose estimation, skeleton-based action recognition, and action synthesis.
Apache License 2.0

loss is nan #423

Open shivanthika opened 3 years ago

shivanthika commented 3 years ago

Hi, I am training a model with 6 nodes (motion capture data). For every epoch the loss is nan, and the Top1 and Top5 accuracy do not change between evaluations. I am confused about what the issue could be.

Train Data Shape: (3800, 4, 100, 6, 1)

[07.14.21|11:08:58] Training epoch: 0
[07.14.21|11:08:59] Iter 0 Done. | loss: nan | lr: 0.000100
[07.14.21|11:09:05] mean_loss: nan
[07.14.21|11:09:05] Time consumption:
[07.14.21|11:09:05] Done.
[07.14.21|11:09:05] Training epoch: 1
[07.14.21|11:09:11] Iter 25 Done. | loss: nan | lr: 0.000100
[07.14.21|11:09:12] mean_loss: nan
[07.14.21|11:09:12] Time consumption:
[07.14.21|11:09:12] Done.
[07.14.21|11:09:12] Training epoch: 2
[07.14.21|11:09:19] mean_loss: nan
[07.14.21|11:09:19] Time consumption:
[07.14.21|11:09:19] Done.
[07.14.21|11:09:19] Training epoch: 3
[07.14.21|11:09:24] Iter 50 Done. | loss: nan | lr: 0.000100
[07.14.21|11:09:26] mean_loss: nan
[07.14.21|11:09:26] Time consumption:
[07.14.21|11:09:26] Done.
[07.14.21|11:09:26] Training epoch: 4
[07.14.21|11:09:33] mean_loss: nan
[07.14.21|11:09:33] Time consumption:
[07.14.21|11:09:33] Done.
[07.14.21|11:09:33] Eval epoch: 4
[07.14.21|11:09:34] mean_loss: nan
[07.14.21|11:09:34] Top1: 8.75%
[07.14.21|11:09:34] Top5: 29.57%
[07.14.21|11:09:34] Done.
[07.14.21|11:09:34] Training epoch: 5
[07.14.21|11:09:37] Iter 75 Done. | loss: nan | lr: 0.000100
[07.14.21|11:09:41] mean_loss: nan
[07.14.21|11:09:41] Time consumption:
[07.14.21|11:09:41] Done.
[07.14.21|11:09:41] Training epoch: 6
[07.14.21|11:09:48] mean_loss: nan
[07.14.21|11:09:48] Time consumption:
[07.14.21|11:09:48] Done.
[07.14.21|11:09:48] Training epoch: 7
[07.14.21|11:09:50] Iter 100 Done. | loss: nan | lr: 0.000100
[07.14.21|11:09:55] mean_loss: nan
[07.14.21|11:09:55] Time consumption:
[07.14.21|11:09:55] Done.
[07.14.21|11:09:55] Training epoch: 8
[07.14.21|11:10:02] Iter 125 Done. | loss: nan | lr: 0.000100
[07.14.21|11:10:02] mean_loss: nan
[07.14.21|11:10:02] Time consumption:
[07.14.21|11:10:02] Done.
[07.14.21|11:10:02] Training epoch: 9
[07.14.21|11:10:09] mean_loss: nan
[07.14.21|11:10:09] Time consumption:
[07.14.21|11:10:09] Done.
[07.14.21|11:10:09] The model has been saved as /home/ST_GCN/st-gcn-sl/work/1//epoch10_model.pt.
[07.14.21|11:10:09] Eval epoch: 9
[07.14.21|11:10:10] mean_loss: nan
[07.14.21|11:10:10] Top1: 8.75%
[07.14.21|11:10:10] Top5: 29.57%
[07.14.21|11:10:10] Done.
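
Since the loss is already nan at Iter 0, one check that may help narrow this down is whether the training array itself contains NaN or Inf values before it reaches the model. The following is only a minimal sketch, not a confirmed cause or fix: it assumes the data is a 5-D NumPy array whose axes are (samples, channels, frames, nodes, persons), which is a guess based on the shape (3800, 4, 100, 6, 1). `report_non_finite` is a hypothetical helper and not part of mmskeleton, and the small array at the bottom is synthetic demo data, not the real dataset.

```python
import numpy as np


def report_non_finite(data: np.ndarray) -> dict:
    """Summarise NaN/Inf entries in a 5-D array.

    Assumed axis order (an assumption, not confirmed by the issue):
    (samples, channels, frames, nodes, persons).
    """
    nan_mask = np.isnan(data)
    inf_mask = np.isinf(data)
    bad = nan_mask | inf_mask

    n_samples, n_channels = data.shape[0], data.shape[1]

    # Indices of samples (axis 0) holding at least one non-finite value.
    bad_samples = np.flatnonzero(bad.reshape(n_samples, -1).any(axis=1))

    # Indices of channels (axis 1) holding at least one non-finite value.
    per_channel = np.moveaxis(bad, 1, 0).reshape(n_channels, -1)
    bad_channels = np.flatnonzero(per_channel.any(axis=1))

    return {
        "nan_count": int(nan_mask.sum()),
        "inf_count": int(inf_mask.sum()),
        "bad_samples": bad_samples.tolist(),
        "bad_channels": bad_channels.tolist(),
    }


# Synthetic stand-in for the real training array (same axis layout, smaller).
demo = np.zeros((8, 4, 10, 6, 1), dtype=np.float32)
demo[2, 1, 0, 3, 0] = np.nan  # one corrupted coordinate
demo[5, 3, 4, 0, 0] = np.inf  # one overflowed value

report = report_non_finite(demo)
print(report)
```

If the real array reports zero non-finite entries, the input data is probably not the source of the nan, and other candidates (label range, normalisation, graph definition for the 6 nodes) would need to be checked separately.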