mkocabas / VIBE

Official implementation of CVPR2020 paper "VIBE: Video Inference for Human Body Pose and Shape Estimation"
https://arxiv.org/abs/1912.05656

[BUG] Loss suddenly becomes very high during training #277

Open Oratacth opened 1 year ago

Oratacth commented 1 year ago

Does anyone else have this problem when training? Like this:

Epoch 2/50 |# | (25/500) | Total: 0:00:14 | ETA: 0:04:09 | loss: 2.4039 | loss_kp_2d: 1.47 | loss_kp_3d: 0.98 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.27 | d_m_disc_loss: 0.31 | data:
Epoch 2/50 |# | (26/500) | Total: 0:00:14 | ETA: 0:04:08 | loss: 2.4214 | loss_kp_2d: 1.76 | loss_kp_3d: 0.76 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.28 | d_m_disc_loss: 0.31 | data:
Epoch 2/50 |# | (27/500) | Total: 0:00:15 | ETA: 0:04:07 | loss: 2.4028 | loss_kp_2d: 0.75 | loss_kp_3d: 0.83 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.28 | d_m_disc_loss: 0.32 | data:
Epoch 2/50 |# | (28/500) | Total: 0:00:15 | ETA: 0:04:07 | loss: 2.3830 | loss_kp_2d: 0.72 | loss_kp_3d: 0.79 | e_m_disc_loss: 0.04 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.27 | d_m_disc_loss: 0.30 | data:
Epoch 2/50 |# | (29/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3815 | loss_kp_2d: 0.97 | loss_kp_3d: 1.05 | e_m_disc_loss: 0.05 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.25 | d_m_disc_loss: 0.28 | data:
Epoch 2/50 |# | (30/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3664 | loss_kp_2d: 0.82 | loss_kp_3d: 0.66 | e_m_disc_loss: 0.08 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.20 | d_m_disc_loss: 0.23 | data:
Epoch 2/50 |# | (31/500) | Total: 0:00:17 | ETA: 0:04:04 | loss: 16.4265 | loss_kp_2d: 433.47 | loss_kp_3d: 0.84 | e_m_disc_loss: 0.32 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.09 | da
Epoch 2/50 |## | (32/500) | Total: 0:00:17 | ETA: 0:03:58 | loss: 26.1461 | loss_kp_2d: 323.38 | loss_kp_3d: 1.10 | e_m_disc_loss: 0.49 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.01 | d_m_disc_loss: 0.05 | da
Epoch 2/50 |## | (33/500) | Total: 0:00:18 | ETA: 0:03:57 | loss: 41.6130 | loss_kp_2d: 530.82 | loss_kp_3d: 1.06 | e_m_disc_loss: 0.71 | d_m_disc_real: 0.06 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (34/500) | Total: 0:00:18 | ETA: 0:03:56 | loss: 52.5822 | loss_kp_2d: 409.64 | loss_kp_3d: 1.20 | e_m_disc_loss: 0.78 | d_m_disc_real: 0.14 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.20 | da
Epoch 2/50 |## | (35/500) | Total: 0:00:19 | ETA: 0:03:56 | loss: 64.3204 | loss_kp_2d: 457.43 | loss_kp_3d: 2.06 | e_m_disc_loss: 0.80 | d_m_disc_real: 0.13 | d_m_disc_fake: 0.07 | d_m_disc_loss: 0.19 | da
Epoch 2/50 |## | (36/500) | Total: 0:00:19 | ETA: 0:03:55 | loss: 70.2876 | loss_kp_2d: 273.15 | loss_kp_3d: 3.72 | e_m_disc_loss: 0.64 | d_m_disc_real: 0.08 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |## | (37/500) | Total: 0:00:20 | ETA: 0:03:55 | loss: 75.5481 | loss_kp_2d: 255.48 | loss_kp_3d: 7.11 | e_m_disc_loss: 0.92 | d_m_disc_real: 0.07 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (38/500) | Total: 0:00:20 | ETA: 0:03:54 | loss: 80.9875 | loss_kp_2d: 269.64 | loss_kp_3d: 10.41 | e_m_disc_loss: 0.77 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.02 | d_m_disc_loss: 0.06 | d
Epoch 2/50 |## | (39/500) | Total: 0:00:21 | ETA: 0:03:51 | loss: 83.9370 | loss_kp_2d: 185.57 | loss_kp_3d: 9.21 | e_m_disc_loss: 0.45 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.01 | d_m_disc_loss: 0.05 | da
Epoch 2/50 |## | (40/500) | Total: 0:00:21 | ETA: 0:03:50 | loss: 85.7856 | loss_kp_2d: 150.03 | loss_kp_3d: 7.09 | e_m_disc_loss: 0.20 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (41/500) | Total: 0:00:22 | ETA: 0:03:55 | loss: 90.0160 | loss_kp_2d: 251.98 | loss_kp_3d: 5.89 | e_m_disc_loss: 0.16 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.11 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |## | (42/500) | Total: 0:00:22 | ETA: 0:03:54 | loss: 93.1862 | loss_kp_2d: 216.82 | loss_kp_3d: 5.28 | e_m_disc_loss: 0.11 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.14 | d_m_disc_loss: 0.17 | da
Epoch 2/50 |## | (43/500) | Total: 0:00:23 | ETA: 0:03:54 | loss: 95.2027 | loss_kp_2d: 172.50 | loss_kp_3d: 6.58 | e_m_disc_loss: 0.16 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.12 | d_m_disc_loss: 0.15 | da
Epoch 2/50 |## | (44/500) | Total: 0:00:23 | ETA: 0:03:51 | loss: 96.1961 | loss_kp_2d: 130.74 | loss_kp_3d: 7.57 | e_m_disc_loss: 0.25 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (45/500) | Total: 0:00:24 | ETA: 0:03:51 | loss: 96.5522 | loss_kp_2d: 104.14 | loss_kp_3d: 7.58 | e_m_disc_loss: 0.32 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (46/500) | Total: 0:00:24 | ETA: 0:03:50 | loss: 98.0207 | loss_kp_2d: 156.55 | loss_kp_3d: 6.67 | e_m_disc_loss: 0.43 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.05 | d_m_disc_loss: 0.08 | da
Epoch 2/50 |### | (47/500) | Total: 0:00:25 | ETA: 0:03:50 | loss: 97.9087 | loss_kp_2d: 85.54 | loss_kp_3d: 6.79 | e_m_disc_loss: 0.37 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.09 | dat
Epoch 2/50 |### | (48/500) | Total: 0:00:25 | ETA: 0:03:49 | loss: 97.6625 | loss_kp_2d: 78.66 | loss_kp_3d: 6.91 | e_m_disc_loss: 0.47 | d_m_disc_real: 0.05 | d_m_disc_fake: 0.07 | d_m_disc_loss: 0.12 | dat
Epoch 2/50 |### | (49/500) | Total: 0:00:26 | ETA: 0:03:48 | loss: 98.7095 | loss_kp_2d: 142.36 | loss_kp_3d: 5.67 | e_m_disc_loss: 0.55 | d_m_disc_real: 0.08 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.12 | da
Epoch 2/50 |### | (50/500) | Total: 0:00:26 | ETA: 0:03:47 | loss: 98.3281 | loss_kp_2d: 72.09 | loss_kp_3d: 6.76 | e_m_disc_loss: 0.75 | d_m_disc_real: 0.11 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.14 | dat
Epoch 2/50 |### | (51/500) | Total: 0:00:27 | ETA: 0:03:47 | loss: 98.9621 | loss_kp_2d: 122.80 | loss_kp_3d: 6.97 | e_m_disc_loss: 0.59 | d_m_disc_real: 0.11 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |### | (52/500) | Total: 0:00:27 | ETA: 0:03:45 | loss: 98.5644 | loss_kp_2d: 71.65 | loss_kp_3d: 5.90 | e_m_disc_loss: 0.67 | d_m_disc_real: 0.12 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.16 | dat
Epoch 2/50 |### | (53/500) | Total: 0:00:28 | ETA: 0:03:44 | loss: 98.9029 | loss_kp_2d: 109.82 | loss_kp_3d: 5.81 | e_m_disc_loss: 0.65 | d_m_disc_real: 0.09 | d_m_disc_fake: 0.05 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |### | (54/500) | Total: 0:00:28 | ETA: 0:03:44 | loss: 98.7054 | loss_kp_2d: 81.66 | loss_kp_3d: 5.94 | e_m_disc_loss: 0.56 | d_m_disc_real: 0.07 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.13 | dat
Epoch 2/50 |### | (55/500) | Total: 0:00:29 | ETA: 0:03:43 | loss: 98.1078 | loss_kp_2d: 58.58 | loss_kp_3d: 6.82 | e_m_disc_loss: 0.48 | d_m_disc_real: 0.06 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.12 | dat
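
To pin down where the jump starts, the progress lines above can be parsed and scanned for the first outlier in `loss_kp_2d`. The following is only a sketch: the helper names (`parse_progress`, `first_spike`, `per_iteration_loss`) and the threshold `factor=10.0` are hypothetical and not part of VIBE. It also assumes, without confirmation from the VIBE code, that the displayed `loss` is a running mean over the iterations of the current epoch while the other fields are per-iteration values.

```python
import re

# Picks out "(31/500)", "| loss: 16.4265" and "| loss_kp_2d: 433.47" from one
# progress entry. DOTALL and \s+ tolerate entries wrapped across lines.
ENTRY_RE = re.compile(
    r"\((?P<it>\d+)/\d+\).*?\|\s+loss:\s+(?P<loss>[\d.]+)"
    r".*?\|\s+loss_kp_2d:\s+(?P<kp2d>[\d.]+)",
    re.DOTALL,
)


def parse_progress(text):
    """Return a list of (iteration, displayed_loss, loss_kp_2d) tuples."""
    return [
        (int(m["it"]), float(m["loss"]), float(m["kp2d"]))
        for m in ENTRY_RE.finditer(text)
    ]


def first_spike(rows, factor=10.0):
    """Return the first iteration whose loss_kp_2d exceeds `factor` times the
    mean of all earlier loss_kp_2d values, or None if there is no such entry."""
    seen = []
    for it, _, kp2d in rows:
        if seen and kp2d > factor * (sum(seen) / len(seen)):
            return it
        seen.append(kp2d)
    return None


def per_iteration_loss(rows):
    """Undo a running mean: x_n = n * avg_n - (n - 1) * avg_(n-1).

    ASSUMPTION: the displayed `loss` is the mean over iterations 1..n of the
    epoch. Only consecutive iterations are used.
    """
    out = {}
    for (prev_it, prev_avg, _), (it, avg, _) in zip(rows, rows[1:]):
        if it == prev_it + 1:
            out[it] = it * avg - prev_it * prev_avg
    return out


# Four entries copied from the log above, cut after loss_kp_3d for brevity.
sample = """\
Epoch 2/50 |# | (29/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3815 | loss_kp_2d: 0.97 | loss_kp_3d: 1.05
Epoch 2/50 |# | (30/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3664 | loss_kp_2d: 0.82 | loss_kp_3d: 0.66
Epoch 2/50 |# | (31/500) | Total: 0:00:17 | ETA: 0:04:04 | loss: 16.4265 | loss_kp_2d: 433.47 | loss_kp_3d: 0.84
Epoch 2/50 |## | (32/500) | Total: 0:00:17 | ETA: 0:03:58 | loss: 26.1461 | loss_kp_2d: 323.38 | loss_kp_3d: 1.10
"""

rows = parse_progress(sample)
spike_at = first_spike(rows)
recovered = per_iteration_loss(rows)
print(spike_at, round(recovered[31], 2))
```

On these four entries the sketch flags iteration 31, and the recovered per-iteration loss there is about 438.23, close to the logged `loss_kp_2d` of 433.47. If the running-mean assumption holds, that suggests the 2D keypoint term accounts for almost all of the jump, and that the displayed `loss` keeps climbing afterwards mainly because it is an average that includes the large values.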

This is my cfg:

2023-01-05 19:37:06,989 GPU name -> NVIDIA GeForce RTX 3060
2023-01-05 19:37:06,990 GPU feat -> _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060', major=8, minor=6, total_memory=12287MB, multi_processor_count=28)
2023-01-05 19:37:06,990 {'CUDNN': CfgNode({'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}),
 'DATASET': CfgNode({'SEQLEN': 16, 'OVERLAP': 0.5}),
 'DEBUG': False,
 'DEBUG_FREQ': 5,
 'DEVICE': 'cuda',
 'EXP_NAME': 'vibe',
 'LOGDIR': 'results/vibe_tests\05-01-2023_19-37-06_vibe',
 'LOSS': {'D_MOTION_LOSS_W': 0.5, 'KP_2D_W': 300.0, 'KP_3D_W': 300.0, 'POSE_W': 60.0, 'SHAPE_W': 0.06},
 'MODEL': {'TEMPORAL_TYPE': 'gru', 'TGRU': {'ADD_LINEAR': True, 'BIDIRECTIONAL': False, 'HIDDEN_SIZE': 1024, 'NUM_LAYERS': 2, 'RESIDUAL': True}},
 'NUM_WORKERS': 0,
 'OUTPUT_DIR': 'results/vibe_tests',
 'SEED_VALUE': -1,
 'TRAIN': {'BATCH_SIZE': 64, 'DATASETS_2D': ['Insta'], 'DATASETS_3D': ['MPII3D'], 'DATASET_EVAL': 'ThreeDPW', 'DATA_2D_RATIO': 0.6, 'END_EPOCH': 50, 'GEN_LR': 5e-05, 'GEN_MOMENTUM': 0.9, 'GEN_OPTIM': 'Adam', 'GEN_WD': 0.0, 'LR_PATIENCE': 5,
  'MOT_DISCR': {'ATT': {'DROPOUT': 0.2, 'LAYERS': 3, 'SIZE': 1024}, 'DIM': 1024, 'FEATURE_POOL': 'attention', 'HIDDEN_SIZE': 1024, 'LR': 0.0001, 'MOMENTUM': 0.9, 'NUM_LAYERS': 2, 'OPTIM': 'Adam', 'UPDATE_STEPS': 1, 'WD': 0.0001},
  'NUM_ITERS_PER_EPOCH': 500, 'PRETRAINED': '', 'PRETRAINED_REGRESSOR': 'data/vibe_data/spin_model_checkpoint.pth.tar', 'RESUME': '', 'START_EPOCH': 0}}