05/21 16:34:36 - mmengine - INFO - Checkpoints will be saved to D:\mmpose\training_log.
05/21 16:35:11 - mmengine - INFO - Epoch(train) [1][50/91] lr: 4.954910e-05 eta: 3:42:15 time: 0.699657 data_time: 0.125292 memory: 15535 loss: 0.003113 loss_kpt: 0.003113 acc_pose: 0.695949
05/21 16:35:32 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:35:59 - mmengine - INFO - Epoch(train) [2][50/91] lr: 1.406403e-04 eta: 3:04:39 time: 0.530763 data_time: 0.021162 memory: 15535 loss: 0.000698 loss_kpt: 0.000698 acc_pose: 0.967511
05/21 16:36:21 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:36:47 - mmengine - INFO - Epoch(train) [3][50/91] lr: 2.317315e-04 eta: 2:57:37 time: 0.536121 data_time: 0.020744 memory: 15535 loss: 0.000409 loss_kpt: 0.000409 acc_pose: 0.983145
05/21 16:37:09 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:37:36 - mmengine - INFO - Epoch(train) [4][50/91] lr: 3.228226e-04 eta: 2:54:00 time: 0.548706 data_time: 0.020879 memory: 15535 loss: 0.000316 loss_kpt: 0.000316 acc_pose: 0.998413
05/21 16:37:57 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:38:23 - mmengine - INFO - Epoch(train) [5][50/91] lr: 4.139138e-04 eta: 2:50:52 time: 0.530613 data_time: 0.019982 memory: 15535 loss: 0.000258 loss_kpt: 0.000258 acc_pose: 1.000000
05/21 16:38:45 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:39:12 - mmengine - INFO - Epoch(train) [6][50/91] lr: 5.000000e-04 eta: 2:49:28 time: 0.544714 data_time: 0.021153 memory: 15535 loss: 0.000238 loss_kpt: 0.000238 acc_pose: 0.993846
05/21 16:39:33 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:40:00 - mmengine - INFO - Epoch(train) [7][50/91] lr: 5.000000e-04 eta: 2:47:37 time: 0.534784 data_time: 0.021027 memory: 15535 loss: 0.000230 loss_kpt: 0.000230 acc_pose: 0.992089
05/21 16:40:22 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:40:49 - mmengine - INFO - Epoch(train) [8][50/91] lr: 5.000000e-04 eta: 2:46:34 time: 0.535202 data_time: 0.021158 memory: 15535 loss: 0.000166 loss_kpt: 0.000166 acc_pose: 0.998464
05/21 16:41:10 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:41:37 - mmengine - INFO - Epoch(train) [9][50/91] lr: 5.000000e-04 eta: 2:45:03 time: 0.531518 data_time: 0.021564 memory: 15535 loss: 0.000148 loss_kpt: 0.000148 acc_pose: 0.995440
05/21 16:41:58 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:42:25 - mmengine - INFO - Epoch(train) [10][50/91] lr: 5.000000e-04 eta: 2:43:56 time: 0.538365 data_time: 0.020905 memory: 15535 loss: 0.000131 loss_kpt: 0.000131 acc_pose: 0.995228
05/21 16:42:46 - mmengine - INFO - Exp name: td-hm_res152_8xb32-210e_coco-384x288_20230521_163419
05/21 16:42:46 - mmengine - INFO - Saving checkpoint at 10 epochs
Traceback (most recent call last):
  File "tools/train.py", line 160, in <module>
    main()
  File "tools/train.py", line 156, in main
    model = self.train_loop.run() # type: ignore
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\mmengine\runner\loops.py", line 102, in run
    self.runner.val_loop.run()
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\mmengine\runner\loops.py", line 363, in run
    self.run_iter(idx, data_batch)
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\torch\autograd\grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\mmengine\runner\loops.py", line 383, in run_iter
    outputs = self.runner.model.val_step(data_batch)
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\mmengine\model\base_model\base_model.py", line 133, in val_step
    return self._run_forward(data, mode='predict') # type: ignore
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\mmengine\model\base_model\base_model.py", line 340, in _run_forward
    results = self(**data, mode=mode)
  File "C:\Users\user\anaconda3\envs\openmmlab\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\mmpose\mmpose\models\pose_estimators\base.py", line 140, in forward
    return self.predict(inputs, data_samples)
  File "d:\mmpose\mmpose\models\pose_estimators\topdown.py", line 109, in predict
    preds = self.head.predict(feats, data_samples, test_cfg=self.test_cfg)
  File "d:\mmpose\mmpose\models\heads\heatmap_heads\heatmap_head.py", line 261, in predict
    _batch_heatmaps_flip = flip_heatmaps(
  File "d:\mmpose\mmpose\models\utils\tta.py", line 39, in flip_heatmaps
    assert len(flip_indices) == heatmaps.shape[1]
After the training process stopped, I encountered this problem.
Additional information
I hit the error shown above after training stopped. I don't know whether this error is what prevented the test (evaluators?) from running, or whether I missed some arguments for visualizing the test results.
Thank you!
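Thanks for using MMPose. The assertion in `flip_heatmaps` checks that the number of flip indices equals the number of heatmap channels (`heatmaps.shape[1]`). You need to specify `swap` in the dataset info to define the flip pairs that `RandomFlip` requires. Please refer to the docs for more details.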
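To make the reply above concrete, here is a minimal sketch of a dataset info dict with `swap` set on each left/right keypoint, plus a small check that mirrors the failing assertion. The five-keypoint layout, the dataset name, and the helpers `build_flip_indices` and `flip_indices_match` are hypothetical and written only for illustration; they are not MMPose's own implementation, and the snippet does not import MMPose.

```python
# Hypothetical dataset info with five keypoints (illustrative only).
# `swap` names the mirrored keypoint; an empty string means "no counterpart".
dataset_info = dict(
    dataset_name="my_dataset",  # assumed name, not from the original issue
    keypoint_info={
        0: dict(name="nose", id=0, color=[51, 153, 255], type="upper", swap=""),
        1: dict(name="left_eye", id=1, color=[51, 153, 255], type="upper", swap="right_eye"),
        2: dict(name="right_eye", id=2, color=[51, 153, 255], type="upper", swap="left_eye"),
        3: dict(name="left_ear", id=3, color=[51, 153, 255], type="upper", swap="right_ear"),
        4: dict(name="right_ear", id=4, color=[51, 153, 255], type="upper", swap="left_ear"),
    },
)


def build_flip_indices(info):
    """Return, for each keypoint id, the id of its mirrored counterpart.

    Hypothetical helper: a keypoint with an empty `swap` maps to itself.
    """
    keypoints = info["keypoint_info"]
    name_to_id = {kpt["name"]: kpt_id for kpt_id, kpt in keypoints.items()}
    indices = []
    for kpt_id in sorted(keypoints):
        swap_name = keypoints[kpt_id].get("swap", "")
        indices.append(name_to_id[swap_name] if swap_name else kpt_id)
    return indices


def flip_indices_match(flip_indices, num_heatmap_channels):
    """Same condition as the assertion in the traceback, as a boolean."""
    return len(flip_indices) == num_heatmap_channels


flip_indices = build_flip_indices(dataset_info)
print(flip_indices)
print(flip_indices_match(flip_indices, num_heatmap_channels=5))
```

If the check is false for your own dataset info, the keypoints described there and the number of heatmap channels your model head outputs are out of sync, which is the condition the traceback reports.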
Prerequisite
Environment
OrderedDict([('sys.platform', 'win32'), ('Python', '3.8.16 (default, Mar 2 2023, 03:18:16) [MSC v.1916 64 bit (AMD64)]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA GeForce RTX 3090'), ('CUDA_HOME', 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1'), ('NVCC', 'Cuda compilation tools, release 11.1, V11.1.105'), ('MSVC', 'Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64'), ('GCC', 'n/a'), ('PyTorch', '1.9.1+cu111'), ('PyTorch compiling details', 'PyTorch built with:\n - C++ Version: 199711\n - MSVC 192829337\n - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n - OpenMP 2019\n - CPU capability usage: AVX2\n - CUDA Runtime 11.1\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.0.5\n - Magma 2.5.4\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=C:/w/b/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/w/b/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, \n'), ('TorchVision', '0.10.1+cu111'), ('OpenCV', '4.7.0'), ('MMEngine', '0.7.3'), ('MMPose', '1.0.0+2c4a60e')])
Reproduces the problem - code sample
Reproduces the problem - command or script
This is the training script I used