Closed. seon-creator closed this issue 1 year ago
If your dataset lacks mask annotations, you can set up your configuration file in line with the CrowdPose example found here: https://github.com/open-mmlab/mmpose/blob/537bd8e543ab463fb55120d5caaa1ae22d6aaf06/configs/body_2d_keypoint/dekr/crowdpose/dekr_hrnet-w32_8xb10-300e_crowdpose-512x512.py#L124-L131
Thank you for your advice! After that I tried training and it works. But at the evaluation step, it raises an error.
09/01 21:55:02 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
09/01 21:55:02 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
09/01 21:55:02 - mmengine - INFO - Checkpoints will be saved to /data/home/seondeok/Project/acupoint/mmpose/work_dirs/Bottom_up/DEKR_hrnet_w32_0901.
09/01 21:55:21 - mmengine - INFO - Epoch(train) [1][ 50/106] lr: 9.909820e-05 eta: 2:10:22 time: 0.369846 data_time: 0.191051 memory: 7741 loss: 0.001332 loss/heatmap: 0.000879 loss/displacement: 0.000454
09/01 21:55:38 - mmengine - INFO - Epoch(train) [1][100/106] lr: 1.991984e-04 eta: 2:06:07 time: 0.347460 data_time: 0.189022 memory: 7741 loss: 0.001064 loss/heatmap: 0.000614 loss/displacement: 0.000450
09/01 21:55:40 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:55:49 - mmengine - INFO - Epoch(train) [2][ 50/106] lr: 3.113106e-04 eta: 1:44:02 time: 0.167862 data_time: 0.009961 memory: 7741 loss: 0.000945 loss/heatmap: 0.000509 loss/displacement: 0.000436
09/01 21:55:57 - mmengine - INFO - Epoch(train) [2][100/106] lr: 4.114108e-04 eta: 1:32:45 time: 0.166636 data_time: 0.008195 memory: 7741 loss: 0.000918 loss/heatmap: 0.000494 loss/displacement: 0.000424
09/01 21:55:58 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:56:06 - mmengine - INFO - Epoch(train) [3][ 50/106] lr: 5.235230e-04 eta: 1:25:16 time: 0.169351 data_time: 0.011285 memory: 7741 loss: 0.000854 loss/heatmap: 0.000450 loss/displacement: 0.000404
09/01 21:56:15 - mmengine - INFO - Epoch(train) [3][100/106] lr: 6.236232e-04 eta: 1:20:43 time: 0.166557 data_time: 0.007964 memory: 7741 loss: 0.000805 loss/heatmap: 0.000432 loss/displacement: 0.000372
09/01 21:56:16 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:56:24 - mmengine - INFO - Epoch(train) [4][ 50/106] lr: 7.357355e-04 eta: 1:17:05 time: 0.167826 data_time: 0.009669 memory: 7741 loss: 0.000706 loss/heatmap: 0.000408 loss/displacement: 0.000298
09/01 21:56:32 - mmengine - INFO - Epoch(train) [4][100/106] lr: 8.358357e-04 eta: 1:14:40 time: 0.168171 data_time: 0.009046 memory: 7742 loss: 0.000585 loss/heatmap: 0.000378 loss/displacement: 0.000208
09/01 21:56:33 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:56:42 - mmengine - INFO - Epoch(train) [5][ 50/106] lr: 9.479479e-04 eta: 1:12:30 time: 0.168413 data_time: 0.009599 memory: 7742 loss: 0.000525 loss/heatmap: 0.000376 loss/displacement: 0.000148
09/01 21:56:50 - mmengine - INFO - Epoch(train) [5][100/106] lr: 1.000000e-03 eta: 1:10:54 time: 0.166663 data_time: 0.008285 memory: 7742 loss: 0.000498 loss/heatmap: 0.000380 loss/displacement: 0.000118
09/01 21:56:51 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:56:59 - mmengine - INFO - Epoch(train) [6][ 50/106] lr: 1.000000e-03 eta: 1:09:26 time: 0.168582 data_time: 0.009675 memory: 7742 loss: 0.000448 loss/heatmap: 0.000346 loss/displacement: 0.000101
09/01 21:57:08 - mmengine - INFO - Epoch(train) [6][100/106] lr: 1.000000e-03 eta: 1:08:19 time: 0.167408 data_time: 0.008816 memory: 7742 loss: 0.000419 loss/heatmap: 0.000335 loss/displacement: 0.000083
09/01 21:57:09 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:57:17 - mmengine - INFO - Epoch(train) [7][ 50/106] lr: 1.000000e-03 eta: 1:07:16 time: 0.169302 data_time: 0.009737 memory: 7742 loss: 0.000415 loss/heatmap: 0.000336 loss/displacement: 0.000079
09/01 21:57:26 - mmengine - INFO - Epoch(train) [7][100/106] lr: 1.000000e-03 eta: 1:06:24 time: 0.166421 data_time: 0.007720 memory: 7741 loss: 0.000364 loss/heatmap: 0.000298 loss/displacement: 0.000067
09/01 21:57:27 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:57:35 - mmengine - INFO - Epoch(train) [8][ 50/106] lr: 1.000000e-03 eta: 1:05:34 time: 0.168857 data_time: 0.009598 memory: 7742 loss: 0.000360 loss/heatmap: 0.000296 loss/displacement: 0.000064
09/01 21:57:43 - mmengine - INFO - Epoch(train) [8][100/106] lr: 1.000000e-03 eta: 1:04:55 time: 0.168401 data_time: 0.009147 memory: 7741 loss: 0.000356 loss/heatmap: 0.000292 loss/displacement: 0.000064
09/01 21:57:44 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:57:53 - mmengine - INFO - Epoch(train) [9][ 50/106] lr: 1.000000e-03 eta: 1:04:14 time: 0.168675 data_time: 0.009869 memory: 7741 loss: 0.000342 loss/heatmap: 0.000284 loss/displacement: 0.000058
09/01 21:58:01 - mmengine - INFO - Epoch(train) [9][100/106] lr: 1.000000e-03 eta: 1:03:42 time: 0.168189 data_time: 0.008862 memory: 7741 loss: 0.000346 loss/heatmap: 0.000290 loss/displacement: 0.000056
09/01 21:58:02 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:58:10 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:58:11 - mmengine - INFO - Epoch(train) [10][ 50/106] lr: 1.000000e-03 eta: 1:03:07 time: 0.168612 data_time: 0.009756 memory: 7741 loss: 0.000323 loss/heatmap: 0.000273 loss/displacement: 0.000050
09/01 21:58:19 - mmengine - INFO - Epoch(train) [10][100/106] lr: 1.000000e-03 eta: 1:02:38 time: 0.167257 data_time: 0.007840 memory: 7741 loss: 0.000335 loss/heatmap: 0.000280 loss/displacement: 0.000055
09/01 21:58:20 - mmengine - INFO - Exp name: dekr_hrnet-w32_8xb10-140e_coco-512x512_20230901_215454
09/01 21:58:20 - mmengine - INFO - Saving checkpoint at 10 epochs
/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/torch/nn/functional.py:4236: UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for det$
warnings.warn(
Traceback (most recent call last):
File "tools/train.py", line 161, in <module>
main()
File "tools/train.py", line 157, in main
runner.train()
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1745, in train
model = self.train_loop.run() # type: ignore
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 102, in run
self.runner.val_loop.run()
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 363, in run
self.run_iter(idx, data_batch)
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 383, in run_iter
outputs = self.runner.model.val_step(data_batch)
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 133, in val_step
return self._run_forward(data, mode='predict') # type: ignore
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
results = self(**data, mode=mode)
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/seondeok/Project/acupoint/mmpose/mmpose/models/pose_estimators/base.py", line 142, in forward
return self.predict(inputs, data_samples)
File "/data/home/seondeok/Project/acupoint/mmpose/mmpose/models/pose_estimators/bottomup.py", line 127, in predict
preds = self.head.predict(feats, data_samples, test_cfg=self.test_cfg)
File "/data/home/seondeok/Project/acupoint/mmpose/mmpose/models/heads/hybrid_heads/dekr_head.py", line 460, in predict
preds = self.decode(heatmaps, displacements, test_cfg, metainfo)
File "/data/home/seondeok/Project/acupoint/mmpose/mmpose/models/heads/hybrid_heads/dekr_head.py", line 520, in decode
instance_scores = self.rescore_net(keypoints, keypoint_scores,
File "/data/home/seondeok/.conda/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/data/home/seondeok/Project/acupoint/mmpose/mmpose/models/heads/hybrid_heads/dekr_head.py", line 161, in forward
feature = self.make_feature(keypoints, keypoint_scores, skeleton)
File "/data/home/seondeok/Project/acupoint/mmpose/mmpose/models/heads/hybrid_heads/dekr_head.py", line 147, in make_feature
normalize = (joint_length[:, self.norm_indexes[0]] +
IndexError: index 5 is out of bounds for dimension 1 with size 5
Do I have to change anything else in the config? My dataset has 5 keypoints.
Please remove the rescore_cfg from the config for custom datasets that have a different number of keypoints than COCO or CrowdPose. The rescore nets are pretrained on the respective datasets and will not be updated during training.
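A minimal sketch of that suggestion, using plain Python dicts rather than a real mmpose config file. The `head` dict, the `in_channels` value, and both helper functions are hypothetical and only for illustration; `norm_indexes=(5, 6)` is assumed from the traceback above, which fails on index 5 of a dimension of size 5.

```python
import copy

# Hypothetical head config; the values are illustrative assumptions,
# not copied from an actual mmpose config file.
head = dict(
    type='DEKRHead',
    num_keypoints=5,
    rescore_cfg=dict(in_channels=74, norm_indexes=(5, 6)),
)


def drop_rescore(head_cfg):
    """Return a copy of the head config without the rescore_cfg entry."""
    cfg = copy.deepcopy(head_cfg)
    cfg.pop('rescore_cfg', None)
    return cfg


def norm_indexes_fit(size, norm_indexes):
    """True if every normalization index is valid for a dimension of this size."""
    return all(0 <= i < size for i in norm_indexes)


custom_head = drop_rescore(head)
print('rescore_cfg' in custom_head)   # the key is gone from the copy
print(norm_indexes_fit(5, (5, 6)))    # mirrors the IndexError condition
```

The second helper only restates the failing condition: with a dimension of size 5, index 5 is already out of range, so a rescore net configured for a larger layout cannot be applied as-is.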
Thank you. I removed the rescore_cfg and it works now. But all of the AP values are 0.000. How can I solve this?
Loading and preparing results...
DONE (t=0.01s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=0.01s).
Accumulating evaluation results...
DONE (t=0.00s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
Have you visualized the predictions of the model on your data? You can follow this doc to check the model outputs
Yes, the visualization works and the keypoints look good at 200 or 300 epochs. But the AP values are all zero, so I can't tell which epoch gives the best AP.
The AP metric is highly sensitive to the 'sigmas' parameter defined in the dataset metainformation file. Perhaps you could try enlarging the value of 'sigmas' for each keypoint and check whether the AP goes up.
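To illustrate why 'sigmas' matters, here is a rough sketch of the object keypoint similarity (OKS) that COCO-style keypoint AP is built on, written in plain Python and assuming every keypoint is labeled and visible. The function name `oks`, the coordinates, the sigma values, and the area are all made up for the example; this is not the evaluator's actual code.

```python
import math


def oks(pred, gt, sigmas, area):
    """Simplified OKS for one instance, assuming all keypoints are visible.

    pred, gt: lists of (x, y) pixel coordinates, one pair per keypoint.
    sigmas:   per-keypoint tolerance constants.
    area:     ground-truth object area in pixels.
    """
    total = 0.0
    for (px, py), (gx, gy), sigma in zip(pred, gt, sigmas):
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        var = (2 * sigma) ** 2
        total += math.exp(-d2 / (2 * var * area))
    return total / len(gt)


# Made-up example: 5 keypoints, each prediction 5 px off (3 px in x, 4 px in y).
gt = [(10, 10), (30, 10), (50, 10), (70, 10), (90, 10)]
pred = [(x + 3, y + 4) for x, y in gt]
area = 100 * 100

oks_tight = oks(pred, gt, [0.025] * 5, area)
oks_loose = oks(pred, gt, [0.1] * 5, area)
print(oks_tight, oks_loose)
```

With the same 5-pixel error, the larger sigma yields a higher similarity, so more predictions clear the OKS thresholds used for AP. The object area sits in the same denominator, so the area stored in the annotations influences the score in the same way.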
I am getting the same problem training DEKR: my model gives good results on validation data, but the AP values are all 0. Increasing sigma does not help. Did you ever figure this out?
Prerequisite
Environment
The environment was set up following mmpose_Tutorial.ipynb.
Additional information
Is there a problem with the data format used for bottom-up training?