open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

train on custom dataset but AP(M):-1 #1524

Closed. yiweike closed this issue 2 years ago.

yiweike commented 2 years ago

The log:

2022-07-29 10:34:45,983 - mmpose - INFO - workflow: [('train', 1)], max: 210 epochs
2022-07-29 10:35:08,868 - mmpose - INFO - Epoch [1][50/681] lr: 4.945e-05, eta: 18:10:25, time: 0.458, data_time: 0.053, memory: 3339, heatmap_loss: 0.0025, acc_pose: 0.0500, loss: 0.0025
2022-07-29 10:35:28,349 - mmpose - INFO - Epoch [1][100/681] lr: 9.940e-05, eta: 16:49:01, time: 0.390, data_time: 0.000, memory: 3339, heatmap_loss: 0.0022, acc_pose: 0.2586, loss: 0.0022
2022-07-29 10:35:47,940 - mmpose - INFO - Epoch [1][150/681] lr: 1.494e-04, eta: 16:23:25, time: 0.392, data_time: 0.000, memory: 3339, heatmap_loss: 0.0021, acc_pose: 0.3422, loss: 0.0021
2022-07-29 10:36:07,555 - mmpose - INFO - Epoch [1][200/681] lr: 1.993e-04, eta: 16:10:44, time: 0.392, data_time: 0.000, memory: 3339, heatmap_loss: 0.0019, acc_pose: 0.3838, loss: 0.0019
2022-07-29 10:36:27,240 - mmpose - INFO - Epoch [1][250/681] lr: 2.493e-04, eta: 16:03:39, time: 0.394, data_time: 0.000, memory: 3339, heatmap_loss: 0.0019, acc_pose: 0.4057, loss: 0.0019
2022-07-29 10:36:46,977 - mmpose - INFO - Epoch [1][300/681] lr: 2.992e-04, eta: 15:59:15, time: 0.395, data_time: 0.000, memory: 3339, heatmap_loss: 0.0019, acc_pose: 0.4364, loss: 0.0019
2022-07-29 10:37:06,868 - mmpose - INFO - Epoch [1][350/681] lr: 3.492e-04, eta: 15:57:03, time: 0.398, data_time: 0.000, memory: 3339, heatmap_loss: 0.0018, acc_pose: 0.4224, loss: 0.0018
2022-07-29 10:37:26,812 - mmpose - INFO - Epoch [1][400/681] lr: 3.991e-04, eta: 15:55:38, time: 0.399, data_time: 0.000, memory: 3339, heatmap_loss: 0.0019, acc_pose: 0.3990, loss: 0.0019
2022-07-29 10:37:46,708 - mmpose - INFO - Epoch [1][450/681] lr: 4.491e-04, eta: 15:54:12, time: 0.398, data_time: 0.000, memory: 3339, heatmap_loss: 0.0018, acc_pose: 0.4572, loss: 0.0018
2022-07-29 10:38:06,729 - mmpose - INFO - Epoch [1][500/681] lr: 4.990e-04, eta: 15:53:35, time: 0.400, data_time: 0.000, memory: 3339, heatmap_loss: 0.0018, acc_pose: 0.4405, loss: 0.0018
2022-07-29 10:38:26,745 - mmpose - INFO - Epoch [1][550/681] lr: 5.000e-04, eta: 15:53:00, time: 0.400, data_time: 0.000, memory: 3339, heatmap_loss: 0.0019, acc_pose: 0.4313, loss: 0.0019
2022-07-29 10:38:46,964 - mmpose - INFO - Epoch [1][600/681] lr: 5.000e-04, eta: 15:53:15, time: 0.404, data_time: 0.000, memory: 3339, heatmap_loss: 0.0018, acc_pose: 0.4833, loss: 0.0018
2022-07-29 10:39:06,820 - mmpose - INFO - Epoch [1][650/681] lr: 5.000e-04, eta: 15:52:06, time: 0.397, data_time: 0.000, memory: 3339, heatmap_loss: 0.0017, acc_pose: 0.4723, loss: 0.0017
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1341/1341, 64.8 task/s, elapsed: 21s, ETA: 0s
Loading and preparing results...
DONE (t=0.04s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=0.23s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.545
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.709
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.628
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.546
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.682
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.831
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.761
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.682
2022-07-29 10:39:45,769 - mmpose - INFO - Now best checkpoint is saved as best_AP_epoch_1.pth.
2022-07-29 10:39:45,770 - mmpose - INFO - Best AP is 0.5454 at 1 epoch.
2022-07-29 10:39:45,771 - mmpose - INFO - Epoch(val) [1][112] AP: 0.5454, AP .5: 0.7086, AP .75: 0.6276, AP (M): -1.0000, AP (L): 0.5456, AR: 0.6823, AR .5: 0.8315, AR .75: 0.7606, AR (M): -1.0000, AR (L): 0.6823
2022-07-29 10:40:08,990 - mmpose - INFO - Epoch [2][50/681] lr: 5.000e-04, eta: 15:21:24, time: 0.464, data_time: 0.055, memory: 3339, heatmap_loss: 0.0017, acc_pose: 0.4634, loss: 0.0017
2022-07-29 10:40:29,085 - mmpose - INFO - Epoch [2][100/681] lr: 5.000e-04, eta: 15:23:06, time: 0.402, data_time: 0.000, memory: 3339, heatmap_loss: 0.0017, acc_pose: 0.4723, loss: 0.0017
2022-07-29 10:40:49,159 - mmpose - INFO - Epoch [2][150/681] lr: 5.000e-04, eta: 15:24:29, time: 0.401, data_time: 0.000, memory: 3339, heatmap_loss: 0.0017, acc_pose: 0.5011, loss: 0.0017
2022-07-29 10:41:09,300 - mmpose - INFO - Epoch [2][200/681] lr: 5.000e-04, eta: 15:25:52, time: 0.403, data_time: 0.000, memory: 3339, heatmap_loss: 0.0017, acc_pose: 0.4972, loss: 0.0017
2022-07-29 10:41:29,062 - mmpose - INFO - Epoch [2][250/681] lr: 5.000e-04, eta: 15:26:06, time: 0.395, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5101, loss: 0.0016
2022-07-29 10:41:48,816 - mmpose - INFO - Epoch [2][300/681] lr: 5.000e-04, eta: 15:26:15, time: 0.395, data_time: 0.000, memory: 3339, heatmap_loss: 0.0017, acc_pose: 0.4938, loss: 0.0017
2022-07-29 10:42:08,604 - mmpose - INFO - Epoch [2][350/681] lr: 5.000e-04, eta: 15:26:26, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5200, loss: 0.0016
2022-07-29 10:42:28,386 - mmpose - INFO - Epoch [2][400/681] lr: 5.000e-04, eta: 15:26:34, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5062, loss: 0.0016
2022-07-29 10:42:48,263 - mmpose - INFO - Epoch [2][450/681] lr: 5.000e-04, eta: 15:26:51, time: 0.398, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5255, loss: 0.0016
2022-07-29 10:43:08,082 - mmpose - INFO - Epoch [2][500/681] lr: 5.000e-04, eta: 15:26:58, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.4934, loss: 0.0016
2022-07-29 10:43:27,819 - mmpose - INFO - Epoch [2][550/681] lr: 5.000e-04, eta: 15:26:53, time: 0.395, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5283, loss: 0.0016
2022-07-29 10:43:47,607 - mmpose - INFO - Epoch [2][600/681] lr: 5.000e-04, eta: 15:26:53, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5343, loss: 0.0016
2022-07-29 10:44:07,352 - mmpose - INFO - Epoch [2][650/681] lr: 5.000e-04, eta: 15:26:46, time: 0.395, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5247, loss: 0.0016
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1341/1341, 64.7 task/s, elapsed: 21s, ETA: 0s
Loading and preparing results...
DONE (t=0.04s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=0.23s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.610
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.707
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.666
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.611
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.742
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.837
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.799
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.742
2022-07-29 10:44:41,510 - mmpose - INFO - Now best checkpoint is saved as best_AP_epoch_2.pth.
2022-07-29 10:44:41,510 - mmpose - INFO - Best AP is 0.6100 at 2 epoch.
2022-07-29 10:44:41,511 - mmpose - INFO - Epoch(val) [2][112] AP: 0.6100, AP .5: 0.7075, AP .75: 0.6658, AP (M): -1.0000, AP (L): 0.6109, AR: 0.7416, AR .5: 0.8367, AR .75: 0.7994, AR (M): -1.0000, AR (L): 0.7416
2022-07-29 10:45:04,147 - mmpose - INFO - Epoch [3][50/681] lr: 5.000e-04, eta: 15:10:56, time: 0.453, data_time: 0.055, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5422, loss: 0.0016
2022-07-29 10:45:23,897 - mmpose - INFO - Epoch [3][100/681] lr: 5.000e-04, eta: 15:11:20, time: 0.395, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5370, loss: 0.0015
2022-07-29 10:45:43,752 - mmpose - INFO - Epoch [3][150/681] lr: 5.000e-04, eta: 15:11:52, time: 0.397, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5497, loss: 0.0016
2022-07-29 10:46:03,570 - mmpose - INFO - Epoch [3][200/681] lr: 5.000e-04, eta: 15:12:16, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5488, loss: 0.0015
2022-07-29 10:46:23,292 - mmpose - INFO - Epoch [3][250/681] lr: 5.000e-04, eta: 15:12:29, time: 0.394, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5221, loss: 0.0015
2022-07-29 10:46:43,291 - mmpose - INFO - Epoch [3][300/681] lr: 5.000e-04, eta: 15:13:04, time: 0.400, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5530, loss: 0.0015
2022-07-29 10:47:03,448 - mmpose - INFO - Epoch [3][350/681] lr: 5.000e-04, eta: 15:13:49, time: 0.403, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5574, loss: 0.0015
2022-07-29 10:47:23,273 - mmpose - INFO - Epoch [3][400/681] lr: 5.000e-04, eta: 15:14:04, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5513, loss: 0.0015
2022-07-29 10:47:43,072 - mmpose - INFO - Epoch [3][450/681] lr: 5.000e-04, eta: 15:14:14, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0016, acc_pose: 0.5351, loss: 0.0016
2022-07-29 10:48:02,773 - mmpose - INFO - Epoch [3][500/681] lr: 5.000e-04, eta: 15:14:16, time: 0.394, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5379, loss: 0.0015
2022-07-29 10:48:22,557 - mmpose - INFO - Epoch [3][550/681] lr: 5.000e-04, eta: 15:14:22, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5792, loss: 0.0015
2022-07-29 10:48:42,336 - mmpose - INFO - Epoch [3][600/681] lr: 5.000e-04, eta: 15:14:27, time: 0.396, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5483, loss: 0.0015
2022-07-29 10:49:02,178 - mmpose - INFO - Epoch [3][650/681] lr: 5.000e-04, eta: 15:14:35, time: 0.397, data_time: 0.000, memory: 3339, heatmap_loss: 0.0015, acc_pose: 0.5280, loss: 0.0015
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1341/1341, 64.6 task/s, elapsed: 21s, ETA: 0s
Loading and preparing results...
DONE (t=0.04s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *keypoints*
DONE (t=0.30s).
Accumulating evaluation results...
DONE (t=0.01s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.613
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.709
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.664
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.613
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 20 ] = 0.746
 Average Recall     (AR) @[ IoU=0.50      | area=   all | maxDets= 20 ] = 0.836
 Average Recall     (AR) @[ IoU=0.75      | area=   all | maxDets= 20 ] = 0.793
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.746
2022-07-29 10:49:36,440 - mmpose - INFO - Now best checkpoint is saved as best_AP_epoch_3.pth.
2022-07-29 10:49:36,440 - mmpose - INFO - Best AP is 0.6131 at 3 epoch.

configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/custom/hrnet_w48_custom_256x192.py

_base_ = [
    '../../../../_base_/default_runtime.py',
    '../../../../_base_/datasets/coco.py'
]
evaluation = dict(interval=1, metric='mAP', save_best='AP')

optimizer = dict(
    type='Adam',
    lr=5e-4,
)
optimizer_config = dict(grad_clip=None)

# learning policy

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])

# model settings

model = dict(
    type='TopDown',
    pretrained='https://download.openmmlab.com/mmpose/'
    'pretrain_models/hrnet_w48-8ef0771d.pth',
    backbone=dict(
        type='HRNet',
        in_channels=3,
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(48, 96)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(48, 96, 192)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(48, 96, 192, 384))),
    ),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=48,
        out_channels=channel_cfg['num_output_channels'],
        num_deconv_layers=0,
        extra=dict(final_conv_kernel=1, ),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    use_gt_bbox=False,
    det_bbox_thr=0.0,
    bbox_file='data/custom/person_detection_results/'
    'bbox_results.json',
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(
        type='TopDownHalfBodyTransform',
        num_joints_half_body=8,
        prob_half_body=0.3),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=40,
        scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center',
            'scale', 'rotation', 'bbox_score', 'flip_pairs'
        ]),
]

val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=[
            'image_file', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs'
        ]),
]

test_pipeline = val_pipeline

data_root = 'data/custom'
data = dict(
    samples_per_gpu=12,
    workers_per_gpu=12,
    val_dataloader=dict(samples_per_gpu=12),
    test_dataloader=dict(samples_per_gpu=12),
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_train2017.json',
        img_prefix=f'{data_root}/train/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotations/person_keypoints_val2017.json',
        img_prefix=f'{data_root}/val/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}),
)

configs/_base_/dataset/custom.py

dataset_info = dict(
    dataset_name='custom',
    paper_info=dict(
        author='Me',
        title='my custom dataset',
        container='my dataset in school',
        year='2022',
        homepage='None',
    ),
    keypoint_info={
        0: dict(name='nose', id=0, color=[51, 153, 255], type='upper', swap=''),
        1: dict(name='left_eye', id=1, color=[51, 153, 255], type='upper', swap='right_eye'),
        2: dict(name='right_eye', id=2, color=[51, 153, 255], type='upper', swap='left_eye'),
        3: dict(name='left_ear', id=3, color=[51, 153, 255], type='upper', swap='right_ear'),
        4: dict(name='right_ear', id=4, color=[51, 153, 255], type='upper', swap='left_ear'),
        5: dict(name='left_shoulder', id=5, color=[0, 255, 0], type='upper', swap='right_shoulder'),
        6: dict(name='right_shoulder', id=6, color=[255, 128, 0], type='upper', swap='left_shoulder'),
        7: dict(name='left_elbow', id=7, color=[0, 255, 0], type='upper', swap='right_elbow'),
        8: dict(name='right_elbow', id=8, color=[255, 128, 0], type='upper', swap='left_elbow'),
        9: dict(name='left_wrist', id=9, color=[0, 255, 0], type='upper', swap='right_wrist'),
        10: dict(name='right_wrist', id=10, color=[255, 128, 0], type='upper', swap='left_wrist'),
        11: dict(name='left_hip', id=11, color=[0, 255, 0], type='lower', swap='right_hip'),
        12: dict(name='right_hip', id=12, color=[255, 128, 0], type='lower', swap='left_hip'),
        13: dict(name='left_knee', id=13, color=[0, 255, 0], type='lower', swap='right_knee'),
        14: dict(name='right_knee', id=14, color=[255, 128, 0], type='lower', swap='left_knee'),
        15: dict(name='left_ankle', id=15, color=[0, 255, 0], type='lower', swap='right_ankle'),
        16: dict(name='right_ankle', id=16, color=[255, 128, 0], type='lower', swap='left_ankle')
    },
    skeleton_info={
        0: dict(link=('left_ankle', 'left_knee'), id=0, color=[0, 255, 0]),
        1: dict(link=('left_knee', 'left_hip'), id=1, color=[0, 255, 0]),
        2: dict(link=('right_ankle', 'right_knee'), id=2, color=[255, 128, 0]),
        3: dict(link=('right_knee', 'right_hip'), id=3, color=[255, 128, 0]),
        4: dict(link=('left_hip', 'right_hip'), id=4, color=[51, 153, 255]),
        5: dict(link=('left_shoulder', 'left_hip'), id=5, color=[51, 153, 255]),
        6: dict(link=('right_shoulder', 'right_hip'), id=6, color=[51, 153, 255]),
        7: dict(link=('left_shoulder', 'right_shoulder'), id=7, color=[51, 153, 255]),
        8: dict(link=('left_shoulder', 'left_elbow'), id=8, color=[0, 255, 0]),
        9: dict(link=('right_shoulder', 'right_elbow'), id=9, color=[255, 128, 0]),
        10: dict(link=('left_elbow', 'left_wrist'), id=10, color=[0, 255, 0]),
        11: dict(link=('right_elbow', 'right_wrist'), id=11, color=[255, 128, 0]),
        12: dict(link=('left_eye', 'right_eye'), id=12, color=[51, 153, 255]),
        13: dict(link=('nose', 'left_eye'), id=13, color=[51, 153, 255]),
        14: dict(link=('nose', 'right_eye'), id=14, color=[51, 153, 255]),
        15: dict(link=('left_eye', 'left_ear'), id=15, color=[51, 153, 255]),
        16: dict(link=('right_eye', 'right_ear'), id=16, color=[51, 153, 255]),
        17: dict(link=('left_ear', 'left_shoulder'), id=17, color=[51, 153, 255]),
        18: dict(link=('right_ear', 'right_shoulder'), id=18, color=[51, 153, 255])
    },
    joint_weights=[
        1., 1., 1., 1., 1., 1., 1., 1.2, 1.2, 1.5, 1.5, 1., 1., 1.2, 1.2,
        1.5, 1.5
    ],
    sigmas=[
        0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072, 0.062,
        0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089
    ])

mmpose/datasets/datasets/top_down/topdown_custom_dataset.py

# Copyright (c) OpenMMLab. All rights reserved.

import os.path as osp
import tempfile
import warnings
from collections import OrderedDict, defaultdict

import json_tricks as json
import numpy as np
from mmcv import Config, deprecated_api_warning
from xtcocotools.cocoeval import COCOeval

from ....core.post_processing import oks_nms, soft_oks_nms
from ...builder import DATASETS
from ..base import Kpt2dSviewRgbImgTopDownDataset

@DATASETS.register_module()
class TopDownCustomDataset(Kpt2dSviewRgbImgTopDownDataset):
    """CocoDataset dataset for top-down pose estimation.

"Microsoft COCO: Common Objects in Context", ECCV'2014.
More details can be found in the `paper
<https://arxiv.org/abs/1405.0312>`__ .

The dataset loads raw features and applies specified transforms
to return a dict containing the image tensors and other information.

COCO keypoint indexes::

    0: 'nose',
    1: 'left_eye',
    2: 'right_eye',
    3: 'left_ear',
    4: 'right_ear',
    5: 'left_shoulder',
    6: 'right_shoulder',
    7: 'left_elbow',
    8: 'right_elbow',
    9: 'left_wrist',
    10: 'right_wrist',
    11: 'left_hip',
    12: 'right_hip',
    13: 'left_knee',
    14: 'right_knee',
    15: 'left_ankle',
    16: 'right_ankle'

Args:
    ann_file (str): Path to the annotation file.
    img_prefix (str): Path to a directory where images are held.
        Default: None.
    data_cfg (dict): config
    pipeline (list[dict | callable]): A sequence of data transforms.
    dataset_info (DatasetInfo): A class containing all dataset info.
    test_mode (bool): Store True when building test or
        validation dataset. Default: False.
"""

def __init__(self,
             ann_file,
             img_prefix,
             data_cfg,
             pipeline,
             dataset_info=None,
             test_mode=False):

    if dataset_info is None:
        warnings.warn(
            'dataset_info is missing. '
            'Check https://github.com/open-mmlab/mmpose/pull/663 '
            'for details.', DeprecationWarning)
        cfg = Config.fromfile('configs/_base_/datasets/coco.py')
        dataset_info = cfg._cfg_dict['dataset_info']

    super().__init__(
        ann_file,
        img_prefix,
        data_cfg,
        pipeline,
        dataset_info=dataset_info,
        test_mode=test_mode)

    self.use_gt_bbox = data_cfg['use_gt_bbox']
    self.bbox_file = data_cfg['bbox_file']
    self.det_bbox_thr = data_cfg.get('det_bbox_thr', 0.0)
    self.use_nms = data_cfg.get('use_nms', True)
    self.soft_nms = data_cfg['soft_nms']
    self.nms_thr = data_cfg['nms_thr']
    self.oks_thr = data_cfg['oks_thr']
    self.vis_thr = data_cfg['vis_thr']

    self.db = self._get_db()

    print(f'=> num_images: {self.num_images}')
    print(f'=> load {len(self.db)} samples')

def _get_db(self):
    """Load dataset."""
    if (not self.test_mode) or self.use_gt_bbox:
        # use ground truth bbox
        gt_db = self._load_coco_keypoint_annotations()
    else:
        # use bbox from detection
        gt_db = self._load_coco_person_detection_results()
    return gt_db

def _load_coco_keypoint_annotations(self):
    """Ground truth bbox and keypoints."""
    gt_db = []
    for img_id in self.img_ids:
        gt_db.extend(self._load_coco_keypoint_annotation_kernel(img_id))
    return gt_db

def _load_coco_keypoint_annotation_kernel(self, img_id):
    """load annotation from COCOAPI.

    Note:
        bbox:[x1, y1, w, h]

    Args:
        img_id: coco image id

    Returns:
        dict: db entry
    """
    img_ann = self.coco.loadImgs(img_id)[0]
    width = img_ann['width']
    height = img_ann['height']
    num_joints = self.ann_info['num_joints']

    ann_ids = self.coco.getAnnIds(imgIds=img_id, iscrowd=False)
    objs = self.coco.loadAnns(ann_ids)

    # sanitize bboxes
    valid_objs = []
    for obj in objs:
        if 'bbox' not in obj:
            continue
        x, y, w, h = obj['bbox']
        x1 = max(0, x)
        y1 = max(0, y)
        x2 = min(width - 1, x1 + max(0, w - 1))
        y2 = min(height - 1, y1 + max(0, h - 1))
        if ('area' not in obj or obj['area'] > 0) and x2 > x1 and y2 > y1:
            obj['clean_bbox'] = [x1, y1, x2 - x1, y2 - y1]
            valid_objs.append(obj)
    objs = valid_objs

    bbox_id = 0
    rec = []
    for obj in objs:
        if 'keypoints' not in obj:
            continue
        if max(obj['keypoints']) == 0:
            continue
        if 'num_keypoints' in obj and obj['num_keypoints'] == 0:
            continue
        joints_3d = np.zeros((num_joints, 3), dtype=np.float32)
        joints_3d_visible = np.zeros((num_joints, 3), dtype=np.float32)

        keypoints = np.array(obj['keypoints']).reshape(-1, 3)
        joints_3d[:, :2] = keypoints[:, :2]
        joints_3d_visible[:, :2] = np.minimum(1, keypoints[:, 2:3])

        center, scale = self._xywh2cs(*obj['clean_bbox'][:4])

        image_file = osp.join(self.img_prefix, self.id2name[img_id])
        rec.append({
            'image_file': image_file,
            'center': center,
            'scale': scale,
            'bbox': obj['clean_bbox'][:4],
            'rotation': 0,
            'joints_3d': joints_3d,
            'joints_3d_visible': joints_3d_visible,
            'dataset': self.dataset_name,
            'bbox_score': 1,
            'bbox_id': bbox_id
        })
        bbox_id = bbox_id + 1

    return rec

def _load_coco_person_detection_results(self):
    """Load coco person detection results."""
    num_joints = self.ann_info['num_joints']
    all_boxes = None
    with open(self.bbox_file, 'r') as f:
        all_boxes = json.load(f)

    if not all_boxes:
        raise ValueError('=> Load %s fail!' % self.bbox_file)

    print(f'=> Total boxes: {len(all_boxes)}')

    kpt_db = []
    bbox_id = 0
    for det_res in all_boxes:
        if det_res['category_id'] != 1:
            continue

        image_file = osp.join(self.img_prefix,
                              self.id2name[det_res['image_id']])
        box = det_res['bbox']
        score = det_res['score']

        if score < self.det_bbox_thr:
            continue

        center, scale = self._xywh2cs(*box[:4])
        joints_3d = np.zeros((num_joints, 3), dtype=np.float32)
        joints_3d_visible = np.ones((num_joints, 3), dtype=np.float32)
        kpt_db.append({
            'image_file': image_file,
            'center': center,
            'scale': scale,
            'rotation': 0,
            'bbox': box[:4],
            'bbox_score': score,
            'dataset': self.dataset_name,
            'joints_3d': joints_3d,
            'joints_3d_visible': joints_3d_visible,
            'bbox_id': bbox_id
        })
        bbox_id = bbox_id + 1
    print(f'=> Total boxes after filter '
          f'low score@{self.det_bbox_thr}: {bbox_id}')
    return kpt_db

@deprecated_api_warning(name_dict=dict(outputs='results'))
def evaluate(self, results, res_folder=None, metric='mAP', **kwargs):
    """Evaluate coco keypoint results. The pose prediction results will be
    saved in ``${res_folder}/result_keypoints.json``.

    Note:
        - batch_size: N
        - num_keypoints: K
        - heatmap height: H
        - heatmap width: W

    Args:
        results (list[dict]): Testing results containing the following
            items:

            - preds (np.ndarray[N,K,3]): The first two dimensions are \
                coordinates, score is the third dimension of the array.
            - boxes (np.ndarray[N,6]): [center[0], center[1], scale[0], \
                scale[1],area, score]
            - image_paths (list[str]): For example, ['data/coco/val2017\
                /000000393226.jpg']
            - heatmap (np.ndarray[N, K, H, W]): model output heatmap
            - bbox_id (list(int)).
        res_folder (str, optional): The folder to save the testing
            results. If not specified, a temp folder will be created.
            Default: None.
        metric (str | list[str]): Metric to be performed. Defaults: 'mAP'.

    Returns:
        dict: Evaluation results for evaluation metric.
    """
    metrics = metric if isinstance(metric, list) else [metric]
    allowed_metrics = ['mAP']
    for metric in metrics:
        if metric not in allowed_metrics:
            raise KeyError(f'metric {metric} is not supported')

    if res_folder is not None:
        tmp_folder = None
        res_file = osp.join(res_folder, 'result_keypoints.json')
    else:
        tmp_folder = tempfile.TemporaryDirectory()
        res_file = osp.join(tmp_folder.name, 'result_keypoints.json')

    kpts = defaultdict(list)

    for result in results:
        preds = result['preds']
        boxes = result['boxes']
        image_paths = result['image_paths']
        bbox_ids = result['bbox_ids']

        batch_size = len(image_paths)
        for i in range(batch_size):
            image_id = self.name2id[image_paths[i][len(self.img_prefix):]]
            kpts[image_id].append({
                'keypoints': preds[i],
                'center': boxes[i][0:2],
                'scale': boxes[i][2:4],
                'area': boxes[i][4],
                'score': boxes[i][5],
                'image_id': image_id,
                'bbox_id': bbox_ids[i]
            })
    kpts = self._sort_and_unique_bboxes(kpts)

    # rescoring and oks nms
    num_joints = self.ann_info['num_joints']
    vis_thr = self.vis_thr
    oks_thr = self.oks_thr
    valid_kpts = []
    for image_id in kpts.keys():
        img_kpts = kpts[image_id]
        for n_p in img_kpts:
            box_score = n_p['score']
            kpt_score = 0
            valid_num = 0
            for n_jt in range(0, num_joints):
                t_s = n_p['keypoints'][n_jt][2]
                if t_s > vis_thr:
                    kpt_score = kpt_score + t_s
                    valid_num = valid_num + 1
            if valid_num != 0:
                kpt_score = kpt_score / valid_num
            # rescoring
            n_p['score'] = kpt_score * box_score

        if self.use_nms:
            nms = soft_oks_nms if self.soft_nms else oks_nms
            keep = nms(img_kpts, oks_thr, sigmas=self.sigmas)
            valid_kpts.append([img_kpts[_keep] for _keep in keep])
        else:
            valid_kpts.append(img_kpts)

    self._write_coco_keypoint_results(valid_kpts, res_file)

    info_str = self._do_python_keypoint_eval(res_file)
    name_value = OrderedDict(info_str)

    if tmp_folder is not None:
        tmp_folder.cleanup()

    return name_value

def _write_coco_keypoint_results(self, keypoints, res_file):
    """Write results into a json file."""
    data_pack = [{
        'cat_id': self._class_to_coco_ind[cls],
        'cls_ind': cls_ind,
        'cls': cls,
        'ann_type': 'keypoints',
        'keypoints': keypoints
    } for cls_ind, cls in enumerate(self.classes)
                 if not cls == '__background__']

    results = self._coco_keypoint_results_one_category_kernel(data_pack[0])

    with open(res_file, 'w') as f:
        json.dump(results, f, sort_keys=True, indent=4)

def _coco_keypoint_results_one_category_kernel(self, data_pack):
    """Get coco keypoint results."""
    cat_id = data_pack['cat_id']
    keypoints = data_pack['keypoints']
    cat_results = []

    for img_kpts in keypoints:
        if len(img_kpts) == 0:
            continue

        _key_points = np.array(
            [img_kpt['keypoints'] for img_kpt in img_kpts])
        key_points = _key_points.reshape(-1,
                                         self.ann_info['num_joints'] * 3)

        result = [{
            'image_id': img_kpt['image_id'],
            'category_id': cat_id,
            'keypoints': key_point.tolist(),
            'score': float(img_kpt['score']),
            'center': img_kpt['center'].tolist(),
            'scale': img_kpt['scale'].tolist()
        } for img_kpt, key_point in zip(img_kpts, key_points)]

        cat_results.extend(result)

    return cat_results

def _do_python_keypoint_eval(self, res_file):
    """Keypoint evaluation using COCOAPI."""
    coco_det = self.coco.loadRes(res_file)
    coco_eval = COCOeval(self.coco, coco_det, 'keypoints', self.sigmas)
    coco_eval.params.useSegm = None
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()

    stats_names = [
        'AP', 'AP .5', 'AP .75', 'AP (M)', 'AP (L)', 'AR', 'AR .5',
        'AR .75', 'AR (M)', 'AR (L)'
    ]

    info_str = list(zip(stats_names, coco_eval.stats))

    return info_str

def _sort_and_unique_bboxes(self, kpts, key='bbox_id'):
    """sort kpts and remove the repeated ones."""
    for img_id, persons in kpts.items():
        num = len(persons)
        kpts[img_id] = sorted(kpts[img_id], key=lambda x: x[key])
        for i in range(num - 1, 0, -1):
            if kpts[img_id][i][key] == kpts[img_id][i - 1][key]:
                del kpts[img_id][i]

    return kpts
yiweike commented 2 years ago

I trained for 147 epochs yesterday, but AP(M) is still -1.

jin-s13 commented 2 years ago

It is normal that AP(M) = -1; (M) means medium size. This happens because your dataset does not contain any medium-size objects.
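
For reference, this is a COCOeval convention: when an area bucket contains no ground-truth instances, there is nothing to average over, so summarize() reports -1 instead of a score. A rough illustration of the convention (a sketch of the idea, not COCOeval's actual code):

import numpy as np

def summarize_bucket(precisions):
    # COCOeval-style convention: average only the valid entries;
    # an empty bucket yields -1 rather than 0.
    valid = precisions[precisions > -1]
    return float(np.mean(valid)) if valid.size else -1.0

print(summarize_bucket(np.array([0.62, 0.71])))  # 0.665
print(summarize_bucket(np.array([])))            # -1.0, shown here as AP (M)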

yiweike commented 2 years ago

But on other datasets, AP(M) and AR(M) are not -1. For example, HRNet-W48 on MSCOCO reaches AP(M) = 0.723. Should I correct any code or annotations to get a different value?

jin-s13 commented 2 years ago

As I have explained, the MSCOCO dataset contains medium-sized persons, but your dataset does not. Medium size means the object's area is between 32² and 96² pixels. So you do not need to worry about it; that is normal.
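
You can verify this directly by counting instances per COCOeval area range in your annotation file. A minimal sketch using xtcocotools (the COCO API fork mmpose already depends on); the annotation path is a placeholder for your own dataset:

from xtcocotools.coco import COCO

# Placeholder path -- point this at your own annotation file.
ann_file = 'data/custom/annotations/person_keypoints_val2017.json'
coco = COCO(ann_file)

# COCOeval area ranges: small < 32**2, medium in [32**2, 96**2], large > 96**2.
ranges = {'small': (0, 32**2), 'medium': (32**2, 96**2), 'large': (96**2, 1e10)}
counts = dict.fromkeys(ranges, 0)
for ann in coco.loadAnns(coco.getAnnIds()):
    # Fall back to the bbox area if 'area' is missing from the annotation.
    area = ann.get('area', ann['bbox'][2] * ann['bbox'][3])
    for name, (lo, hi) in ranges.items():
        if lo <= area < hi:
            counts[name] += 1
print(counts)  # AP (M) stays -1 whenever counts['medium'] == 0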

yiweike commented 2 years ago

Thanks for your explanation! The issue has been solved!