open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

Top-down has lower precision than bottom-up #736

Closed greg-is-kub closed 3 years ago

greg-is-kub commented 3 years ago

Hi!

I'm opening an issue because every top-down method gets lower precision than bottom-up on my own dataset. While top-down methods should usually give better results, I get better results with the bottom-up method.

Here are the results of various tests I ran with various datasets: https://docs.google.com/spreadsheets/d/1VwA9OIKHJP8EzJRWCUb1TzbbaCG-GnJHqsd37OPfjnQ/edit#gid=0

Every model listed uses the original config file, where I only changed the data and data_cfg dictionaries to match my data, annotation and bounding-box file paths.

I used the COCO Annotator tool to create my annotations. I take the resulting JSON file as my annotation file, then make a copy of annotation_json["annotation"] to use as the box detection result (plus a dummy 1.0 box confidence score that I add because the code needs it).

{"id": 387, "image_id": 211, "category_id": 1, "dataset_id": 15, "segmentation": [[20.9, 256, 20.9, -0.6, 171.8, -0.6, 171.8, 256]], "area": 38656, "bbox": [21, 0, 151, 256], "iscrowd": false, "isbbox": true, "creator": "greg_is_greg", "width": 192, "height": 256, "color": "#4085ec", "keypoints": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 48, 2, 89, 34, 2, 152, 130, 2, 72, 123, 2, 153, 200, 2, 51, 186, 2, 122, 192, 2, 89, 188, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "metadata": {}, "milliseconds": 41217, "events": [{"_cls": "SessionEvent", "created_at": {"$date": 1623937046948}, "user": "greg_is_greg", "milliseconds": 22771, "tools_used": ["BBox", "Keypoints"]}, {"_cls": "SessionEvent", "created_at": {"$date": 1623938782823}, "user": "greg_is_greg", "milliseconds": 9223, "tools_used": ["BBox"]}], "num_keypoints": 8, "score": 1.0}

Here is what typical data from the medium office dataset looks like (with the annotations visible): [image]

Here is an example from the medium hospital dataset: [image]

Given the heavy occlusion in the hospital dataset images, I somewhat expected my results to be low, but the fact that they are lower with the top-down detector than with bottom-up on both the hospital AND the office dataset makes me think I went wrong somewhere.

Do you have any idea where I might have made a mistake?

Thanks in advance for your help.

ly015 commented 3 years ago

Hi, I think maybe you could visualize some of the top-down results, especially those with large errors. It may help to locate the problem.
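
To make that concrete, a rough sketch with the high-level inference API (paths are illustrative, and argument names may differ slightly between mmpose versions):

from mmpose.apis import (init_pose_model, inference_top_down_pose_model,
                         vis_pose_result)

# Point these at your own config / checkpoint / image.
config_file = 'configs/my_config/medium_hospital_dataset/res152_coco_256x192_dark.py'
checkpoint_file = 'https://download.openmmlab.com/mmpose/top_down/resnet/res152_coco_256x192_dark-ab4840d5_20200812.pth'
pose_model = init_pose_model(config_file, checkpoint_file, device='cuda:0')

image_file = 'data/custom/medium_hospital_dataset/256x192/data/some_image.jpg'
# Ground-truth box in xywh format, one dict per person.
person_results = [{'bbox': [21, 0, 151, 256]}]

pose_results, _ = inference_top_down_pose_model(
    pose_model, image_file, person_results, format='xywh',
    dataset='TopDownCocoDataset')

# Write an image with the predicted keypoints drawn on top.
vis_pose_result(pose_model, image_file, pose_results,
                dataset='TopDownCocoDataset',
                out_file='vis/some_image_topdown.jpg')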

jin-s13 commented 3 years ago

What is the size of your dataset (number of images)?

jin-s13 commented 3 years ago

If your dataset is relatively small (only a few hundred), some modifications are needed.

  1. the warmup iteration should be smaller. https://github.com/open-mmlab/mmpose/blob/202983d24665a909ae1c45f4025d66794b9e32fd/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192.py#L18

  2. increase the number of total epochs (and accordingly increase the lr step). https://github.com/open-mmlab/mmpose/blob/master/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192.py#L20-L21

  3. It may help to use a COCO-pretrained model to initialize the model. Replace None with the model link (a sketch of all three changes follows this list). https://github.com/open-mmlab/mmpose/blob/202983d24665a909ae1c45f4025d66794b9e32fd/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192.py#L2
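
Concretely, such overrides could look roughly like this in the config (the values and the checkpoint URL are placeholders, not tuned recommendations):

# Illustrative overrides for fine-tuning on a small custom dataset.
load_from = 'https://download.openmmlab.com/mmpose/...'  # 3. COCO-pretrained checkpoint matching the config

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=100,   # 1. shorter warmup for a small dataset
    warmup_ratio=0.001,
    step=[340, 400])    # 2. lr steps pushed back for the longer schedule
total_epochs = 420      # 2. more total epochs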

greg-is-kub commented 3 years ago

@jin-s13 I used pretrained models (downloaded from the mmpose readthedocs) without further training, on a VERY small test database of 20 pictures at most. I specify the checkpoint file path on the command line; does that make a difference?

./tools/dist_test.sh configs/my_config/medium_office_dataset/higher_hrnet48_coco_512x512.py checkpoints/a_tester/higher_hrnet48_coco_512x512_ae-60fedcbc_20200712.pth 1 --out benchmark_result/medium_office_dataset/higher_hrnet_w48_512x512.json

EDIT: I did it your way with the command below and obtained the same results.

python tools/test.py configs/my_config/medium_hospital_dataset/res152_coco_256x192_dark.py https://download.openmmlab.com/mmpose/top_down/resnet/res152_coco_256x192_dark-ab4840d5_20200812.pth --out benchmark_result/medium_hospital_dataset/resnet152_dark_256x192.json

Here is the config file I used:

log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/resnet/res152_coco_256x192_dark-ab4840d5_20200812.pth'
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP', key_indicator='AP')

optimizer = dict(
    type='Adam',
    lr=5e-4,
)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    ],
    inference_channel=[
        0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    ])

# model settings
model = dict(
    type='TopDown',
    pretrained='torchvision://resnet152',
    backbone=dict(type='ResNet', depth=152),
    keypoint_head=dict(
        type='TopDownSimpleHead',
        in_channels=2048,
        out_channels=channel_cfg['num_output_channels'],
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True,
        post_process='unbiased',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[192, 256],
    heatmap_size=[48, 64],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    use_gt_bbox=False,
    det_bbox_thr=0.0,
    bbox_file='data/custom/medium_hospital_dataset/256x192/detection_result/'
    'medium_hospital_dataset_256x192.json',
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(
        type='TopDownHalfBodyTransform',
        num_joints_half_body=8,
        prob_half_body=0.3),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2, unbiased_encoding=True),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]

val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=[
            'image_file', 'center', 'scale', 'rotation', 'bbox_score',
            'flip_pairs'
        ]),
]

test_pipeline = val_pipeline

data_root = 'data/custom/medium_hospital_dataset/256x192'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=4,
    train=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotation/medium_hospital_dataset_256x192.json',
        img_prefix=f'{data_root}/data/',
        data_cfg=data_cfg,
        pipeline=train_pipeline),
    val=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotation/medium_hospital_dataset_256x192.json',
        img_prefix=f'{data_root}/data/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
    test=dict(
        type='TopDownCocoDataset',
        ann_file=f'{data_root}/annotation/medium_hospital_dataset_256x192.json',
        img_prefix=f'{data_root}/data/',
        data_cfg=data_cfg,
        pipeline=val_pipeline),
)

EDIT: I applied a rotation of 1 degree, then 10 degrees, to my test data without changing the annotations, and the tests showed relatively better results (AP of 0.38 on the hospital dataset with DarkPose res152 256x192). It seems the problem comes from my dataset.
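
(The rotation check was just a quick sanity test, along the lines of this sketch; the paths and angle are illustrative.)

from pathlib import Path
from PIL import Image

# Rotate every test image by a fixed angle while leaving the annotations
# untouched, purely to see how sensitive the evaluation is to the input data.
src = Path('data/custom/medium_hospital_dataset/256x192/data')
dst = Path('data/custom/medium_hospital_dataset/256x192/data_rot10')
dst.mkdir(parents=True, exist_ok=True)

for img_path in src.glob('*.jpg'):
    Image.open(img_path).rotate(10).save(dst / img_path.name)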

greg-is-kub commented 3 years ago

@ly015 Most of the errors come from the fact that there are many artifacts in the pictures: folded blankets or intubation devices are mistaken for joints or limbs. I have no idea how I could remove them.

The whole point of the project is to make it work with these constraints.

greg-is-kub commented 3 years ago

The error came from the fact that I was resizing the pictures myself, while your pipeline already does the necessary preprocessing.

With the original images the errors were corrected, and I now get better accuracy.