open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

Why does the DEKR model only work correctly with square input? #2891

Open seon-creator opened 10 months ago

seon-creator commented 10 months ago

Prerequisite

Environment

mmcv 2.0.1 mmdet 3.0.0 mmengine 0.8.4 mmpose 1.1.0 mmpretrain 1.0.2

Reproduces the problem - code sample

_base_ = ['../../../_base_/default_runtime.py']

# runtime
train_cfg = dict(max_epochs=200, val_interval=10)

# optimizer
optim_wrapper = dict(optimizer=dict(
    type='Adam',
    lr=1e-3,
))

# learning policy
param_scheduler = [
    dict(
        type='LinearLR', begin=0, end=500, start_factor=0.001,
        by_epoch=False),  # warm-up
    dict(
        type='MultiStepLR',
        begin=0,
        end=140,
        milestones=[90, 120],
        gamma=0.1,
        by_epoch=True)
]

# automatically scaling LR based on the actual training batch size
auto_scale_lr = dict(base_batch_size=80)

# hooks
default_hooks = dict(checkpoint=dict(save_best='coco/AP', rule='greater'))

# codec settings
codec = dict(
    type='SPR',
    input_size=(192, 256),
    heatmap_size=(48, 64),
    sigma=(4, 2),
    minimal_diagonal_length=32**0.5,
    generate_keypoint_heatmaps=True,
    decode_max_instances=30)

# model settings
model = dict(
    type='BottomupPoseEstimator',
    data_preprocessor=dict(
        type='PoseDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True),
    backbone=dict(
        type='HRNet',
        in_channels=3,
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(32, 64)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(32, 64, 128)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(32, 64, 128, 256),
                multiscale_output=True)),
        init_cfg=dict(
            type='Pretrained',
            checkpoint='/data/home/seondeok/Project/acupoint/mmpose/configs/body_2d_keypoint/pretrain/dekr/dekr_hrnet-w32_8xb10-140e_coco-512x512_ac7c17bf-20221228.pth'),
    ),
    neck=dict(
        type='FeatureMapProcessor',
        concat=True,
    ),
    head=dict(
        type='DEKRHead',
        in_channels=480,
        num_keypoints=5,   # edit
        heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True),
        displacement_loss=dict(
            type='SoftWeightSmoothL1Loss',
            use_target_weight=True,
            supervise_empty=False,
            beta=1 / 9,
            loss_weight=0.002,
        ),
        decoder=codec,
        # rescore_cfg=dict(
        #     in_channels=74,
        #     norm_indexes=(5, 6),
        #     init_cfg=dict(
        #         type='Pretrained',
        #         checkpoint='https://download.openmmlab.com/mmpose/'
        #         'pretrain_models/kpt_rescore_coco-33d58c5c.pth')),
    ),
    test_cfg=dict(
        multiscale_test=False,
        flip_test=True,
        nms_dist_thr=0.05,
        shift_heatmap=True,
        align_corners=False))

# enable DDP training when rescore net is used
find_unused_parameters = True

# base dataset settings
dataset_type = 'CocoArm'
data_mode = 'topdown'
data_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/PK_Train_1462/'
test_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/PK_Test_374/'
annotation_root = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/json/PK_Train_1462.json'
annotation_root_val = '/data/home/seondeok/Project/acupoint/coco_dataset/arm/json/PK_Test_374.json'

# pipelines
train_pipeline = [
    dict(type='LoadImage'),
    dict(type='BottomupRandomAffine', input_size=codec['input_size']),
    dict(type='RandomFlip', direction='horizontal'),
    dict(type='GenerateTarget', encoder=codec),
    dict(type='BottomupGetHeatmapMask'),
    dict(type='PackPoseInputs'),
]
val_pipeline = [
    dict(type='LoadImage'),
    dict(
        type='BottomupResize',
        input_size=codec['input_size'],
        size_factor=32,
        resize_mode='expand'),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'img_id', 'img_path', 'crowd_index', 'ori_shape',
                   'img_shape', 'input_size', 'input_center', 'input_scale',
                   'flip', 'flip_direction', 'flip_indices', 'raw_ann_info',
                   'skeleton_links'))
]

# data loaders
train_dataloader = dict(
    batch_size=10,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_mode=data_mode,
        ann_file=annotation_root,
        data_prefix=dict(img=data_root),
        pipeline=train_pipeline,
    ))
val_dataloader = dict(
    batch_size=1,
    num_workers=1,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        data_root=test_root,
        data_mode=data_mode,
        ann_file=annotation_root_val,
        data_prefix=dict(img=test_root),
        test_mode=True,
        pipeline=val_pipeline,
    ))
test_dataloader = val_dataloader

# evaluators
val_evaluator = dict(
    type='CocoMetric',
    ann_file=annotation_root_val,
    nms_mode='none',
    score_mode='keypoint',
)
test_evaluator = val_evaluator

Reproduces the problem - command or script

python tools/train.py configs/body_2d_keypoint/dekr/custom_coco/dekr_hrnet-w32_8xb10-140e_coco-256x192.py

Reproduces the problem - error message

train.py runs without any error, but when I use the trained weights for inference, the performance is very poor: compared to a square input, keypoint detection does not work well. With the default 256x256 input size the model works well, so I have a question. Why does the DEKR (BottomupPoseEstimator) method not work well with a rectangular input size? Unlike top-down models, why does the bottom-up model not work well with 256x192 or other rectangular resolutions?

Additional information

No response

Ben-Louis commented 10 months ago

The test-time transform BottomupResize cannot guarantee that the input image will be square. Instead, it typically rescales the image, keeping its aspect ratio, so that its shorter edge matches the corresponding side of input_size. If the width and height of input_size are different, this can cause problems, as you can see in the following code snippet. https://github.com/open-mmlab/mmpose/blob/efe09cd5268d4d6b21100334fbf2947ef36fc7db/mmpose/datasets/transforms/bottomup_transforms.py#L521-L530
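To make the mismatch concrete, below is a minimal sketch of how an aspect-preserving 'expand'-style resize can turn a rectangular input_size into a very different test-time input. The helper names, the exact rounding rule, and the resulting numbers are illustrative assumptions rather than the actual BottomupResize implementation; the point is only that training (which affine-warps every image to exactly 192x256) and testing (which resizes while keeping the aspect ratio) end up with different geometries when input_size is not square.

import math

def ceil_to_multiple(value, base):
    # round `value` up to the nearest multiple of `base`
    return int(math.ceil(value / base)) * base

def expand_resize_size(img_w, img_h, input_size, size_factor=32):
    # Assumed behaviour of an aspect-preserving 'expand' resize: scale the
    # image (keeping its aspect ratio) until it covers the configured
    # input_size, then round each side up to a multiple of size_factor.
    # This is an illustrative sketch, not the MMPose code.
    cfg_w, cfg_h = input_size
    scale = max(cfg_w / img_w, cfg_h / img_h)
    return (ceil_to_multiple(img_w * scale, size_factor),
            ceil_to_multiple(img_h * scale, size_factor))

# During training, BottomupRandomAffine warps every image to exactly the
# configured input_size, e.g. 192x256. At test time, a 1920x1080 frame
# resized this way ends up much wider than the configured width, so the
# network sees a geometry it never saw during training:
print(expand_resize_size(1920, 1080, (192, 256)))  # -> (480, 256) under these assumptions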

seon-creator commented 10 months ago

Thank you for your answer!

wusaisa commented 10 months ago

The test-time transform BottomupResize cannot guarantee that the input image will be square. Instead, it typically rescales the image, keeping its aspect ratio, so that its shorter edge matches the corresponding side of input_size. If the width and height of input_size are different, this can cause problems, as you can see in the following code snippet.

https://github.com/open-mmlab/mmpose/blob/efe09cd5268d4d6b21100334fbf2947ef36fc7db/mmpose/datasets/transforms/bottomup_transforms.py#L521-L530

So, is there a part of the code I could change so that I can get good test results even with rectangular input sizes? My custom dataset has 1920x1080 resolution and the test results for the DEKR model are poor.

Ben-Louis commented 10 months ago

So, is there a part of the code I could change so that I can get good test results even with rectangular input sizes? My custom dataset has 1920x1080 resolution and the test results for the DEKR model are poor.

You can try to set the input_size to (1080, 1080). During training, the image will be randomly resized and cropped to 1080x1080. During inference, the image will be resized to 1920x1080.
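As a sketch of that suggestion applied to the config above, the codec block could look roughly like this. The heatmap_size values assume the same 4x downsampling ratio as the original (192, 256) / (48, 64) setting; they are an assumption for illustration, not something specified in this thread.

# codec settings (sketch): square input_size matching the 1080-pixel shorter edge
codec = dict(
    type='SPR',
    input_size=(1080, 1080),   # square, so training and test-time resizing stay consistent
    heatmap_size=(270, 270),   # assumes the same 4x downsampling as (192, 256) -> (48, 64)
    sigma=(4, 2),
    minimal_diagonal_length=32**0.5,
    generate_keypoint_heatmaps=True,
    decode_max_instances=30)

Per the suggestion above, BottomupRandomAffine would then crop training images to 1080x1080, while BottomupResize keeps the 1920x1080 test frames near their native shape.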

wusaisa commented 10 months ago

So, is there a part of the code I could change so that I can get good test results even with rectangular input sizes? My custom dataset has 1920x1080 resolution and the test results for the DEKR model are poor.

You can try to set the input_size to (1080, 1080). During training, the image will be randomly resized and cropped to 1080x1080. During inference, the image will be resized to 1920x1080.

Thank you very much! I will try it.