open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

[Bug] DEKR model trained on custom dataset results in poor performance #2931

Open EugeneKok97 opened 8 months ago

EugeneKok97 commented 8 months ago

Prerequisite

Environment

python -c "from mmpose.utils import collect_env; print(collect_env())"
OrderedDict([('sys.platform', 'linux'),
 ('Python', '3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:21:28) [GCC 12.3.0]'),
 ('CUDA available', True),
 ('numpy_random_seed', 2147483648),
 ('GPU 0', 'NVIDIA TITAN X (Pascal)'),
 ('CUDA_HOME', '/usr/local/cuda-11.8'),
 ('NVCC', 'Cuda compilation tools, release 11.8, V11.8.89'),
 ('GCC', 'gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0'),
 ('PyTorch', '2.0.1'),
 ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.8\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.7\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'),
 ('TorchVision', '0.15.2'),
 ('OpenCV', '4.9.0'),
 ('MMEngine', '0.10.2'),
 ('MMPose', '1.3.1+5a3be94')])

pip list | grep mm
comm                          0.2.1
diagnostic_common_diagnostics 1.9.7
mmcv                          2.1.0
mmdeploy                      1.3.1   /home/lmga-titanx/openmmlab/mmdeploy
mmdeploy-runtime              1.3.1
mmdeploy-runtime-gpu          1.3.1
mmdet                         3.2.0
mmengine                      0.10.2
mmpose                        1.3.1   /home/lmga-titanx/openmmlab/mmpose
qt-gui-py-common              0.4.2
rqt_py_common                 0.5.3

Reproduces the problem - code sample

My config file:

auto_scale_lr = dict(base_batch_size=10)
backend_args = dict(backend='local')
codec = dict(
    decode_max_instances=30,
    generate_keypoint_heatmaps=True,
    heatmap_size=(
        24,
        24,
    ),
    input_size=(
        96,
        96,
    ),
    minimal_diagonal_length=5.656854249492381,
    sigma=(
        4,
        2,
    ),
    type='SPR')
custom_hooks = [
    dict(type='SyncBuffersHook'),
]
data_mode = 'bottomup'
data_root = '/home/lmga-titanx/mmpose/data/testing_set/'
dataset_type = 'CocoDataset'
default_hooks = dict(
    badcase=dict(
        badcase_thr=5,
        enable=False,
        metric_type='loss',
        out_dir='badcase',
        type='BadCaseAnalysisHook'),
    checkpoint=dict(
        interval=10,
        rule='greater',
        save_best='coco/AP',
        type='CheckpointHook'),
    logger=dict(interval=50, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(enable=False, type='PoseVisualizationHook'))
default_scope = 'mmpose'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
find_unused_parameters = True
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(
    by_epoch=True, num_digits=6, type='LogProcessor', window_size=50)
model = dict(
    backbone=dict(
        extra=dict(
            stage1=dict(
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_branches=1,
                num_channels=(64, ),
                num_modules=1),
            stage2=dict(
                block='BASIC',
                num_blocks=(
                    4,
                    4,
                ),
                num_branches=2,
                num_channels=(
                    32,
                    64,
                ),
                num_modules=1),
            stage3=dict(
                block='BASIC',
                num_blocks=(
                    4,
                    4,
                    4,
                ),
                num_branches=3,
                num_channels=(
                    32,
                    64,
                    128,
                ),
                num_modules=4),
            stage4=dict(
                block='BASIC',
                multiscale_output=True,
                num_blocks=(
                    4,
                    4,
                    4,
                    4,
                ),
                num_branches=4,
                num_channels=(
                    32,
                    64,
                    128,
                    256,
                ),
                num_modules=3)),
        in_channels=3,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w32-36af842e.pth',
            type='Pretrained'),
        type='HRNet'),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='PoseDataPreprocessor'),
    head=dict(
        decoder=dict(
            decode_max_instances=30,
            generate_keypoint_heatmaps=True,
            heatmap_size=(
                24,
                24,
            ),
            input_size=(
                96,
                96,
            ),
            minimal_diagonal_length=5.656854249492381,
            sigma=(
                4,
                2,
            ),
            type='SPR'),
        displacement_loss=dict(
            beta=0.1111111111111111,
            loss_weight=0.002,
            supervise_empty=False,
            type='SoftWeightSmoothL1Loss',
            use_target_weight=True),
        heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True),
        in_channels=480,
        num_keypoints=2,
        type='DEKRHead'),
    neck=dict(concat=True, type='FeatureMapProcessor'),
    test_cfg=dict(
        align_corners=False,
        flip_test=True,
        multiscale_test=False,
        nms_dist_thr=0.05,
        shift_heatmap=True),
    type='BottomupPoseEstimator')
optim_wrapper = dict(optimizer=dict(lr=0.001, type='Adam'))
param_scheduler = [
    dict(
        begin=0, by_epoch=False, end=500, start_factor=0.001, type='LinearLR'),
    dict(
        begin=0,
        by_epoch=True,
        end=300,
        gamma=0.1,
        milestones=[
            200,
            260,
        ],
        type='MultiStepLR'),
]
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='annotations/person_keypoints_valid.json',
        data_mode='bottomup',
        data_prefix=dict(img='images/'),
        data_root='/home/lmga-titanx/mmpose/data/testing_set/',
        metainfo=dict(
            from_file=
            '/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
        ),
        pipeline=[
            dict(type='LoadImage'),
            dict(
                input_size=(
                    96,
                    96,
                ),
                resize_mode='expand',
                size_factor=32,
                type='BottomupResize'),
            dict(
                meta_keys=(
                    'id',
                    'img_id',
                    'img_path',
                    'crowd_index',
                    'ori_shape',
                    'img_shape',
                    'input_size',
                    'input_center',
                    'input_scale',
                    'flip',
                    'flip_direction',
                    'flip_indices',
                    'raw_ann_info',
                    'skeleton_links',
                ),
                type='PackPoseInputs'),
        ],
        test_mode=True,
        type='CocoDataset'),
    drop_last=False,
    num_workers=1,
    persistent_workers=True,
    sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    ann_file=
    '/home/lmga-titanx/mmpose/data/testing_set/annotations/person_keypoints_valid.json',
    nms_mode='none',
    score_mode='keypoint',
    type='CocoMetric')
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=20)
train_dataloader = dict(
    batch_size=10,
    dataset=dict(
        ann_file='annotations/person_keypoints_train.json',
        data_mode='bottomup',
        data_prefix=dict(img='images/'),
        data_root='/home/lmga-titanx/mmpose/data/testing_set/',
        metainfo=dict(
            from_file=
            '/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
        ),
        pipeline=[
            dict(type='LoadImage'),
            dict(input_size=(
                96,
                96,
            ), type='BottomupRandomAffine'),
            dict(direction='horizontal', type='RandomFlip'),
            dict(
                encoder=dict(
                    decode_max_instances=30,
                    generate_keypoint_heatmaps=True,
                    heatmap_size=(
                        24,
                        24,
                    ),
                    input_size=(
                        96,
                        96,
                    ),
                    minimal_diagonal_length=5.656854249492381,
                    sigma=(
                        4,
                        2,
                    ),
                    type='SPR'),
                type='GenerateTarget'),
            dict(type='PackPoseInputs'),
        ],
        type='CocoDataset'),
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(type='LoadImage'),
    dict(input_size=(
        96,
        96,
    ), type='BottomupRandomAffine'),
    dict(direction='horizontal', type='RandomFlip'),
    dict(
        encoder=dict(
            decode_max_instances=30,
            generate_keypoint_heatmaps=True,
            heatmap_size=(
                24,
                24,
            ),
            input_size=(
                96,
                96,
            ),
            minimal_diagonal_length=5.656854249492381,
            sigma=(
                4,
                2,
            ),
            type='SPR'),
        type='GenerateTarget'),
    dict(type='PackPoseInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='annotations/person_keypoints_valid.json',
        data_mode='bottomup',
        data_prefix=dict(img='images/'),
        data_root='/home/lmga-titanx/mmpose/data/testing_set/',
        metainfo=dict(
            from_file=
            '/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
        ),
        pipeline=[
            dict(type='LoadImage'),
            dict(
                input_size=(
                    96,
                    96,
                ),
                resize_mode='expand',
                size_factor=32,
                type='BottomupResize'),
            dict(
                meta_keys=(
                    'id',
                    'img_id',
                    'img_path',
                    'crowd_index',
                    'ori_shape',
                    'img_shape',
                    'input_size',
                    'input_center',
                    'input_scale',
                    'flip',
                    'flip_direction',
                    'flip_indices',
                    'raw_ann_info',
                    'skeleton_links',
                ),
                type='PackPoseInputs'),
        ],
        test_mode=True,
        type='CocoDataset'),
    drop_last=False,
    num_workers=1,
    persistent_workers=True,
    sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    ann_file=
    '/home/lmga-titanx/mmpose/data/testing_set/annotations/person_keypoints_valid.json',
    nms_mode='none',
    score_mode='keypoint',
    type='CocoMetric')
val_pipeline = [
    dict(type='LoadImage'),
    dict(
        input_size=(
            96,
            96,
        ),
        resize_mode='expand',
        size_factor=32,
        type='BottomupResize'),
    dict(
        meta_keys=(
            'id',
            'img_id',
            'img_path',
            'crowd_index',
            'ori_shape',
            'img_shape',
            'input_size',
            'input_center',
            'input_scale',
            'flip',
            'flip_direction',
            'flip_indices',
            'raw_ann_info',
            'skeleton_links',
        ),
        type='PackPoseInputs'),
]
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='PoseLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ])
work_dir = './work_dirs/dekr_hrnet-w32_8xb10-140e_coco-512x512'
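The sizes in the config above can be cross-checked with a little arithmetic. This is only a minimal sketch under two assumptions that the thread does not confirm: that the codec stride is simply `input_size / heatmap_size`, and that the `FeatureMapProcessor` neck with `concat=True` stacks all four HRNet branch outputs channel-wise, so the head's `in_channels` should equal the sum of the stage-4 channel counts.

```python
# Values copied from the config above.
input_size = (96, 96)
heatmap_size = (24, 24)
stage4_channels = (32, 64, 128, 256)
head_in_channels = 480

# Downsampling factor between the network input and the heatmap.
stride = tuple(i // h for i, h in zip(input_size, heatmap_size))

# Assumption: concat=True in the neck stacks all four branches channel-wise.
expected_in_channels = sum(stage4_channels)

print('stride:', stride)
print('in_channels consistent:', expected_in_channels == head_in_channels)
```

If both checks pass, the shapes are at least self-consistent, which would point away from a plain dimension mismatch and toward the data or the target encoding.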

mmpose/configs/_base_/datasets/custom_2.py:

dataset_info = dict(
    dataset_name='apple_calyx_coco',
    paper_info=dict(
        author='Lin, Tsung-Yi and Maire, Michael and '
        'Belongie, Serge and Hays, James and '
        'Perona, Pietro and Ramanan, Deva and '
        r'Doll{\'a}r, Piotr and Zitnick, C Lawrence',
        title='Microsoft coco: Common objects in context',
        container='European conference on computer vision',
        year='2014',
        homepage='http://cocodataset.org/',
    ),
    keypoint_info={
        0:
        dict(name='calyx', id=0, color=[0,0,255], swap=''),
        1:
        dict(name='stem', id=1, color=[255,0,0], swap='')
    },
    flip_pairs = [0,1],
    flip_index = [0,1],
    skeleton_info={},
    joint_weights=[1.] * 2,
    sigmas=[0.2, 0.2])
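One thing that may be worth checking in this metainfo is how the flip mapping is derived. As far as I understand MMPose 1.x, the flip indices come from the `swap` field of each `keypoint_info` entry, and top-level keys such as `flip_pairs` and `flip_index` are probably ignored; treat that as an assumption to verify against the installed version. The helper below is hypothetical (it is not the MMPose function) and only illustrates the idea.

```python
def flip_indices_from_swap(keypoint_info):
    """Hypothetical helper: map each keypoint id to the id it swaps with."""
    name_to_id = {info['name']: kpt_id for kpt_id, info in keypoint_info.items()}
    indices = []
    for kpt_id in sorted(keypoint_info):
        swap = keypoint_info[kpt_id].get('swap', '')
        # An empty `swap` means the keypoint maps to itself when flipped.
        indices.append(name_to_id[swap] if swap else kpt_id)
    return indices


keypoint_info = {
    0: dict(name='calyx', id=0, color=[0, 0, 255], swap=''),
    1: dict(name='stem', id=1, color=[255, 0, 0], swap=''),
}
flip_indices = flip_indices_from_swap(keypoint_info)
print(flip_indices)
```

With both `swap` fields empty, the mapping is the identity, so a horizontal flip keeps the calyx and stem channels in place. That looks like the intended behaviour for two keypoints that are not a left/right pair.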

Reproduces the problem - command or script

python tools/train.py /home/lmga-titanx/openmmlab/mmpose/configs/apple/DEKR/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py

Reproduces the problem - error message

[Image: Screenshot from 2024-01-18 15-19-01]

[Image: Screenshot from 2024-01-18 15-18-41]

Additional information

Goal: I'm trying to predict two keypoints of two classes (one keypoint per class) on a single object instance per image.

Problem: The model's AP and AR decrease while the loss also decreases, which doesn't seem right. The latest checkpoint predicts no keypoints. The best checkpoint (epoch 300) predicts some keypoints, but with very low scores (keypoint_scores: array([[ 0.01671034, -0.00024851]])).

Extra information: The dataset shouldn't be the problem, as the same data was used to train a bottom-up Associative Embedding model with the previous 0.x version, and the results were good.

Ben-Louis commented 8 months ago

It might be helpful to use the dataset browser to visualize the data and check whether the annotations and labels are reasonable. A heatmap size of 24 might also be too small for sigmas of 4 and 2.
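To get a feel for the sigma concern in the comment above, here is a rough sketch that measures how much of a heatmap a single Gaussian blob covers. It uses a plain unnormalized Gaussian in NumPy rather than the SPR codec itself, and `gaussian_coverage` and the 0.1 threshold are illustrative choices of mine. The 128x128 comparison assumes the stock 512x512 DEKR config uses the same input-to-heatmap ratio of 4.

```python
import numpy as np


def gaussian_coverage(heatmap_size, sigma, threshold=0.1):
    """Fraction of pixels above `threshold` for one Gaussian at the center."""
    ys, xs = np.mgrid[0:heatmap_size, 0:heatmap_size]
    center = (heatmap_size - 1) / 2.0
    dist_sq = (xs - center) ** 2 + (ys - center) ** 2
    blob = np.exp(-dist_sq / (2.0 * sigma ** 2))
    return float((blob > threshold).mean())


# Compare the 24x24 heatmap from this issue with an assumed 128x128 heatmap.
for size in (24, 128):
    for sigma in (4, 2):
        coverage = gaussian_coverage(size, sigma)
        print(f'heatmap={size}x{size} sigma={sigma} coverage={coverage:.3f}')
```

A blob that fills a large share of the map gives the network a much blurrier target than the same sigma on a larger heatmap, so scaling sigma down along with the heatmap size seems like a reasonable first experiment.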