[Bug] DEKR model trained on custom dataset results in poor performance

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] The bug has not been fixed in the latest version (https://github.com/open-mmlab/mmpose).

Environment

python -c "from mmpose.utils import collect_env; print(collect_env())"
OrderedDict([('sys.platform', 'linux'), ('Python', '3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:21:28) [GCC 12.3.0]'), ('CUDA available', True), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA TITAN X (Pascal)'), ('CUDA_HOME', '/usr/local/cuda-11.8'), ('NVCC', 'Cuda compilation tools, release 11.8, V11.8.89'), ('GCC', 'gcc (Ubuntu 8.4.0-1ubuntu1~18.04) 8.4.0'), ('PyTorch', '2.0.1'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2023.2-Product Build 20230613 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.8\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.7\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.15.2'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.2'), ('MMPose', '1.3.1+5a3be94')])

pip list | grep mm
comm                           0.2.1
diagnostic_common_diagnostics  1.9.7
mmcv                           2.1.0
mmdeploy                       1.3.1   /home/lmga-titanx/openmmlab/mmdeploy
mmdeploy-runtime               1.3.1
mmdeploy-runtime-gpu           1.3.1
mmdet                          3.2.0
mmengine                       0.10.2
mmpose                         1.3.1   /home/lmga-titanx/openmmlab/mmpose
qt-gui-py-common               0.4.2
rqt_py_common                  0.5.3

Reproduces the problem - code sample

My config file:

auto_scale_lr = dict(base_batch_size=10)
backend_args = dict(backend='local')
codec = dict(
    decode_max_instances=30,
    generate_keypoint_heatmaps=True,
    heatmap_size=(
        24,
        24,
    ),
    input_size=(
        96,
        96,
    ),
    minimal_diagonal_length=5.656854249492381,
    sigma=(
        4,
        2,
    ),
    type='SPR')
custom_hooks = [
    dict(type='SyncBuffersHook'),
]
data_mode = 'bottomup'
data_root = '/home/lmga-titanx/mmpose/data/testing_set/'
dataset_type = 'CocoDataset'
default_hooks = dict(
    badcase=dict(
        badcase_thr=5,
        enable=False,
        metric_type='loss',
        out_dir='badcase',
        type='BadCaseAnalysisHook'),
    checkpoint=dict(
        interval=10,
        rule='greater',
        save_best='coco/AP',
        type='CheckpointHook'),
    logger=dict(interval=50, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(enable=False, type='PoseVisualizationHook'))
default_scope = 'mmpose'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
find_unused_parameters = True
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(
    by_epoch=True, num_digits=6, type='LogProcessor', window_size=50)
model = dict(
    backbone=dict(
        extra=dict(
            stage1=dict(
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_branches=1,
                num_channels=(64, ),
                num_modules=1),
            stage2=dict(
                block='BASIC',
                num_blocks=(
                    4,
                    4,
                ),
                num_branches=2,
                num_channels=(
                    32,
                    64,
                ),
                num_modules=1),
            stage3=dict(
                block='BASIC',
                num_blocks=(
                    4,
                    4,
                    4,
                ),
                num_branches=3,
                num_channels=(
                    32,
                    64,
                    128,
                ),
                num_modules=4),
            stage4=dict(
                block='BASIC',
                multiscale_output=True,
                num_blocks=(
                    4,
                    4,
                    4,
                    4,
                ),
                num_branches=4,
                num_channels=(
                    32,
                    64,
                    128,
                    256,
                ),
                num_modules=3)),
        in_channels=3,
        init_cfg=dict(
            checkpoint=
            'https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w32-36af842e.pth',
            type='Pretrained'),
        type='HRNet'),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='PoseDataPreprocessor'),
    head=dict(
        decoder=dict(
            decode_max_instances=30,
            generate_keypoint_heatmaps=True,
            heatmap_size=(
                24,
                24,
            ),
            input_size=(
                96,
                96,
            ),
            minimal_diagonal_length=5.656854249492381,
            sigma=(
                4,
                2,
            ),
            type='SPR'),
        displacement_loss=dict(
            beta=0.1111111111111111,
            loss_weight=0.002,
            supervise_empty=False,
            type='SoftWeightSmoothL1Loss',
            use_target_weight=True),
        heatmap_loss=dict(type='KeypointMSELoss', use_target_weight=True),
        in_channels=480,
        num_keypoints=2,
        type='DEKRHead'),
    neck=dict(concat=True, type='FeatureMapProcessor'),
    test_cfg=dict(
        align_corners=False,
        flip_test=True,
        multiscale_test=False,
        nms_dist_thr=0.05,
        shift_heatmap=True),
    type='BottomupPoseEstimator')
optim_wrapper = dict(optimizer=dict(lr=0.001, type='Adam'))
param_scheduler = [
    dict(
        begin=0, by_epoch=False, end=500, start_factor=0.001, type='LinearLR'),
    dict(
        begin=0,
        by_epoch=True,
        end=300,
        gamma=0.1,
        milestones=[
            200,
            260,
        ],
        type='MultiStepLR'),
]
resume = False
test_cfg = dict()
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='annotations/person_keypoints_valid.json',
        data_mode='bottomup',
        data_prefix=dict(img='images/'),
        data_root='/home/lmga-titanx/mmpose/data/testing_set/',
        metainfo=dict(
            from_file=
            '/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
        ),
        pipeline=[
            dict(type='LoadImage'),
            dict(
                input_size=(
                    96,
                    96,
                ),
                resize_mode='expand',
                size_factor=32,
                type='BottomupResize'),
            dict(
                meta_keys=(
                    'id',
                    'img_id',
                    'img_path',
                    'crowd_index',
                    'ori_shape',
                    'img_shape',
                    'input_size',
                    'input_center',
                    'input_scale',
                    'flip',
                    'flip_direction',
                    'flip_indices',
                    'raw_ann_info',
                    'skeleton_links',
                ),
                type='PackPoseInputs'),
        ],
        test_mode=True,
        type='CocoDataset'),
    drop_last=False,
    num_workers=1,
    persistent_workers=True,
    sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))
test_evaluator = dict(
    ann_file=
    '/home/lmga-titanx/mmpose/data/testing_set/annotations/person_keypoints_valid.json',
    nms_mode='none',
    score_mode='keypoint',
    type='CocoMetric')
train_cfg = dict(by_epoch=True, max_epochs=300, val_interval=20)
train_dataloader = dict(
    batch_size=10,
    dataset=dict(
        ann_file='annotations/person_keypoints_train.json',
        data_mode='bottomup',
        data_prefix=dict(img='images/'),
        data_root='/home/lmga-titanx/mmpose/data/testing_set/',
        metainfo=dict(
            from_file=
            '/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
        ),
        pipeline=[
            dict(type='LoadImage'),
            dict(input_size=(
                96,
                96,
            ), type='BottomupRandomAffine'),
            dict(direction='horizontal', type='RandomFlip'),
            dict(
                encoder=dict(
                    decode_max_instances=30,
                    generate_keypoint_heatmaps=True,
                    heatmap_size=(
                        24,
                        24,
                    ),
                    input_size=(
                        96,
                        96,
                    ),
                    minimal_diagonal_length=5.656854249492381,
                    sigma=(
                        4,
                        2,
                    ),
                    type='SPR'),
                type='GenerateTarget'),
            dict(type='PackPoseInputs'),
        ],
        type='CocoDataset'),
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(type='LoadImage'),
    dict(input_size=(
        96,
        96,
    ), type='BottomupRandomAffine'),
    dict(direction='horizontal', type='RandomFlip'),
    dict(
        encoder=dict(
            decode_max_instances=30,
            generate_keypoint_heatmaps=True,
            heatmap_size=(
                24,
                24,
            ),
            input_size=(
                96,
                96,
            ),
            minimal_diagonal_length=5.656854249492381,
            sigma=(
                4,
                2,
            ),
            type='SPR'),
        type='GenerateTarget'),
    dict(type='PackPoseInputs'),
]
val_cfg = dict()
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='annotations/person_keypoints_valid.json',
        data_mode='bottomup',
        data_prefix=dict(img='images/'),
        data_root='/home/lmga-titanx/mmpose/data/testing_set/',
        metainfo=dict(
            from_file=
            '/home/lmga-titanx/openmmlab/mmpose/configs/_base_/datasets/custom_2.py'
        ),
        pipeline=[
            dict(type='LoadImage'),
            dict(
                input_size=(
                    96,
                    96,
                ),
                resize_mode='expand',
                size_factor=32,
                type='BottomupResize'),
            dict(
                meta_keys=(
                    'id',
                    'img_id',
                    'img_path',
                    'crowd_index',
                    'ori_shape',
                    'img_shape',
                    'input_size',
                    'input_center',
                    'input_scale',
                    'flip',
                    'flip_direction',
                    'flip_indices',
                    'raw_ann_info',
                    'skeleton_links',
                ),
                type='PackPoseInputs'),
        ],
        test_mode=True,
        type='CocoDataset'),
    drop_last=False,
    num_workers=1,
    persistent_workers=True,
    sampler=dict(round_up=False, shuffle=False, type='DefaultSampler'))
val_evaluator = dict(
    ann_file=
    '/home/lmga-titanx/mmpose/data/testing_set/annotations/person_keypoints_valid.json',
    nms_mode='none',
    score_mode='keypoint',
    type='CocoMetric')
val_pipeline = [
    dict(type='LoadImage'),
    dict(
        input_size=(
            96,
            96,
        ),
        resize_mode='expand',
        size_factor=32,
        type='BottomupResize'),
    dict(
        meta_keys=(
            'id',
            'img_id',
            'img_path',
            'crowd_index',
            'ori_shape',
            'img_shape',
            'input_size',
            'input_center',
            'input_scale',
            'flip',
            'flip_direction',
            'flip_indices',
            'raw_ann_info',
            'skeleton_links',
        ),
        type='PackPoseInputs'),
]
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='PoseLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
    ])
work_dir = './work_dirs/dekr_hrnet-w32_8xb10-140e_coco-512x512'
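
For reference, a few values in this config can be cross-checked with plain arithmetic. The sketch below uses only numbers copied from the config above. It assumes that the head's in_channels should equal the sum of the stage4 branch channels because the neck is configured with concat=True; the helper name heatmap_stride is made up for illustration and is not part of MMPose.

```python
import math

# Values copied from the config above.
input_size = (96, 96)
heatmap_size = (24, 24)
stage4_channels = (32, 64, 128, 256)
head_in_channels = 480
minimal_diagonal_length = 5.656854249492381


def heatmap_stride(input_size, heatmap_size):
    """Return the per-axis downsampling factor between input and heatmap."""
    return tuple(i / h for i, h in zip(input_size, heatmap_size))


stride = heatmap_stride(input_size, heatmap_size)
concat_channels = sum(stage4_channels)

print(stride)                               # (4.0, 4.0)
print(concat_channels == head_in_channels)  # True
# The configured minimal diagonal length equals sqrt(32).
print(math.isclose(minimal_diagonal_length, math.sqrt(32)))  # True
```

These checks only confirm that the numbers are mutually consistent; they say nothing about whether a 96x96 input with a 24x24 heatmap is a good choice for this dataset.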

mmpose/configs/_base_/datasets/custom_2.py:

dataset_info = dict(
    dataset_name='apple_calyx_coco',
    paper_info=dict(
        author='Lin, Tsung-Yi and Maire, Michael and '
        'Belongie, Serge and Hays, James and '
        'Perona, Pietro and Ramanan, Deva and '
        r'Doll{\'a}r, Piotr and Zitnick, C Lawrence',
        title='Microsoft coco: Common objects in context',
        container='European conference on computer vision',
        year='2014',
        homepage='http://cocodataset.org/',
    ),
    keypoint_info={
        0:
        dict(name='calyx', id=0, color=[0,0,255], swap=''),
        1:
        dict(name='stem', id=1, color=[255,0,0], swap='')
    },
    flip_pairs = [0,1],
    flip_index = [0,1],
    skeleton_info={},
    joint_weights=[1.] * 2,
    sigmas=[0.2, 0.2])
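
As far as I understand, MMPose 1.x derives the flip mapping from the swap field of each keypoint_info entry, not from top-level keys such as flip_pairs or flip_index. The sketch below is not MMPose's code; it is a simplified imitation of that derivation (derive_flip_indices is a hypothetical helper), showing what the mapping would be for the two keypoints defined above.

```python
def derive_flip_indices(keypoint_info):
    """Map each keypoint index to the index it takes after a horizontal flip."""
    name_to_id = {kpt['name']: kpt_id for kpt_id, kpt in keypoint_info.items()}
    flip_indices = []
    for kpt_id in sorted(keypoint_info):
        swap_name = keypoint_info[kpt_id].get('swap', '')
        # An empty `swap` means the keypoint keeps its own index when flipped.
        flip_indices.append(name_to_id[swap_name] if swap_name else kpt_id)
    return flip_indices


# Keypoint definitions copied from custom_2.py above.
keypoint_info = {
    0: dict(name='calyx', id=0, color=[0, 0, 255], swap=''),
    1: dict(name='stem', id=1, color=[255, 0, 0], swap=''),
}

print(derive_flip_indices(keypoint_info))  # [0, 1]
```

With both swap fields empty, each keypoint maps to itself, so RandomFlip and flip_test should not exchange calyx and stem.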

Reproduces the problem - command or script

python tools/train.py /home/lmga-titanx/openmmlab/mmpose/configs/apple/DEKR/coco/dekr_hrnet-w32_8xb10-140e_coco-512x512.py

Reproduces the problem - error message

[Image: Screenshot from 2024-01-18 15-19-01]

[Image: Screenshot from 2024-01-18 15-18-41]

Additional information

Goal: I'm trying to predict two keypoints of two different classes (one keypoint per class) on a single object instance in each image.

Problem: The AP and AR of the model decrease while the loss also decreases, which doesn't seem right. No keypoints were predicted with the latest checkpoint. The best checkpoint (epoch 300) was able to predict some keypoints, but with very low keypoint scores (keypoint_scores: array([[ 0.01671034, -0.00024851]])).

Extra information: The dataset itself shouldn't be the problem, since the same data was used to train a bottom-up Associative Embedding model with the previous 0.x version, and the results were good.
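
To back up that assumption, the annotation files can be sanity-checked independently of MMPose. The sketch below is a minimal, hypothetical checker (find_keypoint_annotation_issues is not an MMPose function) that assumes the standard COCO keypoint layout of [x, y, v] triplets, with num_keypoints counting the entries where v > 0. The sample data is made up for illustration; in practice the dict would come from json.load on the annotation file.

```python
def find_keypoint_annotation_issues(coco, num_keypoints=2):
    """Return human-readable problems found in COCO-style keypoint annotations."""
    issues = []
    image_ids = {img['id'] for img in coco.get('images', [])}
    for ann in coco.get('annotations', []):
        ann_id = ann.get('id')
        kpts = ann.get('keypoints', [])
        expected_len = 3 * num_keypoints
        if len(kpts) != expected_len:
            issues.append(f'annotation {ann_id}: expected {expected_len} '
                          f'keypoint values, got {len(kpts)}')
            continue
        # Every third value is the visibility flag v (0 means not labeled).
        labeled = sum(1 for v in kpts[2::3] if v > 0)
        if ann.get('num_keypoints', labeled) != labeled:
            issues.append(f'annotation {ann_id}: num_keypoints is '
                          f"{ann['num_keypoints']} but {labeled} are labeled")
        if ann.get('image_id') not in image_ids:
            issues.append(f'annotation {ann_id}: unknown image_id')
    return issues


# Hypothetical sample, not taken from the real dataset.
sample = {
    'images': [{'id': 1, 'file_name': 'apple_0001.jpg'}],
    'annotations': [
        {'id': 10, 'image_id': 1, 'num_keypoints': 2,
         'keypoints': [30, 20, 2, 32, 70, 2]},
        {'id': 11, 'image_id': 1, 'num_keypoints': 2,
         'keypoints': [30, 20, 2]},
        {'id': 12, 'image_id': 1, 'num_keypoints': 2,
         'keypoints': [30, 20, 2, 0, 0, 0]},
    ],
}

issues = find_keypoint_annotation_issues(sample)
for issue in issues:
    print(issue)
```

An empty result on the real train and valid files would support the claim that the data is fine, though it would not rule out differences in how 0.x and 1.x read the same annotations.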
