open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0
5.65k stars 1.22k forks

Bottom Up inference detecting extra keypoints #1054

Closed · Chttan closed this issue 2 years ago

Chttan commented 2 years ago

Greetings,

First, thank you for your efforts on this impressive project.

I am trying to train a bottom-up model to detect a single animal using a custom dataset. I have around 600 annotated images with 3 keypoints and bounding boxes. The images typically contain 2 animals, one with distinct markings and one without.

I am training the model for 400 epochs and running inference with ./demo/bottom_up_video_demo.py

Looking at the output video, the correct animal is being tracked, with the 3 keypoints generally near the desired locations. However, in many frames more than 3 keypoints are detected and drawn, sometimes up to 6 in total. The additional keypoints are usually adjacent to the existing keypoints. I am having trouble understanding what might cause this and what changes I should make during training to resolve it. Could you offer some help, please?

Please find my environment, collected using ./utils/collect_env.py:

Python: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla K80
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu111
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.11.1+cu111
OpenCV: 4.5.4
MMCV: 1.3.17
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.1
MMPose: 0.20.0+c0d5e4c

I train using ./tools/train.py. I am unable to provide complete training logs, because for some reason the output does not show epoch progression:

2021-12-03 04:33:10,193 - mmpose - INFO - Config:
dataset_info = dict(
    dataset_name='animal_A',
    paper_info=dict(
        author='',
        title='',
        container='',
        year='',
        homepage=''),
    keypoint_info=dict({
        0:
        dict(name='head', id=0, color=[51, 153, 255], type='upper', swap=''),
        1:
        dict(
            name='shoulder', id=1, color=[51, 153, 255], type='upper',
            swap=''),
        2:
        dict(name='dock', id=2, color=[51, 153, 255], type='upper', swap='')
    }),
    skeleton_info=dict({
        0:
        dict(link=('head', 'shoulder'), id=0, color=[0, 255, 0]),
        1:
        dict(link=('shoulder', 'dock'), id=1, color=[0, 255, 0]),
        2:
        dict(link=('dock', 'head'), id=2, color=[0, 255, 0])
    }),
    joint_weights=[1.0, 1.0, 1.0],
    sigmas=[0.025, 0.025, 0.025])
log_level = 'INFO'
load_from = None
resume_from = './work_dirs/animal_hrnet_w32_bottomUp_model5/epoch_100.pth'
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=50)
evaluation = dict(interval=50, metric='mAP', key_indicator='AP')
optimizer = dict(type='Adam', lr=0.0025)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=250,
    warmup_ratio=0.001,
    step=[200, 260])
total_epochs = 400
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
channel_cfg = dict(
    num_output_channels=3,
    dataset_joints=3,
    dataset_channel=[[0, 1, 2]],
    inference_channel=[0, 1, 2])
data_cfg = dict(
    image_size=256,
    base_size=64,
    base_sigma=2,
    heatmap_size=[64, 64],
    num_output_channels=3,
    num_joints=3,
    dataset_channel=[[0, 1, 2]],
    inference_channel=[0, 1, 2],
    num_scales=2,
    scale_aware_sigma=False)
model = dict(
    type='AssociativeEmbedding',
    pretrained=
    'https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w32-36af842e.pth',
    backbone=dict(
        type='HRNet',
        in_channels=3,
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(32, 64)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(32, 64, 128)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(32, 64, 128, 256)))),
    keypoint_head=dict(
        type='AEHigherResolutionHead',
        in_channels=32,
        num_joints=3,
        tag_per_joint=True,
        extra=dict(final_conv_kernel=1),
        num_deconv_layers=1,
        num_deconv_filters=[32],
        num_deconv_kernels=[4],
        num_basic_blocks=4,
        cat_output=[True],
        with_ae_loss=[True, False],
        loss_keypoint=dict(
            type='MultiLossFactory',
            num_joints=3,
            num_stages=2,
            ae_loss_type='exp',
            with_ae_loss=[True, False],
            push_loss_factor=[0.01, 0.01],
            pull_loss_factor=[0.01, 0.01],
            with_heatmaps_loss=[True, False],
            heatmaps_loss_factor=[1.0, 1.0],
            supervise_empty=False)),
    train_cfg=dict(num_joints=3, img_size=256),
    test_cfg=dict(
        num_joints=3,
        max_num_people=2,
        scale_factor=[1],
        with_heatmaps=[True, True],
        with_ae=[True, False],
        project2image=True,
        align_corners=False,
        nms_kernel=5,
        nms_padding=2,
        tag_per_joint=True,
        detection_threshold=0.1,
        tag_threshold=1,
        use_detection_val=True,
        ignore_too_much=False,
        adjust=True,
        refine=True,
        flip_test=True))
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='BottomUpRandomAffine',
        rot_factor=30,
        scale_factor=[0.75, 1.5],
        scale_type='short',
        trans_factor=40),
    dict(type='BottomUpRandomFlip', flip_prob=0.5),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='BottomUpGenerateTarget', sigma=2, max_num_people=2),
    dict(
        type='Collect',
        keys=['img', 'joints', 'targets', 'masks'],
        meta_keys=[])
]
val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
    dict(
        type='BottomUpResizeAlign',
        transforms=[
            dict(type='ToTensor'),
            dict(
                type='NormalizeTensor',
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225])
        ]),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=[
            'image_file', 'aug_data', 'test_scale_factor', 'base_size',
            'center', 'scale', 'flip_index'
        ])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
    dict(
        type='BottomUpResizeAlign',
        transforms=[
            dict(type='ToTensor'),
            dict(
                type='NormalizeTensor',
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225])
        ]),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=[
            'image_file', 'aug_data', 'test_scale_factor', 'base_size',
            'center', 'scale', 'flip_index'
        ])
]
data_root = '../animal_A/data'
dataset_type = 'animal_ADatasetBottomUp'
data = dict(
    workers_per_gpu=2,
    train_dataloader=dict(samples_per_gpu=16),
    val_dataloader=dict(samples_per_gpu=8),
    test_dataloader=dict(samples_per_gpu=8),
    train=dict(
        type='animal_ADatasetBottomUp',
        ann_file='../animal_A/data/train/animal_A_train.json',
        img_prefix='../animal_A/data/train/',
        data_cfg=dict(
            image_size=256,
            base_size=64,
            base_sigma=2,
            heatmap_size=[64, 64],
            num_output_channels=3,
            num_joints=3,
            dataset_channel=[[0, 1, 2]],
            inference_channel=[0, 1, 2],
            num_scales=2,
            scale_aware_sigma=False),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='BottomUpRandomAffine',
                rot_factor=30,
                scale_factor=[0.75, 1.5],
                scale_type='short',
                trans_factor=40),
            dict(type='BottomUpRandomFlip', flip_prob=0.5),
            dict(type='ToTensor'),
            dict(
                type='NormalizeTensor',
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
            dict(type='BottomUpGenerateTarget', sigma=2, max_num_people=2),
            dict(
                type='Collect',
                keys=['img', 'joints', 'targets', 'masks'],
                meta_keys=[])
        ],
        dataset_info=dict(
            dataset_name='animal_A',
            paper_info=dict(
                author='',
                title='',
                container='',
                year='',
                homepage=''),
            keypoint_info=dict({
                0:
                dict(
                    name='head',
                    id=0,
                    color=[51, 153, 255],
                    type='upper',
                    swap=''),
                1:
                dict(
                    name='shoulder',
                    id=1,
                    color=[51, 153, 255],
                    type='upper',
                    swap=''),
                2:
                dict(
                    name='dock',
                    id=2,
                    color=[51, 153, 255],
                    type='upper',
                    swap='')
            }),
            skeleton_info=dict({
                0:
                dict(link=('head', 'shoulder'), id=0, color=[0, 255, 0]),
                1:
                dict(link=('shoulder', 'dock'), id=1, color=[0, 255, 0]),
                2:
                dict(link=('dock', 'head'), id=2, color=[0, 255, 0])
            }),
            joint_weights=[1.0, 1.0, 1.0],
            sigmas=[0.025, 0.025, 0.025])),
    val=dict(
        type='animal_ADatasetBottomUp',
        ann_file='../animal_A/data/test/animal_A_test.json',
        img_prefix='../animal_A/data/test/',
        data_cfg=dict(
            image_size=256,
            base_size=64,
            base_sigma=2,
            heatmap_size=[64, 64],
            num_output_channels=3,
            num_joints=3,
            dataset_channel=[[0, 1, 2]],
            inference_channel=[0, 1, 2],
            num_scales=2,
            scale_aware_sigma=False),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
            dict(
                type='BottomUpResizeAlign',
                transforms=[
                    dict(type='ToTensor'),
                    dict(
                        type='NormalizeTensor',
                        mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
                ]),
            dict(
                type='Collect',
                keys=['img'],
                meta_keys=[
                    'image_file', 'aug_data', 'test_scale_factor', 'base_size',
                    'center', 'scale', 'flip_index'
                ])
        ],
        dataset_info=dict(
            dataset_name='animal_A',
            paper_info=dict(
                author='',
                title='',
                container='',
                year='',
                homepage=''),
            keypoint_info=dict({
                0:
                dict(
                    name='head',
                    id=0,
                    color=[51, 153, 255],
                    type='upper',
                    swap=''),
                1:
                dict(
                    name='shoulder',
                    id=1,
                    color=[51, 153, 255],
                    type='upper',
                    swap=''),
                2:
                dict(
                    name='dock',
                    id=2,
                    color=[51, 153, 255],
                    type='upper',
                    swap='')
            }),
            skeleton_info=dict({
                0:
                dict(link=('head', 'shoulder'), id=0, color=[0, 255, 0]),
                1:
                dict(link=('shoulder', 'dock'), id=1, color=[0, 255, 0]),
                2:
                dict(link=('dock', 'head'), id=2, color=[0, 255, 0])
            }),
            joint_weights=[1.0, 1.0, 1.0],
            sigmas=[0.025, 0.025, 0.025])),
    test=dict(
        type='animal_ADatasetBottomUp',
        ann_file='../animal_A/data/test/animal_A_test.json',
        img_prefix='../animal_A/data/test/',
        data_cfg=dict(
            image_size=256,
            base_size=64,
            base_sigma=2,
            heatmap_size=[64, 64],
            num_output_channels=3,
            num_joints=3,
            dataset_channel=[[0, 1, 2]],
            inference_channel=[0, 1, 2],
            num_scales=2,
            scale_aware_sigma=False),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='BottomUpGetImgSize', test_scale_factor=[1]),
            dict(
                type='BottomUpResizeAlign',
                transforms=[
                    dict(type='ToTensor'),
                    dict(
                        type='NormalizeTensor',
                        mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
                ]),
            dict(
                type='Collect',
                keys=['img'],
                meta_keys=[
                    'image_file', 'aug_data', 'test_scale_factor', 'base_size',
                    'center', 'scale', 'flip_index'
                ])
        ],
        dataset_info=dict(
            dataset_name='animal_A',
            paper_info=dict(
                author='',
                title='',
                container='',
                year='',
                homepage=''),
            keypoint_info=dict({
                0:
                dict(
                    name='head',
                    id=0,
                    color=[51, 153, 255],
                    type='upper',
                    swap=''),
                1:
                dict(
                    name='shoulder',
                    id=1,
                    color=[51, 153, 255],
                    type='upper',
                    swap=''),
                2:
                dict(
                    name='dock',
                    id=2,
                    color=[51, 153, 255],
                    type='upper',
                    swap='')
            }),
            skeleton_info=dict({
                0:
                dict(link=('head', 'shoulder'), id=0, color=[0, 255, 0]),
                1:
                dict(link=('shoulder', 'dock'), id=1, color=[0, 255, 0]),
                2:
                dict(link=('dock', 'head'), id=2, color=[0, 255, 0])
            }),
            joint_weights=[1.0, 1.0, 1.0],
            sigmas=[0.025, 0.025, 0.025])))
work_dir = './work_dirs/animal_hrnet_w32_bottomUp_model5'
gpu_ids = range(0, 1)

load checkpoint from http path: https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w32-36af842e.pth
2021-12-03 04:33:11,167 - mmpose - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: head.0.0.0.conv1.weight, head.0.0.0.bn1.weight, head.0.0.0.bn1.bias, head.0.0.0.bn1.running_mean, head.0.0.0.bn1.running_var, head.0.0.0.bn1.num_batches_tracked, head.0.0.0.conv2.weight, head.0.0.0.bn2.weight, head.0.0.0.bn2.bias, head.0.0.0.bn2.running_mean, head.0.0.0.bn2.running_var, head.0.0.0.bn2.num_batches_tracked, head.0.0.0.conv3.weight, head.0.0.0.bn3.weight, head.0.0.0.bn3.bias, head.0.0.0.bn3.running_mean, head.0.0.0.bn3.running_var, head.0.0.0.bn3.num_batches_tracked, head.0.0.0.downsample.0.weight, head.0.0.0.downsample.1.weight, head.0.0.0.downsample.1.bias, head.0.0.0.downsample.1.running_mean, head.0.0.0.downsample.1.running_var, head.0.0.0.downsample.1.num_batches_tracked, head.0.1.0.conv1.weight, head.0.1.0.bn1.weight, head.0.1.0.bn1.bias, head.0.1.0.bn1.running_mean, head.0.1.0.bn1.running_var, head.0.1.0.bn1.num_batches_tracked, head.0.1.0.conv2.weight, head.0.1.0.bn2.weight, head.0.1.0.bn2.bias, head.0.1.0.bn2.running_mean, head.0.1.0.bn2.running_var, head.0.1.0.bn2.num_batches_tracked, head.0.1.0.conv3.weight, head.0.1.0.bn3.weight, head.0.1.0.bn3.bias, head.0.1.0.bn3.running_mean, head.0.1.0.bn3.running_var, head.0.1.0.bn3.num_batches_tracked, head.0.1.0.downsample.0.weight, head.0.1.0.downsample.1.weight, head.0.1.0.downsample.1.bias, head.0.1.0.downsample.1.running_mean, head.0.1.0.downsample.1.running_var, head.0.1.0.downsample.1.num_batches_tracked, head.0.2.0.conv1.weight, head.0.2.0.bn1.weight, head.0.2.0.bn1.bias, head.0.2.0.bn1.running_mean, head.0.2.0.bn1.running_var, head.0.2.0.bn1.num_batches_tracked, head.0.2.0.conv2.weight, head.0.2.0.bn2.weight, head.0.2.0.bn2.bias, head.0.2.0.bn2.running_mean, head.0.2.0.bn2.running_var, head.0.2.0.bn2.num_batches_tracked, head.0.2.0.conv3.weight, head.0.2.0.bn3.weight, head.0.2.0.bn3.bias, head.0.2.0.bn3.running_mean, head.0.2.0.bn3.running_var, head.0.2.0.bn3.num_batches_tracked, head.0.2.0.downsample.0.weight, head.0.2.0.downsample.1.weight, head.0.2.0.downsample.1.bias, head.0.2.0.downsample.1.running_mean, head.0.2.0.downsample.1.running_var, head.0.2.0.downsample.1.num_batches_tracked, head.1.0.0.conv1.weight, head.1.0.0.bn1.weight, head.1.0.0.bn1.bias, head.1.0.0.bn1.running_mean, head.1.0.0.bn1.running_var, head.1.0.0.bn1.num_batches_tracked, head.1.0.0.conv2.weight, head.1.0.0.bn2.weight, head.1.0.0.bn2.bias, head.1.0.0.bn2.running_mean, head.1.0.0.bn2.running_var, head.1.0.0.bn2.num_batches_tracked, head.1.0.0.conv3.weight, head.1.0.0.bn3.weight, head.1.0.0.bn3.bias, head.1.0.0.bn3.running_mean, head.1.0.0.bn3.running_var, head.1.0.0.bn3.num_batches_tracked, head.1.0.0.downsample.0.weight, head.1.0.0.downsample.1.weight, head.1.0.0.downsample.1.bias, head.1.0.0.downsample.1.running_mean, head.1.0.0.downsample.1.running_var, head.1.0.0.downsample.1.num_batches_tracked, head.1.1.0.conv1.weight, head.1.1.0.bn1.weight, head.1.1.0.bn1.bias, head.1.1.0.bn1.running_mean, head.1.1.0.bn1.running_var, head.1.1.0.bn1.num_batches_tracked, head.1.1.0.conv2.weight, head.1.1.0.bn2.weight, head.1.1.0.bn2.bias, head.1.1.0.bn2.running_mean, head.1.1.0.bn2.running_var, head.1.1.0.bn2.num_batches_tracked, head.1.1.0.conv3.weight, head.1.1.0.bn3.weight, head.1.1.0.bn3.bias, head.1.1.0.bn3.running_mean, head.1.1.0.bn3.running_var, head.1.1.0.bn3.num_batches_tracked, head.1.1.0.downsample.0.weight, head.1.1.0.downsample.1.weight, head.1.1.0.downsample.1.bias, head.1.1.0.downsample.1.running_mean, head.1.1.0.downsample.1.running_var, head.1.1.0.downsample.1.num_batches_tracked, 
head.2.0.0.conv1.weight, head.2.0.0.bn1.weight, head.2.0.0.bn1.bias, head.2.0.0.bn1.running_mean, head.2.0.0.bn1.running_var, head.2.0.0.bn1.num_batches_tracked, head.2.0.0.conv2.weight, head.2.0.0.bn2.weight, head.2.0.0.bn2.bias, head.2.0.0.bn2.running_mean, head.2.0.0.bn2.running_var, head.2.0.0.bn2.num_batches_tracked, head.2.0.0.conv3.weight, head.2.0.0.bn3.weight, head.2.0.0.bn3.bias, head.2.0.0.bn3.running_mean, head.2.0.0.bn3.running_var, head.2.0.0.bn3.num_batches_tracked, head.2.0.0.downsample.0.weight, head.2.0.0.downsample.1.weight, head.2.0.0.downsample.1.bias, head.2.0.0.downsample.1.running_mean, head.2.0.0.downsample.1.running_var, head.2.0.0.downsample.1.num_batches_tracked, head.3.0.0.conv1.weight, head.3.0.0.bn1.weight, head.3.0.0.bn1.bias, head.3.0.0.bn1.running_mean, head.3.0.0.bn1.running_var, head.3.0.0.bn1.num_batches_tracked, head.3.0.0.conv2.weight, head.3.0.0.bn2.weight, head.3.0.0.bn2.bias, head.3.0.0.bn2.running_mean, head.3.0.0.bn2.running_var, head.3.0.0.bn2.num_batches_tracked, head.3.0.0.conv3.weight, head.3.0.0.bn3.weight, head.3.0.0.bn3.bias, head.3.0.0.bn3.running_mean, head.3.0.0.bn3.running_var, head.3.0.0.bn3.num_batches_tracked, head.3.0.0.downsample.0.weight, head.3.0.0.downsample.1.weight, head.3.0.0.downsample.1.bias, head.3.0.0.downsample.1.running_mean, head.3.0.0.downsample.1.running_var, head.3.0.0.downsample.1.num_batches_tracked, fc.weight, fc.bias, stage4.2.fuse_layers.1.0.0.0.weight, stage4.2.fuse_layers.1.0.0.1.weight, stage4.2.fuse_layers.1.0.0.1.bias, stage4.2.fuse_layers.1.0.0.1.running_mean, stage4.2.fuse_layers.1.0.0.1.running_var, stage4.2.fuse_layers.1.0.0.1.num_batches_tracked, stage4.2.fuse_layers.1.2.0.weight, stage4.2.fuse_layers.1.2.1.weight, stage4.2.fuse_layers.1.2.1.bias, stage4.2.fuse_layers.1.2.1.running_mean, stage4.2.fuse_layers.1.2.1.running_var, stage4.2.fuse_layers.1.2.1.num_batches_tracked, stage4.2.fuse_layers.1.3.0.weight, stage4.2.fuse_layers.1.3.1.weight, stage4.2.fuse_layers.1.3.1.bias, stage4.2.fuse_layers.1.3.1.running_mean, stage4.2.fuse_layers.1.3.1.running_var, stage4.2.fuse_layers.1.3.1.num_batches_tracked, stage4.2.fuse_layers.2.0.0.0.weight, stage4.2.fuse_layers.2.0.0.1.weight, stage4.2.fuse_layers.2.0.0.1.bias, stage4.2.fuse_layers.2.0.0.1.running_mean, stage4.2.fuse_layers.2.0.0.1.running_var, stage4.2.fuse_layers.2.0.0.1.num_batches_tracked, stage4.2.fuse_layers.2.0.1.0.weight, stage4.2.fuse_layers.2.0.1.1.weight, stage4.2.fuse_layers.2.0.1.1.bias, stage4.2.fuse_layers.2.0.1.1.running_mean, stage4.2.fuse_layers.2.0.1.1.running_var, stage4.2.fuse_layers.2.0.1.1.num_batches_tracked, stage4.2.fuse_layers.2.1.0.0.weight, stage4.2.fuse_layers.2.1.0.1.weight, stage4.2.fuse_layers.2.1.0.1.bias, stage4.2.fuse_layers.2.1.0.1.running_mean, stage4.2.fuse_layers.2.1.0.1.running_var, stage4.2.fuse_layers.2.1.0.1.num_batches_tracked, stage4.2.fuse_layers.2.3.0.weight, stage4.2.fuse_layers.2.3.1.weight, stage4.2.fuse_layers.2.3.1.bias, stage4.2.fuse_layers.2.3.1.running_mean, stage4.2.fuse_layers.2.3.1.running_var, stage4.2.fuse_layers.2.3.1.num_batches_tracked, stage4.2.fuse_layers.3.0.0.0.weight, stage4.2.fuse_layers.3.0.0.1.weight, stage4.2.fuse_layers.3.0.0.1.bias, stage4.2.fuse_layers.3.0.0.1.running_mean, stage4.2.fuse_layers.3.0.0.1.running_var, stage4.2.fuse_layers.3.0.0.1.num_batches_tracked, stage4.2.fuse_layers.3.0.1.0.weight, stage4.2.fuse_layers.3.0.1.1.weight, stage4.2.fuse_layers.3.0.1.1.bias, stage4.2.fuse_layers.3.0.1.1.running_mean, stage4.2.fuse_layers.3.0.1.1.running_var, 
stage4.2.fuse_layers.3.0.1.1.num_batches_tracked, stage4.2.fuse_layers.3.0.2.0.weight, stage4.2.fuse_layers.3.0.2.1.weight, stage4.2.fuse_layers.3.0.2.1.bias, stage4.2.fuse_layers.3.0.2.1.running_mean, stage4.2.fuse_layers.3.0.2.1.running_var, stage4.2.fuse_layers.3.0.2.1.num_batches_tracked, stage4.2.fuse_layers.3.1.0.0.weight, stage4.2.fuse_layers.3.1.0.1.weight, stage4.2.fuse_layers.3.1.0.1.bias, stage4.2.fuse_layers.3.1.0.1.running_mean, stage4.2.fuse_layers.3.1.0.1.running_var, stage4.2.fuse_layers.3.1.0.1.num_batches_tracked, stage4.2.fuse_layers.3.1.1.0.weight, stage4.2.fuse_layers.3.1.1.1.weight, stage4.2.fuse_layers.3.1.1.1.bias, stage4.2.fuse_layers.3.1.1.1.running_mean, stage4.2.fuse_layers.3.1.1.1.running_var, stage4.2.fuse_layers.3.1.1.1.num_batches_tracked, stage4.2.fuse_layers.3.2.0.0.weight, stage4.2.fuse_layers.3.2.0.1.weight, stage4.2.fuse_layers.3.2.0.1.bias, stage4.2.fuse_layers.3.2.0.1.running_mean, stage4.2.fuse_layers.3.2.0.1.running_var, stage4.2.fuse_layers.3.2.0.1.num_batches_tracked

loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
=> num_images: 478
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
=> num_images: 120
2021-12-03 04:33:15,505 - mmpose - INFO - load checkpoint from local path: ./work_dirs/animal_hrnet_w32_bottomUp_model5/epoch_100.pth
2021-12-03 04:33:21,023 - mmpose - INFO - resumed epoch 100, iter 3000
2021-12-03 04:33:21,029 - mmpose - INFO - Start running, host: <my host>, work_dir: <my work dir>
2021-12-03 04:33:21,030 - mmpose - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) CheckpointHook
(NORMAL      ) EvalHook
(VERY_LOW    ) TextLoggerHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) EvalHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
 --------------------
before_train_iter:
(VERY_HIGH   ) StepLrUpdaterHook
(NORMAL      ) EvalHook
(LOW         ) IterTimerHook
 --------------------
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL      ) CheckpointHook
(NORMAL      ) EvalHook
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
 --------------------
after_train_epoch:
(NORMAL      ) CheckpointHook
(NORMAL      ) EvalHook
(VERY_LOW    ) TextLoggerHook
 --------------------
before_val_epoch:
(LOW         ) IterTimerHook
(VERY_LOW    ) TextLoggerHook
 --------------------
before_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_iter:
(LOW         ) IterTimerHook
 --------------------
after_val_epoch:
(VERY_LOW    ) TextLoggerHook
 --------------------
after_run:
(VERY_LOW    ) TextLoggerHook
 --------------------
2021-12-03 04:33:21,030 - mmpose - INFO - workflow: [('train', 1)], max: 400 epochs
2021-12-03 04:33:21,031 - mmpose - INFO - Checkpoints will be saved to <my work dir> by HardDiskBackend.

Additionally, during the evaluation step, I receive an error:

[                                                  ] 0/120, elapsed: 0s, ETA:Traceback (most recent call last):
  File "./tools/train.py", line 182, in <module>
    main()
  File "./tools/train.py", line 178, in main
    meta=meta)
  File "mmpose/mmpose/apis/train.py", line 156, in train_model
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch
    self._do_evaluate(runner)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py", line 271, in _do_evaluate
    results = self.test_fn(runner.model, self.dataloader)
  File "mmpose/mmpose/apis/test.py", line 33, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 42, in forward
    return super().forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "mmpose/mmpose/models/detectors/associative_embedding.py", line 133, in forward
    img, img_metas, return_heatmap=return_heatmap, **kwargs)
  File "mmpose/mmpose/models/detectors/associative_embedding.py", line 216, in forward_test
    assert img.size(0) == 1
AssertionError

Neither of these issues appears when training a top-down model.

I have tried some of the suggestions in issue #347 to improve my bottom-up model, but still no luck. Do you have any suggestions about what I should try next?

Thank you!

jin-s13 commented 2 years ago
  1. About the first question: the dataset is too small; I do not think it is sufficient to train a good bottom-up model, which generally requires more data to achieve good performance.

I am not entirely clear on this part: "However, in many frames more than 3 keypoints are detected and drawn, sometimes up to 6 in total. The additional keypoints are usually adjacent to the existing keypoints." I am sorry, but I cannot quite picture it. It would be helpful if you could post some example images here.

  2. About the AssertionError: bottom-up models do not support a batch size greater than 1 for inference. You can simply set the batch size to 1 in the config during inference.
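
For example, a minimal sketch against the config posted above (only the val/test dataloader batch sizes change; everything else in the data dict stays as it is):

data = dict(
    workers_per_gpu=2,
    train_dataloader=dict(samples_per_gpu=16),  # training batch size is unaffected
    val_dataloader=dict(samples_per_gpu=1),     # was 8; forward_test asserts img.size(0) == 1
    test_dataloader=dict(samples_per_gpu=1),    # was 8
    # train=..., val=..., test=... remain exactly as in the config above
)
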
jin-s13 commented 2 years ago

BTW, I would recommend training a top-down model along with a detection model in this case.

Chttan commented 2 years ago

Hello,

Thank you for the timely response!

About the first question: the dataset is too small; I do not think it is sufficient to train a good bottom-up model, which generally requires more data to achieve good performance.

What dataset size would you suggest to achieve good results with a bottom-up model?

I am not entirely clear on this part: "However, in many frames more than 3 keypoints are detected and drawn, sometimes up to 6 in total. The additional keypoints are usually adjacent to the existing keypoints." I am sorry, but I cannot quite picture it. It would be helpful if you could post some example images here.

Please find a representative image illustrating the problem I am facing (attached image: horse_extra_kpt).

My skeleton is defined with only 3 keypoints; however, more are detected in some frames. It seems likely to me that 2 skeletons are being drawn. The additional keypoints/skeleton appear in about 1/3 of the inferred frames, and I am not sure what causes this.

About the AssertionError: bottom-up models do not support a batch size greater than 1 for inference. You can simply set the batch size to 1 in the config during inference.

I will give this a try, thank you!

BTW, I would recommend training a top-down model along with a detection model in this case.

I have given this a try, but have not had luck with the detection model. I will take another shot at it.

Thank you again for your help!

jin-s13 commented 2 years ago

Sorry for the late reply. Could you please print the predicted poses? I assume the model detects two sets of poses, and you can easily keep one and discard the other. Have you tried a lower NMS threshold?
https://github.com/open-mmlab/mmpose/blob/4297d1e932b2b346af71de9f98e3774ae0b66aec/demo/bottom_up_img_demo.py#L41
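
If it helps, a rough sketch of lowering this threshold through the Python API (the linked demo script exposes the same threshold as a command-line flag; the paths below are placeholders):

from mmpose.apis import (inference_bottom_up_pose_model, init_pose_model,
                         vis_pose_result)

# placeholder paths for your own config, checkpoint and video frame
pose_model = init_pose_model('my_bottomup_config.py', 'epoch_400.pth')

# pose_nms_thr controls how aggressively near-duplicate poses are suppressed;
# a value lower than the default should drop the duplicated skeleton
# (for a custom dataset you may also need to pass dataset/dataset_info,
# as the demo scripts do)
pose_results, _ = inference_bottom_up_pose_model(
    pose_model, 'frame_0001.jpg', pose_nms_thr=0.5)

vis_pose_result(pose_model, 'frame_0001.jpg', pose_results,
                out_file='vis_frame_0001.jpg')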

Chttan commented 2 years ago

Hello, my apologies for the delay in replying.

Could you please print the predicted poses? I assume the model detects two sets of poses, and you can easily keep one and discard the other.

Yes, that is what it seems to have been doing. As one way to resolve the issue, I tried filtering out one of the detected poses.
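
Something along these lines (a rough sketch rather than the exact code I used, assuming the bottom-up results are a list of dicts whose 'keypoints' arrays have shape (num_joints, 3) with per-keypoint scores in the last column):

import numpy as np

def keep_best_pose(pose_results):
    """Keep only the pose with the highest mean keypoint score."""
    if len(pose_results) <= 1:
        return pose_results
    mean_scores = [float(np.mean(p['keypoints'][:, 2])) for p in pose_results]
    best = int(np.argmax(mean_scores))
    return [pose_results[best]]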

Have you tried a lower NMS threshold?

Setting the NMS threshold to 0.5 also filtered out the extra skeleton.

The third method I tried was setting max_num_people to 1 in the model's test_cfg, but this was less than ideal, as it removes the possibility of tracking more than one animal.
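
For reference, the relevant fragment of the test_cfg posted earlier (only this key changes):

model = dict(
    test_cfg=dict(
        max_num_people=1))  # was 2; the decoder then keeps a single pose per frame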

Thank you for all of your help and suggestions!