[Bug] Error after changing the number of keypoints in the COCO dataset: stack expects each tensor to be equal size

39-2 commented 9 months ago

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] The bug has not been fixed in the latest version(https://github.com/open-mmlab/mmpose).

Environment

python -c "from mmpose.utils import collect_env; print(collect_env())" OrderedDict([('sys.platform', 'win32'), ('Python', '3.8.18 | packaged by conda-forge | (default, Dec 23 2023, 17:17:17) [MSC v.1929 64 bit (AMD64)]'),

('CUDA available', True), ('numpy_random_seed', 2147483648),

('GPU 0', 'NVIDIA GeForce RTX 2070 with Max-Q Design'), ('CUDA_HOME', 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6'), ('NVCC', 'Cuda compilation tools, release 11.6, V11.6.55'), ('MSVC', '用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.29.30146 版'), ('GCC', 'n/a'),

('PyTorch', '1.12.1+cu113'),

('PyTorch compiling details', 'PyTorch built with:\n - C++ Version: 199711\n - MSVC 192829337\n - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)\n - OpenMP 2019\n - LAPACK is enabled (usually provided by MKL)\n - CPU capability usage: AVX2\n - CUDA Runtime 11.3\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37\n - CuDNN 8.3.2 (built against CUDA 11.5)\n - Magma 2.5.4\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, \n'),

('TorchVision', '0.13.1+cu113'), ('OpenCV', '4.9.0'), ('MMEngine', '0.10.2'), ('MMPose', '1.3.1+16682fc')])

pip list | grep mm mmcv 2.1.0 mmdet 3.3.0 mmengine 0.10.2 mmpose 1.3.1 d:\anaconda\envs\mmpose\lib\site-packages\mmpose-1.3.1-py3.8.egg

Reproduces the problem - code sample

Config file, modified from td-hm_mobilenetv2_8xb32-60e_coco-wholebody-face-256x256.py:

...
    head=dict(
        type='HeatmapHead',
        in_channels=1280,
        out_channels=17,
        loss=dict(type='KeypointMSELoss', use_target_weight=True),
        decoder=codec),
...
dataset_type = 'CocoWholeBodyFace17PointDataset'

CocoWholeBodyFace17PointDataset.py

    METAINFO: dict = dict(
        from_file='configs/_base_/datasets/coco_wholebody_face_17point.py')

coco_wholebody_face_17point.py

...
    keypoint_info={
        0:
        dict(name='face-30', id=30, color=[255, 0, 0], type='', swap=''),
        1: 
        dict(name='face-36', id=36, color=[255, 0, 0], type='', swap='face-45'),
        2: 
        dict(name='face-37', id=37, color=[255, 0, 0], type='', swap='face-44'),
        3: 
        dict(name='face-38', id=38, color=[255, 0, 0], type='', swap='face-43'),
        4: 
        dict(name='face-39', id=39, color=[255, 0, 0], type='', swap='face-42'),
        5: 
        dict(name='face-40', id=40, color=[255, 0, 0], type='', swap='face-47'),
        6: 
        dict(name='face-41', id=41, color=[255, 0, 0], type='', swap='face-46'),
        7: 
        dict(name='face-42', id=42, color=[255, 0, 0], type='', swap='face-39'),
        8: 
        dict(name='face-43', id=43, color=[255, 0, 0], type='', swap='face-38'),
        9: 
        dict(name='face-44', id=44, color=[255, 0, 0], type='', swap='face-37'),
        10: 
        dict(name='face-45', id=45, color=[255, 0, 0], type='', swap='face-36'),
        11: 
        dict(name='face-46', id=46, color=[255, 0, 0], type='', swap='face-41'),
        12: 
        dict(name='face-47', id=47, color=[255, 0, 0], type='', swap='face-40'),
        13: 
        dict(name='face-48', id=48, color=[255, 0, 0], type='', swap='face-54'),
        14: 
        dict(name='face-51', id=52, color=[255, 0, 0], type='', swap=''),
        15: 
        dict(name='face-54', id=54, color=[255, 0, 0], type='', swap='face-48'),
        16: 
        dict(name='face-57', id=57, color=[255, 0, 0], type='', swap=''),
    },
    skeleton_info={},
    joint_weights=[1.] * 17,

    # 'https://github.com/jin-s13/COCO-WholeBody/blob/master/'
    # 'evaluation/myeval_wholebody.py#L177'
    sigmas=[
        0.007, 0.017, 0.011, 0.009, 0.011, 
        0.009, 0.007, 0.013, 0.008, 0.011, 
        0.012, 0.010, 0.034, 0.008, 0.008, 
        0.010, 0.009])

The dataset has been added to the registry.

Reproduces the problem - command or script

python .\tools\train.py --config .\configs\face_2d_keypoint\topdown_heatmap\coco_wholebody_face\td-hm_mobilenetv2_17keypoints.py

Reproduces the problem - error message

File "D:\Anaconda\envs\mmpose\lib\site-packages\mmpose-1.3.1-py3.8.egg\mmpose\models\heads\heatmap_heads\heatmap_head.py", line 295, in loss gt_heatmaps = torch.stack( RuntimeError: stack expects each tensor to be equal size, but got [17, 64, 64] at entry 0 and [68, 64, 64] at entry 1

Additional information

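I want to run keypoint detection using a subset of the coco_wholebody_face keypoints.
I am using the coco_wholebody dataset, unmodified.
After modifying the dataset definition, training fails with the error above: some heatmaps have 68 channels and others have 17. The positions where this happens are random and not reproducible. Even after setting shuffle to False for train_dataloader in the config file, the positions of the offending entries in gt_fields.heatmaps[:] remain unpredictable and show no pattern.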
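
To see which samples carry 68-channel targets, one option is to group the samples by target shape before they are stacked. A minimal sketch, using NumPy arrays as stand-ins for the heatmap tensors; `group_by_shape` is a hypothetical helper and not part of MMPose:

```python
from collections import defaultdict

import numpy as np


def group_by_shape(arrays):
    """Map each distinct array shape to the sample indices that have it."""
    groups = defaultdict(list)
    for idx, arr in enumerate(arrays):
        groups[tuple(arr.shape)].append(idx)
    return dict(groups)


# Stand-in batch: two 17-keypoint targets and one 68-keypoint target.
batch = [
    np.zeros((17, 64, 64)),
    np.zeros((68, 64, 64)),
    np.zeros((17, 64, 64)),
]

# More than one key in the result means stacking the batch would fail.
print(group_by_shape(batch))
```

Running a check like this over the dataset rather than over a single batch would show whether the 68-channel samples are tied to specific annotations or to an augmentation step.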

That is the problem as described above. I would appreciate a reply as soon as possible. Thank you.

Ben-Louis commented 9 months ago

The indices are probably getting mixed up during random flip. In your metainfo, each keypoint's id does not match its key.
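
A consistency check along these lines can be sketched in plain Python. It assumes (this is not confirmed in the thread) that the dict keys are what end up being used as keypoint indices, so they should run 0..N-1, each `id` should equal its key, and every `swap` should name a keypoint that swaps back. `check_keypoint_info` is a hypothetical helper, not an MMPose function:

```python
def check_keypoint_info(keypoint_info):
    """Return a list of human-readable problems found in a keypoint_info dict."""
    problems = []
    by_name = {info['name']: info for info in keypoint_info.values()}

    for key, info in keypoint_info.items():
        # The 'id' field is expected to repeat the dict key.
        if info['id'] != key:
            problems.append(f"key {key}: id={info['id']} does not match the key")

        # A non-empty swap must point to an existing keypoint that points back.
        swap = info.get('swap', '')
        if swap:
            if swap not in by_name:
                problems.append(f"key {key}: swap target '{swap}' is not defined")
            elif by_name[swap].get('swap', '') != info['name']:
                problems.append(f"key {key}: swap with '{swap}' is not symmetric")

    # Assumption: keys are used as array indices, so they must be 0..N-1.
    if sorted(keypoint_info) != list(range(len(keypoint_info))):
        problems.append('keys are not contiguous 0..N-1')

    return problems


# Tiny illustrative metainfo (not the real 17-point file).
example = {
    0: dict(name='nose', id=0, swap=''),
    1: dict(name='eye-r', id=1, swap='eye-l'),
    2: dict(name='eye-l', id=2, swap='eye-r'),
}
print(check_keypoint_info(example))
```

The same helper also flags duplicated `id` values indirectly, since at most one of two entries sharing an `id` can match its key.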

39-2 commented 9 months ago

Hello, following your suggestion I changed the code in coco_wholebody_face_17point.py so that the keys match the ids:

keypoint_info={
        30:
        dict(name='face-30', id=30, color=[255, 0, 0], type='', swap=''),
        36: 
        dict(name='face-36', id=36, color=[255, 0, 0], type='', swap='face-45'),
        37: 
        dict(name='face-37', id=37, color=[255, 0, 0], type='', swap='face-44'),
        38: 
        dict(name='face-38', id=38, color=[255, 0, 0], type='', swap='face-43'),
        39: 
        dict(name='face-39', id=39, color=[255, 0, 0], type='', swap='face-42'),
        40: 
        dict(name='face-40', id=40, color=[255, 0, 0], type='', swap='face-47'),
        41: 
        dict(name='face-41', id=41, color=[255, 0, 0], type='', swap='face-46'),
        42: 
        dict(name='face-42', id=42, color=[255, 0, 0], type='', swap='face-39'),
        43: 
        dict(name='face-43', id=43, color=[255, 0, 0], type='', swap='face-38'),
        44: 
        dict(name='face-44', id=44, color=[255, 0, 0], type='', swap='face-37'),
        45: 
        dict(name='face-45', id=45, color=[255, 0, 0], type='', swap='face-36'),
        46: 
        dict(name='face-46', id=46, color=[255, 0, 0], type='', swap='face-41'),
        47: 
        dict(name='face-47', id=47, color=[255, 0, 0], type='', swap='face-40'),
        48: 
        dict(name='face-48', id=48, color=[255, 0, 0], type='', swap='face-54'),
        51: 
        dict(name='face-51', id=51, color=[255, 0, 0], type='', swap=''),
        54: 
        dict(name='face-54', id=54, color=[255, 0, 0], type='', swap='face-48'),
        57: 
        dict(name='face-57', id=57, color=[255, 0, 0], type='', swap=''),
    },

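The error persists, and the distribution of 17 and 68 is still random. Also, while making this change I noticed that coco_wholebody_face.py in the same directory contains: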

        51: dict(name='face-51', id=52, color=[255, 0, 0], type='', swap=''),
        52: dict(
            name='face-52', id=52, color=[255, 0, 0], type='', swap='face-50'),

Both of these entries have id 52. Is this intentional or a mistake?

39-2 commented 9 months ago

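Solved. CocoWholeBodyFace17PointDataset.py needs to be modified so that it extracts only the required keypoints, and the indices of the left and right outer eye corners used during evaluation need to be changed so that the program can compute the distance between them.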
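
The two steps can be illustrated with a minimal NumPy sketch. This is not the actual dataset class; `KEEP_68`, `select_keypoints` and `remap_index` are hypothetical names, and the sketch assumes that points 36 and 45 of the 68-point layout are the two outer eye corners (they are listed as a swap pair in the metainfo):

```python
import numpy as np

# 68-point face indices kept in the 17-point subset, taken from the
# keypoint names in the metainfo (face-30, face-36, ..., face-57).
KEEP_68 = [30, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 51, 54, 57]


def select_keypoints(keypoints, keep=KEEP_68):
    """Pick the kept keypoints from an array shaped (..., 68, D)."""
    return np.asarray(keypoints)[..., keep, :]


def remap_index(old_index, keep=KEEP_68):
    """Translate a 68-point index into its position in the subset."""
    return keep.index(old_index)


# Dummy annotation: one instance, 68 keypoints, (x, y) coordinates.
kpts68 = np.arange(68 * 2).reshape(1, 68, 2)
kpts17 = select_keypoints(kpts68)
print(kpts17.shape)

# New positions of the outer eye corners for the evaluation step.
print(remap_index(36), remap_index(45))
```

Any per-keypoint visibility array would need the same selection applied along its keypoint axis so that coordinates and visibility stay aligned.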
