open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

[Bug] When training video_pose_lift models, the camera parameters are incomplete: `w` and `h` are required #2976

Open 11610 opened 6 months ago

11610 commented 6 months ago

Prerequisite

Environment

```
OrderedDict([('sys.platform', 'win32'), ('Python', '3.9.18 (main, Sep 11 2023, 14:09:26) [MSC v.1916 64 bit (AMD64)]'), ('CUDA available', False), ('numpy_random_seed', 2147483648), ('GCC', 'n/a'), ('PyTorch', '2.1.2'), ('PyTorch compiling details', 'PyTorch built with:\n - C++ Version: 199711\n - MSVC 192930151\n - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n - OpenMP 2019\n - LAPACK is enabled (usually provided by MKL)\n - CPU capability usage: AVX2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /utf-8 /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.1.2, USE_CUDA=0, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.16.2'), ('OpenCV', '4.8.1'), ('MMEngine', '0.10.2'), ('MMPose', '1.3.1+5a3be94')])
```

Reproduces the problem - code sample

    # Normalize the 2D keypoint coordinate with image width and height
    _camera_param = deepcopy(camera_param)
    assert 'w' in _camera_param and 'h' in _camera_param, (
        'Camera parameter `w` and `h` should be provided.')
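The assertion above exists because the codec normalizes pixel coordinates with the image size before feeding them to the lifter. A minimal sketch of the VideoPose3D-style normalization (the exact MMPose implementation may differ in details, but this is why `w` and `h` must be present in the camera parameters):

```python
import numpy as np

def normalize_screen_coordinates(kpts, w, h):
    """Map pixel coordinates into roughly [-1, 1], scaling both axes by
    half the image width so the aspect ratio is preserved."""
    center = np.array([w / 2.0, h / 2.0])
    return (kpts - center) / (w / 2.0)

pts = np.array([[500.0, 500.0], [0.0, 0.0]])
print(normalize_screen_coordinates(pts, w=1000, h=1000))
# image center -> (0, 0); top-left corner -> (-1, -1)
```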

Reproduces the problem - command or script

PS F:\MyCode\BSVM\mmpose> & 'e:\Users\MSN\anaconda3\python.exe' 'c:\Users\MSN\.vscode\extensions\ms-python.debugpy-2024.2.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher' '62392' '--' 'F:\MyCode\BSVM\mmpose\tools\train.py' 'configs/body_3d_keypoint/video_pose_lift/h36m/video-pose-lift_tcn-27frm-supv_8xb128-160e_h36m.py' '--work-dir' 'train_result' '--resume' '--auto-scale-lr'

Reproduces the problem - error message

  File "d:\mycode\bsvm\mmpose\mmpose\codecs\video_pose_lifting.py", line 180, in encode
    assert 'w' in _camera_param and 'h' in _camera_param, (
AssertionError: Camera parameter `w` and `h` should be provided.
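The assertion fires when an entry in the camera parameter file lacks the `w`/`h` keys. A quick diagnostic (a sketch; the file path and the tuple-shaped keys are assumptions about the pickle's layout):

```python
import pickle

def missing_size_keys(camera_params):
    """Return the ids of camera entries that lack 'w' or 'h'."""
    return [cam_id for cam_id, p in camera_params.items()
            if 'w' not in p or 'h' not in p]

# Toy example; in practice load your own file, e.g.:
#   with open('Human36m/cameras.pkl', 'rb') as f:
#       camera_params = pickle.load(f)
toy = {('S1', '54138969'): {'R': None, 'T': None, 'f': None, 'c': None}}
print(missing_size_keys(toy))  # the toy entry lacks both keys
```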

Additional information

As I understand it, the input to video_pose_lift should be sequences of 2D human skeleton keypoints. But when I train following the official documentation, I am told the camera parameters are wrong, as if the input were image data. Isn't that the data needed to train a 2D skeleton model, not a 3D one? Does the code currently not support 3D training, or is something wrong with my configuration? I could not find anything about this in the official docs. Could you help me? Thanks a lot!!

I would like to confirm whether MMPose supports training video_pose_lift models, and whether that requires code changes or extra configuration. For example, adding the config and weights of a 2D model, so that the 2D skeletons it outputs are fed into the video_pose_lift model. I tried that, but it didn't work; train.py seems to accept only a single config file as its argument. So I read the code, and it doesn't seem to be supported yet. Do I need to modify the code myself, or is my understanding wrong and my configuration broken? I'm a beginner, so any help would be greatly appreciated!!

11610 commented 6 months ago

The config file is as follows:

_base_ = ['../../../_base_/default_runtime.py']

vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer')

# runtime
train_cfg = dict(max_epochs=160, val_interval=10)

# optimizer
optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-3))

# learning policy
param_scheduler = [
    dict(type='ExponentialLR', gamma=0.975, end=80, by_epoch=True)
]

auto_scale_lr = dict(base_batch_size=1024)

# hooks
default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        save_best='MPJPE',
        rule='less',
        max_keep_ckpts=1),
    logger=dict(type='LoggerHook', interval=20),
)

# codec settings
codec = dict(
    type='VideoPoseLifting',
    num_keypoints=133,
    zero_center=True,
    root_index=0,
    remove_root=False)

# model settings
model = dict(
    type='PoseLifter',
    backbone=dict(
        type='TCN',
        in_channels=2 * 133,
        stem_channels=1024,
        num_blocks=2,
        kernel_sizes=(3, 3, 3),
        dropout=0.25,
        use_stride_conv=True,
    ),
    head=dict(
        type='TemporalRegressionHead',
        in_channels=1024,
        num_joints=133,
        loss=dict(type='MPJPELoss'),
        decoder=codec,
    ))

# base dataset settings
dataset_type = 'H36MWholeBodyDataset'
data_root = 'Human36m'

# pipelines
train_pipeline = [
    dict(
        type='RandomFlipAroundRoot',
        keypoints_flip_cfg=dict(),
        target_flip_cfg=dict(),
    ),
    dict(type='GenerateTarget', encoder=codec),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices',
                   'target_root'))
]
val_pipeline = [
    dict(type='GenerateTarget', encoder=codec),
    dict(
        type='PackPoseInputs',
        meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices',
                   'target_root'))
]

# data loaders
train_dataloader = dict(
    batch_size=128,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        ann_file='h3wb_train.npz',
        seq_len=27,
        causal=False,
        pad_video_seq=True,
        camera_param_file='cameras.pkl',
        data_root=data_root,
        data_prefix=dict(img='images/'),
        pipeline=train_pipeline,
    ),
)
val_dataloader = dict(
    batch_size=128,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
    dataset=dict(
        type=dataset_type,
        ann_file='h3wb_train.npz',
        seq_len=27,
        causal=False,
        pad_video_seq=True,
        camera_param_file='cameras.pkl',
        data_root=data_root,
        data_prefix=dict(img='images/'),
        pipeline=val_pipeline,
        test_mode=True,
    ))
test_dataloader = val_dataloader

# evaluators
val_evaluator = [
    dict(type='MPJPE', mode='mpjpe'),
    dict(type='MPJPE', mode='p-mpjpe')
]
test_evaluator = val_evaluator

Is my cameras.pkl file wrong? I directly used the Human3.6M cameras.pkl file shipped with MMPose (tests/data/h36m/cameras.pkl).
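If that pickle indeed lacks the image size, one possible workaround is to patch every entry before training. This is a sketch under the assumption that the file maps camera ids to parameter dicts; the 1000x1000 resolution and the paths are placeholders you should replace with the actual frame size and your own data layout:

```python
import pickle

def add_image_size(camera_params, w, h):
    """Return a copy of the camera dict with 'w'/'h' added to every entry
    that does not already have them."""
    patched = {}
    for cam_id, params in camera_params.items():
        params = dict(params)          # shallow copy, keep the original intact
        params.setdefault('w', w)
        params.setdefault('h', h)
        patched[cam_id] = params
    return patched

# Usage sketch (hypothetical paths and resolution -- verify against
# the actual size of your H36M frames):
#   with open('Human36m/cameras.pkl', 'rb') as f:
#       cams = pickle.load(f)
#   cams = add_image_size(cams, w=1000, h=1000)
#   with open('Human36m/cameras_wh.pkl', 'wb') as f:
#       pickle.dump(cams, f)
```

Then point `camera_param_file` in the dataloader configs at the patched file.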