open-mmlab / mmaction2

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
https://mmaction2.readthedocs.io
Apache License 2.0

[Bug] STGCN training not going as expected #2496

Open · MABatin opened 1 year ago

MABatin commented 1 year ago

Branch

0.x branch (0.x version, such as v0.24.1)

Environment

sys.platform: linux
Python: 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce GTX 1080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.109
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.12.1+cu113
TorchVision: 0.13.1+cu113
OpenCV: 4.5.4
MMEngine: 0.7.3
MMAction2: 1.0.0+

Describe the bug

When training an STGCN model on a custom dataset with 3 classes, I see that the loss isn't going down at all. It looks like the following:

[W&B chart, 5/25/2023 1:24:36 PM: training loss]

[W&B chart, 5/25/2023 1:24:21 PM: val/top1_accuracy]

As can be seen, the training loss just oscillates and val/top1_accuracy stays constant, which indicates the model isn't learning anything. Why is that?
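
One quick sanity check (a hedged sketch, assuming the 0.x PoseDataset pickle is a list of dicts with a 'label' field, as the standard NTU skeleton annotations are): if val/top1_accuracy is stuck at the majority class's share of the data, the model has likely collapsed to predicting a single class.

import pickle
from collections import Counter

# Path taken from the config below; adjust as needed.
with open('data/ntu-fall/ntu-fall_xsub_val.pkl', 'rb') as f:
    annos = pickle.load(f)

# Per-class sample counts: if top-1 accuracy equals the largest
# share below, the model is predicting one class for everything.
counts = Counter(sample['label'] for sample in annos)
total = sum(counts.values())
for label, n in sorted(counts.items()):
    print(f'class {label}: {n} samples ({n / total:.1%})')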

Reproduces the problem - code sample

I am using the following config:

model = dict(
    type='SkeletonGCN',
    backbone=dict(
        type='STGCN',
        in_channels=3,
        edge_importance_weighting=True,
        graph_cfg=dict(layout='coco', strategy='spatial')),
    cls_head=dict(
        type='STGCNHead',
        num_classes=3,
        in_channels=256,
        loss_cls=dict(
            type='CrossEntropyLoss',
            # per-class weights to counter class imbalance;
            # alternative noted for ntu60-fall: [46.818, 1.0, 0.999]
            class_weight=[0.632, 1.0, 2.496])),
    train_cfg=None,
    test_cfg=None)

dataset_type = 'PoseDataset'
ann_file_train = '/home/portia/portia-train/mmaction2/data/ntu-fall/ntu-fall_xsub_train.pkl'
ann_file_val = '/home/portia/portia-train/mmaction2/data/ntu-fall/ntu-fall_xsub_val.pkl'
train_pipeline = [
    dict(type='PaddingWithLoop', clip_len=6),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),  # keypoint normalization (see the double-normalization discussion below)
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
    dict(type='PaddingWithLoop', clip_len=6),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
    dict(type='PaddingWithLoop', clip_len=6),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['keypoint'])
]
data = dict(
    videos_per_gpu=16,
    workers_per_gpu=2,
    test_dataloader=dict(videos_per_gpu=1),
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix='',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix='',
        pipeline=test_pipeline))

# optimizer
optimizer = dict(
    type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001, nesterov=True)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(policy='step', step=[10, 50])
total_epochs = 80
checkpoint_config = dict(interval=5)
evaluation = dict(interval=5, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1,))
work_dir = './work_dirs/stgcn_80e_ntu60-fall_xsub_keypoint/'
log_config = dict(
    interval=1,
    hooks=[
        dict(type='TextLoggerHook', by_epoch=True),
        dict(
            type='WandbLoggerHook',
            by_epoch=True,
            init_kwargs={'entity': 'unholytsar',
                         'project': 'portialyze-carevision',
                         'name': 'stgcn_80e_ntu-fall_xsub_keypoint',
                         'dir': work_dir,
                         'resume': 'allow',
                         'id': '2wsewedceerrye'},
            interval=1)])

# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1), ('val', 1)]
gpu_ids = range(0, 1)

Reproduces the problem - command or script

No response

Reproduces the problem - error message

No response

Additional information

  1. I expected the training loss to go down and val accuracy to go up.
  2. Instead, the loss oscillates and val accuracy stays flat.

knifofia commented 1 year ago

Hi @MABatin, I faced the same issue as you did. For me, there were two issues:

* a double normalization, which I fixed by removing the one in the STGCN pipeline
* a learning rate that was too high, which I lowered to 0.001 (see the optimizer sketch after the pipelines below)

Here is how I changed the pipeline of STGCN:

train_pipeline = [
    # dict(type="PreNormalize2D"),
    dict(type="GenSkeFeat", dataset="coco", feats=["j"]),
    dict(type="UniformSampleFrames", clip_len=100),
    dict(type="PoseDecode"),
    dict(type="FormatGCNInput", num_person=2),
    dict(type="PackActionInputs"),
]
val_pipeline = [
    # dict(type="PreNormalize2D"),
    dict(type="GenSkeFeat", dataset="coco", feats=["j"]),
    dict(type="UniformSampleFrames", clip_len=100, num_clips=1, test_mode=True),
    dict(type="PoseDecode"),
    dict(type="FormatGCNInput", num_person=2),
    dict(type="PackActionInputs"),
]
test_pipeline = [
    # dict(type="PreNormalize2D"),
    dict(type="GenSkeFeat", dataset="coco", feats=["j"]),
    dict(type="UniformSampleFrames", clip_len=100, num_clips=10, test_mode=True),
    dict(type="PoseDecode"),
    dict(type="FormatGCNInput", num_person=2),
    dict(type="PackActionInputs"),
]
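
The learning-rate fix isn't visible in the pipelines above, so here is a minimal sketch of the matching change in your 0.x-style config (same SGD settings as the original, only lr lowered; illustrative, not a config I ran):

# optimizer: base learning rate lowered from 0.1 to 0.001
optimizer = dict(
    type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001, nesterov=True)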

Hope it helps!

MABatin commented 1 year ago

> Hi @MABatin, I faced the same issue as you did. For me, there were two issues:
>
> * a double normalization, which I fixed by removing the one in the STGCN pipeline
> * a learning rate that was too high, which I lowered to 0.001
>
> Here is how I changed the pipeline of STGCN: […]

Thank you very much for the suggestion. I also saw an improvement in training after lowering the learning rate. However, I did not change the pipeline, so I can't speak to that part. I'm on the 0.x version, so can you tell me where in my pipeline the double normalization might be happening?

train_pipeline = [
    dict(type='PaddingWithLoop', clip_len=6),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
    dict(type='PaddingWithLoop', clip_len=6),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
    dict(type='PaddingWithLoop', clip_len=6),
    dict(type='PoseDecode'),
    dict(type='FormatGCNInput', input_format='NCTVM'),
    dict(type='PoseNormalize'),
    dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['keypoint'])
]

knifofia commented 1 year ago

I chose to use the MediaPipe skeleton extractor to get skeletons from my video dataset, then converted them to the COCO format. I went with MediaPipe because it extracts skeletons quickly and is easy to implement.

MediaPipe already normalizes the skeleton coordinates, and MMAction2 applies another normalization on top. That seems to be the issue, but I didn't dive deep into the code to find out why.
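
A quick way to check for this is to look at the raw coordinate ranges before any pipeline step runs. This is a hedged sketch, not code from this thread: it assumes the annotation pickle stores a per-sample 'keypoint' array, as the standard 0.x skeleton annotations do.

import pickle
import numpy as np

# Point this at whichever annotation pickle your config uses.
with open('data/ntu-fall/ntu-fall_xsub_train.pkl', 'rb') as f:
    annos = pickle.load(f)

kp = np.asarray(annos[0]['keypoint'])  # assumed shape: (M, T, V, C)
print('min:', kp.min(), 'max:', kp.max())
# Pixel-scale values (e.g. 0..1920) mean the keypoints are not yet
# normalized. Values already in [0, 1] or [-1, 1] mean the extractor
# (MediaPipe here) normalized them, and a second normalization in the
# pipeline would squash them further.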

MABatin commented 1 year ago

> MediaPipe already normalizes the skeleton coordinates, and MMAction2 applies another normalization on top. […]

I see. I am using a YOLOv7 pose model to extract pose information, which doesn't normalize the keypoints, so double normalization may not be an issue in my case.