MABatin opened this issue 1 year ago
Hi @MABatin, I faced the same issue as you did. For me, there were two issues:
* a double normalization, fixed by removing the one in the stgcn pipeline
* a learning rate that was too high, which I set to 0.001
Here is how I changed the pipeline of stgcn
train_pipeline = [
# dict(type="PreNormalize2D"),
dict(type="GenSkeFeat", dataset="coco", feats=["j"]),
dict(type="UniformSampleFrames", clip_len=100),
dict(type="PoseDecode"),
dict(type="FormatGCNInput", num_person=2),
dict(type="PackActionInputs"),
]
val_pipeline = [
# dict(type="PreNormalize2D"),
dict(type="GenSkeFeat", dataset="coco", feats=["j"]),
dict(type="UniformSampleFrames", clip_len=100, num_clips=1, test_mode=True),
dict(type="PoseDecode"),
dict(type="FormatGCNInput", num_person=2),
dict(type="PackActionInputs"),
]
test_pipeline = [
# dict(type="PreNormalize2D"),
dict(type="GenSkeFeat", dataset="coco", feats=["j"]),
dict(type="UniformSampleFrames", clip_len=100, num_clips=10, test_mode=True),
dict(type="PoseDecode"),
dict(type="FormatGCNInput", num_person=2),
dict(type="PackActionInputs"),
]
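And here is how the lower learning rate can be set in the config (a sketch; the other optimizer values are the stock STGCN defaults in the 1.x configs, adjust to your setup):
optim_wrapper = dict(
    optimizer=dict(
        type="SGD",
        lr=0.001,  # lowered from the stock default of 0.1
        momentum=0.9,
        weight_decay=0.0005,
        nesterov=True,
    )
)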
hope it helps
Thank you very much for the suggestion. I too saw an improvement in actual training after lowering the learning rate. However, I did not make changes to the pipeline, so I don't know about that. I'm on the 0.x version, so can you tell me where in the pipeline the double-normalization issue might be happening? My pipeline looks like this:
train_pipeline = [
dict(type='PaddingWithLoop', clip_len=6),
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM'),
dict(type='PoseNormalize'),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
val_pipeline = [
dict(type='PaddingWithLoop', clip_len=6),
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM'),
dict(type='PoseNormalize'),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
test_pipeline = [
dict(type='PaddingWithLoop', clip_len=6),
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM'),
dict(type='PoseNormalize'),
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]
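If the same double normalization applies here, I suppose the analogous fix would be to drop the PoseNormalize step, mirroring the PreNormalize2D removal above (a sketch, not verified):
train_pipeline = [
dict(type='PaddingWithLoop', clip_len=6),
dict(type='PoseDecode'),
dict(type='FormatGCNInput', input_format='NCTVM'),
# dict(type='PoseNormalize'),  # removed to avoid normalizing twice
dict(type='Collect', keys=['keypoint', 'label'], meta_keys=[]),
dict(type='ToTensor', keys=['keypoint'])
]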
I chose to use the MediaPipe skeleton extractor to get skeletons from my video dataset, then converted them to the COCO dataset format. I decided on MediaPipe because it extracts skeletons faster and is easy to implement.
MediaPipe already normalizes the skeleton, and MMAction2 applies another normalization on top. It seems that's the issue, but I didn't dive deep into the code to find out why.
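To make that concrete: MediaPipe Pose returns landmark coordinates normalized to [0, 1] by the image width and height, so the keypoints arrive at MMAction2 already normalized. A minimal sketch of converting them back to COCO-order pixel coordinates (the landmark-index mapping below is my assumption and worth double-checking):
import numpy as np

# MediaPipe landmark indices mapped to the 17 COCO keypoints, in COCO order:
# nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles.
COCO_FROM_MEDIAPIPE = [0, 2, 5, 7, 8, 11, 12, 13, 14, 15, 16, 23, 24, 25, 26, 27, 28]

def mediapipe_to_coco_pixels(pose_landmarks, img_w, img_h):
    """Convert normalized MediaPipe landmarks to COCO-order pixel keypoints."""
    kpts = np.zeros((17, 2), dtype=np.float32)
    scores = np.zeros(17, dtype=np.float32)
    for i, lm_idx in enumerate(COCO_FROM_MEDIAPIPE):
        lm = pose_landmarks.landmark[lm_idx]
        kpts[i] = (lm.x * img_w, lm.y * img_h)  # undo MediaPipe's [0, 1] normalization
        scores[i] = lm.visibility
    return kpts, scores

# usage (sketch): results = mp.solutions.pose.Pose().process(frame_rgb)
#                 kpts, scores = mediapipe_to_coco_pixels(results.pose_landmarks, w, h)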
I see. I am using the YOLOv7 pose model to extract pose information, which doesn't normalize the keypoints, so maybe double normalization isn't an issue in my case.
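One quick way to check is to inspect the raw keypoint values in the converted annotation file: values in [0, 1] mean the keypoints are already normalized, while pixel-space keypoints (as YOLOv7-pose produces) span the frame size instead. A sketch, assuming the usual (num_person, T, V, 2) keypoint layout and a hypothetical file path:
import pickle
import numpy as np

with open('annotations.pkl', 'rb') as f:  # hypothetical path to your skeleton file
    anns = pickle.load(f)

kp = np.asarray(anns[0]['keypoint'])  # assumed layout: (num_person, T, V, 2)
print('min:', kp.min(), 'max:', kp.max())
# max close to 1.0    -> already normalized (double-normalization risk)
# max in the hundreds -> pixel coordinates (a normalization step is still needed)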
Branch
0.x branch (0.x version, such as v0.24.1)
Prerequisite
Environment
sys.platform: linux
Python: 3.8.10 (default, Mar 13 2023, 10:26:41) [GCC 9.4.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce GTX 1080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.109
GCC: x86_64-linux-gnu-gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.12.1+cu113
PyTorch compiling details: PyTorch built with:
TorchVision: 0.13.1+cu113
OpenCV: 4.5.4
MMEngine: 0.7.3
MMAction2: 1.0.0+
Describe the bug
When training an STGCN model on a custom dataset with 3 classes, I see that the loss isn't going down at all. It looks like the following:
As can be seen, the training loss just oscillates and val/top1_accuracy stays constant. This indicates the model isn't learning anything. Why is that?
Reproduces the problem - code sample
I am using the following config:
Reproduces the problem - command or script
No response
Reproduces the problem - error message
No response
Additional information