open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

Two labels for a single keypoint #2157

Open MotiBaadror opened 1 year ago

MotiBaadror commented 1 year ago

I have a custom dataset of a 2D-grid-like object, but in any single image only part of the object is visible. I trained HRNet (256x256), and I am getting two labels for a single keypoint. For example, if keypoints 0-11 lie on the top line, the model predicts them as (0,11), (1,10), (2,9) ... (8,3), (9,2), (11,0). The input images are such that if we split the skeleton image into two halves (left half and right half), the left half is not the mirror image of the right half. If I divide the skeleton into 11 equal-width parts, it would look like F E D C B A B C D E F.

In the dataset definition I use the 'upper' type for the two top grid lines and 'lower' for the two bottom ones, four horizontal grid lines in total. I defined the links correctly.
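For reference, the keypoint entries in my dataset info look roughly like the sketch below (mmpose 0.x `dataset_info` format; the names, colors, and weights here are placeholders, not my real config). `swap` is left empty for every keypoint because the left and right halves of the object are not mirror images:

```python
dataset_info = dict(
    dataset_name='custom_grid_field',  # placeholder name
    keypoint_info={
        # top two grid lines -> type='upper', bottom two -> type='lower'
        0: dict(name='top_0', id=0, color=[255, 0, 0], type='upper', swap=''),
        1: dict(name='top_1', id=1, color=[255, 0, 0], type='upper', swap=''),
        # ... one entry per keypoint, all with swap='' (no mirror pairs)
    },
    skeleton_info={
        # links connect consecutive keypoints along each horizontal grid line
        0: dict(link=('top_0', 'top_1'), id=0, color=[0, 255, 0]),
    },
    joint_weights=[1.0, 1.0],   # one weight per keypoint
    sigmas=[0.025, 0.025])      # one sigma per keypoint
```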

Tau-J commented 1 year ago

Sorry, I'm not sure whether I understand your descriptions precisely. What is your question?

MotiBaadror commented 1 year ago

See the problem below, after training the model. The keypoint definition is shown in the reference image below. I did not define any swap points because in the real field image the left part and the right part are not mirror images; it's more like (EDCBABCDE). The reference image might not show this because I am unable to write the numbers upside down. reference field

I am getting multiple keypoint labels for a single position. I tried taking the most probable label for each point, but the results are still not correct: the most probable label picks some points from the left side of the field and some from the right side. Here, left means left of center.

For example, the inference results label the top-left point as (3, 19), which sums to 22, i.e. the sum of the lowest and highest indices on the top horizontal line (indexed 0-22).

The point just below the left 10-yard line gets labelled as (26, 42), which sums to 68, where 68 is the sum of the lowest and highest indices on that line (23 + 45).

From the dataset I can see that there are almost no images where the complete field is visible; each image shows only part of the field. But for the person body-keypoint task, the whole body is visible in the training images many times, which helps the model learn the relative positions of the keypoints from a single example more easily. What are your thoughts on this?

The train pipeline I am using, taken from hrnet_w32_animalpose_256x256.py:
```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownGetBboxCenterScale', padding=1.25),
    dict(type='TopDownRandomShiftBboxCenter', shift_factor=0.16, prob=0.3),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(
        type='TopDownHalfBodyTransform',
        num_joints_half_body=8,
        prob_half_body=0.3),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]
```
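Note that this pipeline keeps `TopDownRandomFlip` with `flip_prob=0.5`. Below is a rough sketch of what a horizontal flip does to the labels (a simplified illustration, not the actual mmpose implementation): x-coordinates are mirrored, and keypoint indices are exchanged only for pairs listed in `flip_pairs`. With no flip pairs defined, the flipped image still carries the original indices, so the model is taught that keypoint 0 can also appear where keypoint 11 normally is, which matches the (0,11), (1,10), ... confusion described above:

```python
import numpy as np

def flip_joints(joints, img_width, flip_pairs):
    """Mirror keypoint x-coordinates and swap paired indices.

    Simplified illustration of horizontal-flip augmentation.
    """
    flipped = joints.copy()
    flipped[:, 0] = img_width - 1 - flipped[:, 0]    # mirror x
    for left, right in flip_pairs:                   # swap paired labels
        flipped[[left, right]] = flipped[[right, left]]
    return flipped

# 12 keypoints spaced along the top grid line of a 256-px-wide crop
joints = np.stack([np.arange(12) * 20.0, np.full(12, 50.0)], axis=1)

# With flip_pairs=[] (no mirror pairs, as in this dataset), index 0 ends up
# on the far right, where index 11 usually sits -> two labels per position.
print(flip_joints(joints, img_width=256, flip_pairs=[]))
```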

[image attachment: field_with_points]

MotiBaadror commented 1 year ago

I think it was the flip augmentation that was causing the problem. Now I am getting good accuracy (0.96) on the training set, but if I run inference on the training set, the results do not match this accuracy.
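A minimal sketch of the config changes that avoid flipping altogether (following the mmpose 0.x conventions used above; exact defaults in your own config may differ). The mismatch between training accuracy and inference results may also come from test-time flip averaging: the default top-down `test_cfg` averages predictions from the original and the horizontally flipped image, which is only valid when proper flip pairs exist, so disabling `flip_test` is worth checking too:

```python
# Train-time: effectively disable the random horizontal flip
# (or remove the TopDownRandomFlip step from the pipeline entirely).
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownGetBboxCenterScale', padding=1.25),
    dict(type='TopDownRandomFlip', flip_prob=0.0),  # no flipping
    # ... rest of the pipeline unchanged ...
    dict(type='TopDownAffine'),
]

# Test-time: turn off flip-test averaging in the model's test_cfg.
model = dict(
    test_cfg=dict(
        flip_test=False,          # do not average with the flipped image
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))
```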