open-mmlab / mmpose

OpenMMLab Pose Estimation Toolbox and Benchmark.
https://mmpose.readthedocs.io/en/latest/
Apache License 2.0

Can noise be added to Dataset? #1357

Closed YuktiADY closed 2 years ago

YuktiADY commented 2 years ago

Hello Team,

I was training the HRNet model and trying to improve its accuracy, since I have trained the model many times and it may be overfitting.

I would like to know whether it is possible to augment the data with random noise in MMPose.

Where should I look in the MMPose code, and how can we do this?

Please suggest!

liqikai9 commented 2 years ago

Hi, you can add a custom data pipeline to handle your data preprocessing. For more details, please refer to this tutorial: https://github.com/open-mmlab/mmpose/blob/master/docs/en/tutorials/3_data_pipeline.md#extend-and-use-custom-pipelines
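As a concrete illustration of that tutorial, a custom noise transform could look roughly like this. This is a minimal sketch against the 0.x master-branch layout; the class name AddGaussNoise and its parameters are hypothetical, not an existing MMPose transform:

import numpy as np

from mmpose.datasets.builder import PIPELINES


@PIPELINES.register_module()
class AddGaussNoise:
    """Add pixel-level Gaussian noise to the image with a given probability."""

    def __init__(self, sigma=10.0, prob=0.5):
        self.sigma = sigma
        self.prob = prob

    def __call__(self, results):
        # at this point in the pipeline 'img' is still a numpy array
        # (the step must run before ToTensor)
        if np.random.rand() < self.prob:
            img = results['img'].astype(np.float32)
            img += np.random.normal(0.0, self.sigma, img.shape)
            results['img'] = np.clip(img, 0, 255).astype(np.uint8)
        return results

Once the module is imported (e.g., via mmpose/datasets/pipelines/__init__.py, as the tutorial describes), it is enabled like any other step: dict(type='AddGaussNoise', sigma=10.0, prob=0.5) in train_pipeline.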

YuktiADY commented 2 years ago

I mean: where can I look in the MMPose code to see whether it is possible to add noise to the dataset?

YuktiADY commented 2 years ago

Hi, you can add a custom data pipeline to handle your data preprocessing. For more details, please refer to this tutorial: https://github.com/open-mmlab/mmpose/blob/master/docs/en/tutorials/3_data_pipeline.md#extend-and-use-custom-pipelines

Is the noise added via this entry from the link?

    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),

Sorry for asking such basic questions; I am new to this topic and need some help.

YuktiADY commented 2 years ago

In the config I am training with, this snippet is already added:

    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),

liqikai9 commented 2 years ago

Is there a possibility to add noise to the dataset?

What does the noise mean here? If you mean randomness in the data preprocessing, you can find some pipelines here: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py. Among them, TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, and TopDownGetRandomScaleRotation each perform a different data augmentation randomly, with a configurable probability.
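For instance, enabling two of these in train_pipeline takes one dict entry each (the values below are the ones used in the standard COCO configs):

    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),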

liqikai9 commented 2 years ago

In the config I am training with, this snippet is already added:

    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),

These three pipelines, TopDownAffine, ToTensor, and NormalizeTensor, do not introduce any randomness (or the noise you mean) while preparing the data.

YuktiADY commented 2 years ago

What does the noise mean here? If you mean randomness in the data preprocessing, you can find some pipelines here: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py. Among them, TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, and TopDownGetRandomScaleRotation each perform a different data augmentation randomly, with a configurable probability.

To me, augmenting the data with noise means we want to avoid overfitting and improve the performance of the model.

So, in your view, what does augmenting data with noise mean? Do I need to add these four classes?

YuktiADY commented 2 years ago


Also, by looking into the code of MMPose, can we augment the data with noise?

liqikai9 commented 2 years ago

TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation each perform a different data augmentation randomly, with a configurable probability.

That depends on your needs. I think you can try these pipelines. BTW, which dataset are you using?

YuktiADY commented 2 years ago

I have concatenated the COCO and THEODORE datasets.

YuktiADY commented 2 years ago


So, is it possible to augment the data with random noise?

liqikai9 commented 2 years ago

Yes, you can try adding one or several of these pipelines to your config: TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation. They can perform data augmentation with random noise.

jin-s13 commented 2 years ago

@YuktiADY You can use albumentations in MMPose; it supports various kinds of augmentation approaches. https://mmpose.readthedocs.io/en/latest/papers/techniques.html#albumentations-information-2020

jin-s13 commented 2 years ago

For more information about albumentations, please check https://albumentations.ai/

YuktiADY commented 2 years ago

@YuktiADY You can use albumentations in MMPose; it supports various kinds of augmentation approaches. https://mmpose.readthedocs.io/en/latest/papers/techniques.html#albumentations-information-2020

I will check.

YuktiADY commented 2 years ago

Yes, you can try adding one or several of these pipelines to your config: TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation. They can perform data augmentation with random noise.

Do I simply have to add these classes to the config, or will other changes be required?

YuktiADY commented 2 years ago

@YuktiADY You can use albumentations in MMPose; it supports various kinds of augmentation approaches. https://mmpose.readthedocs.io/en/latest/papers/techniques.html#albumentations-information-2020

Will the above approach of adding those pipelines also work?

jin-s13 commented 2 years ago

Yes, you can try adding one or several of these pipelines to your config: TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation. They can perform data augmentation with random noise.

Do I simply have to add these classes to the config, or will other changes be required?

These are also augmentation approaches: shifting the center, flipping, cropping the box, scaling, and rotation. But I think what you want is to add pixel-level noise or RGB jittering, right? If so, albumentations will meet your requirements.
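In albumentations terms, pixel-level noise and RGB jittering inside MMPose's Albumentation wrapper could look like the snippet below (GaussNoise and RGBShift are albumentations transforms; the parameter values here are only illustrative):

    dict(
        type='Albumentation',
        transforms=[
            # pixel-level noise
            dict(type='GaussNoise', var_limit=(10.0, 50.0), p=0.5),
            # rgb jittering
            dict(
                type='RGBShift',
                r_shift_limit=20,
                g_shift_limit=20,
                b_shift_limit=20,
                p=0.5),
        ]),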

YuktiADY commented 2 years ago

Yes, you can try adding one or several of these pipelines to your config: TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation. They can perform data augmentation with random noise.

Do I simply have to add these classes to the config, or will other changes be required?

These are also augmentation approaches: shifting the center, flipping, cropping the box, scaling, and rotation. But I think what you want is to add pixel-level noise or RGB jittering, right? If so, albumentations will meet your requirements.

I just want to first check, based on the MMPose code, whether it is possible to add noise. If yes, is it possible to augment the data with random noise, and how can we do that?

YuktiADY commented 2 years ago

These are different augmentation approaches, like shifting the center, flipping, and cropping the box. If I want to augment the data with random noise, how can I do that? Will those above pipelines work? I mean, those pipelines contain methods like flipping, etc.

liqikai9 commented 2 years ago

For flipping, you may try TopDownRandomFlip: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py#L93

YuktiADY commented 2 years ago

For flipping, you may try TopDownRandomFlip: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py#L93

Okay, I see this one is for flipping. What about adding noise? Which one is for augmenting the data with random noise? Random noise is also another augmentation method, or am I wrong?

liqikai9 commented 2 years ago

Flipping can be viewed as a method of augmenting data with random noise as it can randomly flip the image.

If you want to add pixel-level noise to the data, you can use albumentations in MMPose. An example of its use can be found here: https://github.com/open-mmlab/mmpose/blob/master/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192_coarsedropout.py#L108

YuktiADY commented 2 years ago

Flipping can be viewed as a method of augmenting data with random noise as it can randomly flip the image.

If you want to add pixel-level noise to the data, you can use albumentations in MMPose. An example of its use can be found here: https://github.com/open-mmlab/mmpose/blob/master/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192_coarsedropout.py#L108

Okay, thank you. Understood.

YuktiADY commented 2 years ago

For flipping, you may try TopDownRandomFlip: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py#L93

So, after just adding this class to the config, are any other changes also required?

YuktiADY commented 2 years ago

I just saw that dict(type='TopDownRandomFlip', flip_prob=0.5) is already there in the config.

liqikai9 commented 2 years ago

You can try to use this in your config and see if it has better results.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownGetBboxCenterScale', padding=1.25),
    dict(type='TopDownRandomShiftBboxCenter', shift_factor=0.16, prob=0.3),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(
        type='TopDownHalfBodyTransform',
        num_joints_half_body=8,
        prob_half_body=0.3),
    dict(
        type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
###########################
# add the Albumentation here
    dict(
        type='Albumentation',
        transforms=[
            dict(
                type='CoarseDropout',
                max_holes=8,
                max_height=40,
                max_width=40,
                min_holes=1,
                min_height=10,
                min_width=10,
                p=0.5),
        ]),
###########################
    dict(type='ToTensor'),
    dict(
        type='NormalizeTensor',
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=2),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]

liqikai9 commented 2 years ago

I suggest you read more about the implementation of Albumentation in MMPose: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/shared_transform.py#L190, and then change the parameters according to your needs. Hope this helps!
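For example, a tweaked version might shrink the dropout holes and add pixel-level noise; the values below are illustrative (GaussNoise is the albumentations transform that also appears later in this thread):

    dict(
        type='Albumentation',
        transforms=[
            dict(
                type='CoarseDropout',
                max_holes=4,
                max_height=20,
                max_width=20,
                p=0.3),
            dict(type='GaussNoise', var_limit=(10.0, 50.0), p=0.5),
        ]),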

YuktiADY commented 2 years ago

Yes, I understood this.

liqikai9 commented 2 years ago

So, after just adding this class to the config, are any other changes also required?

No other change is required.

liqikai9 commented 2 years ago

No, this will not work. To use this class in the config, this line alone is sufficient:

dict(type='TopDownRandomFlip', flip_prob=0.5)

We don't put the definition of a class in the config file; the config only specifies which class to use. Please refer to our tutorial to learn more about configs: https://github.com/open-mmlab/mmpose/blob/master/docs/en/tutorials/0_config.md.

YuktiADY commented 2 years ago

I have applied Albumentations, but my training results are still not better, and I do not understand the reason.

The training results are not improving much. After training, AP = 0.7746; before training, AP = 0.790.

    dict(
        type='Albumentation',
        transforms=[
            dict(type='GaussNoise', var_limit=(10.0, 50.0)),
        ]),

YuktiADY commented 2 years ago

I have applied Albumentations, but my training results are still not better, and I do not understand the reason.

The training results are not improving much. After training hrnet_w32_256x192, the results are AP = 0.7746; before training, AP = 0.790. It does not get above 0.790, and I cannot find a definite reason. (That means, based on the results, the training is not useful, because before training the AP on the test dataset = 0.790 and after training AP = 0.7746.)

    dict(
        type='Albumentation',
        transforms=[
            dict(type='GaussNoise', var_limit=(10.0, 50.0)),
        ]),

Also, is it possible that training results are not good when we use a pretrained model, but accuracy is better without the pretrained model? In the above case, I trained without a pretrained model.

Please provide your valuable comments: how can I increase the accuracy?

liqikai9 commented 2 years ago

1. The reason why adding Albumentations did not improve the results: generally, data preprocessing should have no essential impact on training accuracy. So maybe you can try to:

• change the model, e.g., to a larger model like hrnet_w48
• increase the input image size to 384x288
• tune the learning strategy, for example, use a smaller learning rate, and tune the batch size. Just saying~

2. Did you mean whether it will get better results if we train the model with a pretrained model? As you are using HRNet, there may be two pretrained models.

• The pretrained model (a): the weights of the backbone HRNet are pretrained on a large-scale dataset, e.g., ImageNet, with a classification task, so that it can extract more general features.
• The pretrained model (b): using pretrained model (a), it further trains on a large-scale dataset like COCO, with a pose estimation task. You can download such pretrained models by clicking the ckpt link, as in the table below.

Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log

Since you are training on your own dataset, if the datasets are alike, you may load the weights of the above pretrained model (b) and fine-tune on your own dataset. If you are already doing this, please ignore the second comment~

YuktiADY commented 2 years ago

1. The reason why adding Albumentations did not improve the results: generally, data preprocessing should have no essential impact on training accuracy. So maybe you can try to:

• change the model, e.g., to a larger model like hrnet_w48
• increase the input image size to 384x288
• tune the learning strategy, for example, use a smaller learning rate, and tune the batch size. Just saying~

2. Did you mean whether it will get better results if we train the model with a pretrained model? As you are using HRNet, there may be two pretrained models.

• The pretrained model (a): the weights of the backbone HRNet are pretrained on a large-scale dataset, e.g., ImageNet, with a classification task, so that it can extract more general features.
• The pretrained model (b): using pretrained model (a), it further trains on a large-scale dataset like COCO, with a pose estimation task.

Since you are training on your own dataset, if the datasets are alike, you may load the weights of the above pretrained model (b) and fine-tune on your own dataset. If you are already doing this, please ignore the second comment~

1. Yes, I am training a higher-resolution model, hrnet_w48_384x288 (without using a pretrained model), with the specifications below. So does adding albumentations not have any impact on improving accuracy?

optimizer = dict(type='Adam', lr=5e-4)
optimizer_config = dict(grad_clip=None)

# learning policy
lr_config = dict(policy='poly', power=0.9, min_lr=5e-5, by_epoch=True)
total_epochs = 30

2. I was referring to your point 2(b), using:

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'

What you mentioned in point 2(b) I have already been doing, and my question below is about exactly that. When I trained the HRNet model using the pretrained model and evaluated on the test dataset, AP = 0.7162, whereas when trained without the pretrained model, the AP was better: 0.7241. So I decided to train without the pretrained model. But why does the model using the pretrained model have a lower AP than the one without? Can this be possible?

FYR: I have concatenated COCO and my dataset (Theodore+) and am training on both (because when I trained the model only on my dataset, the AP was decreasing rather than increasing; do you have any idea why that happens?).

Also, I would like to ask about the HRNet model we are using: what is the layer structure (how many layers) or the architecture of the model (input, output)? I tried to find it in the repo but could not see it anywhere.

Can we, while training, validate the model directly on the test dataset (because I am doing that)? Please suggest. Awaiting your response.

YuktiADY commented 2 years ago

Awaiting your response. Please suggest!

liqikai9 commented 2 years ago

1. So does adding albumentations not have any impact on improving accuracy?

I am not saying that adding albumentations will have no impact. Actually, I am not very experienced with albumentations either. @jin-s13 Could you help with this issue?

2. But why does the model using the pretrained model have a lower AP than the one without? Can this be possible?

This may be possible. I think it may depend on the similarity between your own dataset and the COCO dataset.

3. What is the layer structure (how many layers) or architecture of the model (input, output)? I tried to find it in the repo but could not see it anywhere.

For the details of the HRNet architecture, you can refer to this for related information about the paper, or refer to the official HRNet paper.

4. Can we, while training, validate the model directly on the test dataset (because I am doing that)?

Yes, this is available. You can set it in the config file like this; example usage is as follows:

evaluation = dict(  # Config of evaluation during training
    interval=10,  # Interval to perform evaluation, e.g., epochs
    metric='mAP',  # Metrics to be performed
    save_best='AP')  # set `AP` as key indicator to save best checkpoint

YuktiADY commented 2 years ago

Thank you for providing the research paper for HRNet. Could you also provide the link to the paper for ResNet and its architecture? Are HRNet and ResNet two different backbone networks? Can we call them the model or the algorithm used for training?

liqikai9 commented 2 years ago

For ResNet, you can refer to this, or to SimpleBaseline2D, which uses ResNet as the backbone and adds deconvolutional layers to increase the feature resolution.

Are HRNet and ResNet two different backbone networks?

Yes. HRNet and ResNet are both backbones, used to extract image features.

Can we call them the model or the algorithm used for training?

Usually, a model in MMPose contains a backbone and a head on top of the backbone, serving different purposes.

Alternatively, algorithm is a more general concept that may refer to the backbone or to the whole model, usually depending on where the novelty lies.
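To make the backbone/head split concrete, a SimpleBaseline2D-style model config looks roughly like this (a sketch based on the common res50 settings; check the official configs for the exact values):

# the ResNet backbone extracts a low-resolution feature map; the head adds
# deconv layers to upsample it back to heatmap resolution (the SimpleBaseline2D idea)
model = dict(
    type='TopDown',
    backbone=dict(type='ResNet', depth=50),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=2048,
        out_channels=17,  # 17 keypoints for COCO
        num_deconv_layers=3,
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True))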

YuktiADY commented 2 years ago

Is SimpleBaseline2D an algorithm? Because if I say I am using the SimpleBaseline2D algorithm, I am indirectly saying that I am using ResNet. The only difference between ResNet and HRNet is that HRNet uses a different feature extractor. Is that the main difference?

liqikai9 commented 2 years ago

Is SimpleBaseline2D an algorithm?

Yes, you can say that.

Is that the main difference?

Yes. HRNet and ResNet are two different feature extractors.

YuktiADY commented 2 years ago

How can I test a pretrained model that was trained on COCO and evaluate it on my test dataset?

YuktiADY commented 2 years ago

I made these changes in the config:

_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/coco_wholebody.py']
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'

test_pipeline = val_pipeline

data_root = 'data/coco'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TopDownCocoWholeBodyDataset',
        # ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        # img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TopDownCocoWholeBodyDataset',
        # ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        # img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}),
)

The script I used for testing is:

./mmpose/tools/dist_test.sh ./FES_Results_COCO/hrnet_w348_coco_wholebody_388x288.py "/home/yukti/Downloads/hrnet_w48_coco_384x288-314c8528_20200708.pth" 1 --eval mAP

But I am getting a size mismatch error:

load checkpoint from local path: /home/yukti/Downloads/hrnet_w48_coco_384x288-314c8528_20200708.pth
The model and loaded state dict do not match exactly

size mismatch for keypoint_head.final_layer.weight: copying a param with shape torch.Size([17, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([133, 48, 1, 1]).
size mismatch for keypoint_head.final_layer.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([133]).

Is what I am doing above correct?

Awaiting your response.

liqikai9 commented 2 years ago

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'

This checkpoint seems to be a model trained on the COCO dataset, not on the COCO-WholeBody dataset. Please choose another appropriate checkpoint file. You can find one here.

YuktiADY commented 2 years ago

Please find my changes in the config.

I used the COCO-WholeBody checkpoint only, but I still get this:

_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/theodore.py']
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet_w48_coco_wholebody_384x288-6e061c6a_20200922.pth'
resume_from = None

data_root = 'TheodorePlusV2Dataset'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TheodorePlusV2Dataset',
        # ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        # ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/old_annotations/person_keypoints_scenario2.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        # img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TheodorePlusV2Dataset',
        # ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        # img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}),
)

I get a size mismatch error and AP = 0.0:

The model and loaded state dict do not match exactly

size mismatch for keypoint_head.final_layer.weight: copying a param with shape torch.Size([133, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([17, 48, 1, 1]).
size mismatch for keypoint_head.final_layer.bias: copying a param with shape torch.Size([133]) from checkpoint, the shape in current model is torch.Size([17]).

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 736/736, 19.5 task/s, elapsed: 38s, ETA: 0s
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
DONE (t=0.12s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.001
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.025
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
AP: 2.3954008304056212e-05
AP (L): 0.0
AP (M): 0.0002155860747365059
AP .5: 7.984669434685404e-05
AP .75: 0.0
AR: 0.0004081632653061224
AR (L): 0.0
AR (M): 0.025
AR .5: 0.0013605442176870747
AR .75: 0.0

When I gave the checkpoint for COCO, it gave results with AP.

I also tried testing the resnet50 model with the checkpoint for the COCO dataset only; it ran and gave results, but when I run it with the COCO dataset it gives a size mismatch error. I even checked the number of keypoints.

Is there a problem if we give a checkpoint for COCO while the type in the config is TopDownCocoWholeBodyDataset? Will there be a difference in the AP results?

liqikai9 commented 2 years ago

Could you please provide the config for your model? It seems the model you are using outputs 17 channels, while the checkpoint you are using outputs 133 channels.

size mismatch for keypoint_head.final_layer.weight: copying a param with shape torch.Size([133, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([17, 48, 1, 1]).

Please use a checkpoint that matches your expected output. If you would like to test on a dataset like COCO-WholeBody (which has 133 keypoints and thus needs 133 output channels), please use a checkpoint like this:

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet_w48_coco_wholebody_384x288-6e061c6a_20200922.pth'

If you would like to test on a dataset like COCO (which has 17 keypoints and thus needs 17 output channels), please use a checkpoint like this:

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'

YuktiADY commented 2 years ago

Please find the config below.

_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/theodore.py']
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet_w48_coco_wholebody_384x288-6e061c6a_20200922.pth'
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP', save_best='AP')

optimizer = dict(type='Adam', lr=5e-4)
optimizer_config = dict(grad_clip=None)

# learning policy
lr_config = dict(
    policy='step',
    warmup=None,
    # warmup='linear',
    # warmup_iters=500,
    # warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])

channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[list(range(17))],
    inference_channel=list(range(17)))

# model settings
model = dict(
    type='TopDown',
    pretrained='https://download.openmmlab.com/mmpose/'
    'pretrain_models/hrnet_w48-8ef0771d.pth',
    backbone=dict(
        type='HRNet',
        in_channels=3,
        extra=dict(
            stage1=dict(num_modules=1, num_branches=1, block='BOTTLENECK',
                        num_blocks=(4, ), num_channels=(64, )),
            stage2=dict(num_modules=1, num_branches=2, block='BASIC',
                        num_blocks=(4, 4), num_channels=(48, 96)),
            stage3=dict(num_modules=4, num_branches=3, block='BASIC',
                        num_blocks=(4, 4, 4), num_channels=(48, 96, 192)),
            stage4=dict(num_modules=3, num_branches=4, block='BASIC',
                        num_blocks=(4, 4, 4, 4),
                        num_channels=(48, 96, 192, 384)))),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=48,
        out_channels=channel_cfg['num_output_channels'],
        num_deconv_layers=0,
        extra=dict(final_conv_kernel=1),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(
        flip_test=True,
        post_process='default',
        shift_heatmap=True,
        modulate_kernel=11))

data_cfg = dict(
    image_size=[288, 384],
    heatmap_size=[72, 96],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    use_gt_bbox=False,
    det_bbox_thr=0.0,
    # bbox_file='data/coco/person_detection_results/'
    # 'COCO_val2017_detections_AP_H_56_person.json',
    bbox_file='/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_bboxes_scenario1.json',
    # bbox_file='/mnt/dst_datasets/own_omni_dataset/FES_keypoints/old_annotations/person_bboxes_scenario2.json',
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(type='TopDownHalfBodyTransform', num_joints_half_body=8,
         prob_half_body=0.3),
    dict(type='TopDownGetRandomScaleRotation', rot_factor=40,
         scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406],
         std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=3),
    dict(type='Collect', keys=['img', 'target', 'target_weight'],
         meta_keys=[
             'image_file', 'joints_3d', 'joints_3d_visible', 'center',
             'scale', 'rotation', 'bbox_score', 'flip_pairs'
         ]),
]

val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406],
         std=[0.229, 0.224, 0.225]),
    dict(type='Collect', keys=['img'],
         meta_keys=[
             'image_file', 'center', 'scale', 'rotation', 'bbox_score',
             'flip_pairs'
         ]),
]

test_pipeline = val_pipeline

data_root = 'TheodorePlusV2Dataset'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TheodorePlusV2Dataset',
        # ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        # ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/old_annotations/person_keypoints_scenario2.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        # img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TheodorePlusV2Dataset',
        # ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        # img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}),
)

Also, I would like to ask: if lr = 5e-4 and I reduce it by a factor of 10, then lr = 5e-5, right? Which is the better lr of the two?

liqikai9 commented 2 years ago

Please change this parameter according to your dataset:

channel_cfg = dict(
    num_output_channels=17,
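For example, if the test dataset follows COCO-WholeBody's 133 keypoints, the whole block would need to become something like this (a sketch; adjust the numbers to your dataset):

channel_cfg = dict(
    num_output_channels=133,
    dataset_joints=133,
    dataset_channel=[list(range(133))],
    inference_channel=list(range(133)))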

Also, if lr = 5e-4 and I reduce it by a factor of 10, then lr = 5e-5? Which is the better of these two lrs?

I am afraid that these hyperparameters may need to be tuned on your own dataset.
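As for the arithmetic: yes, reducing lr = 5e-4 by a factor of 10 gives lr = 5e-5, i.e.:

optimizer = dict(type='Adam', lr=5e-5)  # 5e-4 reduced by a factor of 10

Which of the two works better can only be determined empirically on your data.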