Hi, you can add a custom data pipeline that handles your data preprocessing. For more detail, please refer to this tutorial: https://github.com/open-mmlab/mmpose/blob/master/docs/en/tutorials/3_data_pipeline.md#extend-and-use-custom-pipelines
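As a minimal sketch following the pattern in that tutorial (the class name AddGaussianNoise, its parameters, and the noise model are illustrative, not an existing MMPose transform):

import numpy as np
from mmpose.datasets import PIPELINES


@PIPELINES.register_module()
class AddGaussianNoise:
    """Illustrative custom pipeline step: add Gaussian pixel noise."""

    def __init__(self, mean=0.0, std=5.0, prob=0.5):
        self.mean = mean
        self.std = std
        self.prob = prob

    def __call__(self, results):
        # 'img' holds the image array produced by LoadImageFromFile
        if np.random.rand() < self.prob:
            img = results['img'].astype(np.float32)
            noise = np.random.normal(self.mean, self.std, img.shape)
            results['img'] = np.clip(img + noise, 0, 255).astype(np.uint8)
        return results

It would then be enabled in train_pipeline with dict(type='AddGaussianNoise', prob=0.5).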
I mean where we can look into the code of mmpose if there is possibility to add noise to dataset ?
In this link, is this the noise that is added?

dict(
    type='NormalizeTensor',
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]),
Sorry to ask such questions; I am new to this topic and need some help.
In the config which I am training, this snippet is already added:

dict(type='TopDownAffine'),
dict(type='ToTensor'),
dict(
    type='NormalizeTensor',
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]),
What does the noise mean here? If you mean the randomness in data preprocessing, you can find some pipelines here: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py, in which TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform and TopDownGetRandomScaleRotation can each perform a different data augmentation randomly, with a configurable probability.
As for the snippet already in your config: the three pipelines TopDownAffine, ToTensor and NormalizeTensor will not have any randomness (or the noise you meant) while preparing the data.
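(For reference, NormalizeTensor applies the deterministic per-channel standardization img = (img - mean) / std, so by itself it adds no noise at all.)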
According to me, augmenting data with noise means we want to avoid overfitting and improve the performance of our model. So, according to you, what does augmenting data with noise mean? Do I need to add these 4 classes?
Also, by looking into the code of MMPose, can we augment data with noise?
That depends on your need. I think you can try these pipelines (TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation). BTW, which dataset are you using?
I have concatenated the COCO and THEODORE datasets.
So, is it possible to augment data with random noise?
Yes, you can try to add one or several of these pipelines in your config: TopDownRandomShiftBboxCenter, TopDownRandomFlip, TopDownHalfBodyTransform, TopDownGetRandomScaleRotation. They can perform data augmentation with random noise.
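For example, they can be enabled in train_pipeline like this (the probability and factor values below are the ones used in the example later in this thread; tune them as needed):

dict(type='TopDownRandomShiftBboxCenter', shift_factor=0.16, prob=0.3),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
    type='TopDownHalfBodyTransform',
    num_joints_half_body=8,
    prob_half_body=0.3),
dict(
    type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),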
@YuktiADY You can use albumentations in MMPose; it supports various kinds of augmentation approaches. https://mmpose.readthedocs.io/en/latest/papers/techniques.html#albumentations-information-2020
For more information about albumentations, please check https://albumentations.ai/
I will check.
Do I simply have to add these classes to the config, with no other changes required?
The above approach of adding those pipelines will also work, right?
These are also augmentation approaches: shifting the center, flipping, cropping the box, scaling, rotation. But I think what you want is to add pixel-level noise or RGB jittering, right? If so, albumentations will meet your requirements.
I just want to first check, based on the code in MMPose, whether it is possible to add noise. If yes, is there a possibility to augment the data with random noise, and how can we do that?
These are different approaches for augmentation, like shifting the center, flipping, cropping the box. If I want to augment the data with random noise, how can I do that? Will those above pipelines work? I mean, those pipelines contain methods like flipping, etc.
For flipping, you may try TopDownRandomFlip: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/top_down_transform.py#L93
Okay, I mean this one is for flipping. What about adding noise: which one is for augmenting data with random noise? Because random noise is also another augmentation method, if I am not wrong?
Flipping can be viewed as a method of augmenting data with random noise, as it randomly flips the image. If you want to add pixel-level noise to the data, you can use albumentations in MMPose. An example of its use can be found here: https://github.com/open-mmlab/mmpose/blob/master/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/hrnet_w32_coco_256x192_coarsedropout.py#L108
Okay, thank you. Understood.
So, just adding this class in the config, are any other changes also required? I just saw that dict(type='TopDownRandomFlip', flip_prob=0.5) is already there in the config.
You can try to use this in your config and see if it has better results.
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='TopDownGetBboxCenterScale', padding=1.25),
dict(type='TopDownRandomShiftBboxCenter', shift_factor=0.16, prob=0.3),
dict(type='TopDownRandomFlip', flip_prob=0.5),
dict(
type='TopDownHalfBodyTransform',
num_joints_half_body=8,
prob_half_body=0.3),
dict(
type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
dict(type='TopDownAffine'),
###########################
# add the Albumentation here
dict(
type='Albumentation',
transforms=[
dict(
type='CoarseDropout',
max_holes=8,
max_height=40,
max_width=40,
min_holes=1,
min_height=10,
min_width=10,
p=0.5),
]),
###########################
dict(type='ToTensor'),
dict(
type='NormalizeTensor',
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
dict(type='TopDownGenerateTarget', sigma=2),
dict(
type='Collect',
keys=['img', 'target', 'target_weight'],
meta_keys=[
'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
'rotation', 'bbox_score', 'flip_pairs'
]),
]
I suggest you read more detail about the implementation of Albumentation in MMPose: https://github.com/open-mmlab/mmpose/blob/master/mmpose/datasets/pipelines/shared_transform.py#L190 and then change the parameters according to your need. Hope this helps!
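For instance, to inject additive pixel noise instead of the CoarseDropout above, Albumentations' GaussNoise transform could be swapped in (parameter values illustrative):

dict(
    type='Albumentation',
    transforms=[
        dict(type='GaussNoise', var_limit=(10.0, 50.0), p=0.5),
    ]),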
Yes, I understood this. So, just adding this class in the config, are any other changes also required?
No other change is required.
No, this will not work. To use this class in the config, this line is already sufficient:

dict(type='TopDownRandomFlip', flip_prob=0.5)

We won't add the definition of a class in the config file; the config only points out which class you use. Please refer to our tutorial to learn more about the config: https://github.com/open-mmlab/mmpose/blob/master/docs/en/tutorials/0_config.md.
I have applied Albumentations, but my training results are still not better, and I do not understand the reason. The results are not improving much: after training hrnet_w32_256x192, the AP = 0.7746, whereas before training the AP = 0.790. It does not get above 0.790, and I am not able to find a definite reason (that means, based on the results, the training is not useful, because before training the AP on the test dataset was 0.790 and after training it is 0.7746).

dict(
    type='Albumentation',
    transforms=[
        dict(type='GaussNoise', var_limit=(10.0, 50.0)),
    ]),

Please provide your valuable comments: how can I increase the accuracy?
- The reason why adding Albumentations did not improve the results: generally, data preprocessing should have no essential impact on training accuracy. So maybe you can try to:
  - change the model, e.g., to a larger one like hrnet_w48;
  - increase the input image size to 384x288;
  - tune the learning strategy, for example, use a smaller learning rate, and tune the batch size. Just saying~
- Did you mean whether it will get better results if we train the model with a pretrained model? As you are using HRNet, there may be two pretrained models.
  - Pretrained model a: the weights of the backbone HRNet are pretrained on a large-scale dataset, e.g., ImageNet, with a classification task, so that it can extract more general features.
  - Pretrained model b: starting from pretrained model a, it is further trained on a large-scale dataset like COCO, with a pose estimation task. You can download such pretrained models by clicking the ckpt button, as in the table below.

Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log
---|---|---|---|---|---|---|---|---
pose_hrnet_w32 | 256x192 | 0.746 | 0.904 | 0.819 | 0.799 | 0.942 | ckpt | log

Since you are training your own dataset, if the datasets are alike, you may load the weights of the above pretrained model b and fine-tune on your own dataset (a minimal sketch follows below). If you are already doing this, please neglect the second comment~
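A minimal sketch of that fine-tuning setup, assuming the COCO-pretrained w48 checkpoint that appears later in this thread; the reduced learning rate is illustrative:

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'
optimizer = dict(type='Adam', lr=5e-5)  # e.g., 10x smaller than the default 5e-4 when fine-tuning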
optimizer = dict(
    type='Adam',
    lr=5e-4,
)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='poly',
    power=0.9,
    min_lr=5e-5,
    by_epoch=True)
total_epochs = 30
What you mentioned in point 2(b) is already being done, and the question below is about exactly that. When I trained the HRNet model using the pretrained model and evaluated on the test dataset, the AP = 0.7162, whereas when trained without the pretrained model the AP was better, 0.7241. So I decided to train without the pretrained model. But why does the model using the pretrained model have a lower AP than the one without it? Can this be possible?

FYR: I have concatenated COCO and my dataset (Theodore+) and am training on both datasets (because when I trained the model on my dataset alone, the AP was decreasing rather than increasing; do you have any idea why that happens?).
Also, I would like to ask about the HRNet model we are using: what is the layer structure (how many layers) or architecture of the model (input, output)? I tried to find it in the repo but could not see it anywhere.

Can we, while training, validate our model directly on the test dataset (because I am doing that)? Please suggest. Awaiting your response!
- So adding albumentations does not have any impact on improving accuracy?

I am not saying that adding albumentations will have no impact. Actually, I am not experienced with albumentations either. @jin-s13 Could you help with this issue?

- But why does the model using the pretrained model have a lower AP than the one without it? Can this be possible?

This may be possible. I think that may depend on the similarities between your own dataset and the COCO dataset.

- What is the layer structure (how many layers) or architecture of the model (input, output)?

For the details of the HRNet architecture, you can refer to this for the related information about the paper, or refer to the official HRNet paper.

- Can we, while training, validate our model directly on the test dataset (because I am doing that)?

Yes, this is available. You can set it in the config file like this, and the example usage is as follows:

evaluation = dict(  # Config of evaluation during training
    interval=10,  # Interval to perform evaluation, e.g., epochs
    metric='mAP',  # Metrics to be performed
    save_best='AP')  # set `AP` as key indicator to save the best checkpoint
Thank you for providing the research paper for HRNet. Could you also provide the link to the paper for ResNet and its architecture? Are HRNet and ResNet two different backbone networks? Can we call them the model or the algorithm used for training?
For ResNet, you can refer to this, or to SimpleBaseline2D, which uses ResNet as the backbone and adds deconvolutional layers to increase the feature resolution.

- HRNet and ResNet are two different backbone networks?

Yes. HRNet and ResNet are both backbones that are used to extract the image features.

- Can we call them the model or the algorithm used for training?

Usually, the model in MMPose contains a backbone and a head on top of the backbone, serving different purposes. Alternatively, algorithm is a more general concept that refers to either the backbone or the whole model, usually depending on the novelty.
Is SimpleBaseline2D an algorithm? Because if I say I am using the SimpleBaseline2D algorithm, I am indirectly saying that I am using ResNet. The only difference between ResNet and HRNet is that HRNet uses a different feature extractor; is that the main difference?
- Is SimpleBaseline2D an algorithm?

Yes, you can say that.

- Is that the main difference?

Yes, HRNet and ResNet are two different feature extractors.
How can I test a pretrained model that was trained on COCO and evaluate it on my test dataset?
I made these changes in the config:

_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/coco_wholebody.py']
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'

test_pipeline = val_pipeline

data_root = 'data/coco'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        #img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TopDownCocoWholeBodyDataset',
        #ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        #img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}))
The script I used for testing is:

./mmpose/tools/dist_test.sh ./FES_Results_COCO/hrnet_w348_coco_wholebody_388x288.py "/home/yukti/Downloads/hrnet_w48_coco_384x288-314c8528_20200708.pth" 1 --eval mAP

But I am getting a size mismatch error:

load checkpoint from local path: /home/yukti/Downloads/hrnet_w48_coco_384x288-314c8528_20200708.pth
The model and loaded state dict do not match exactly

size mismatch for keypoint_head.final_layer.weight: copying a param with shape torch.Size([17, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([133, 48, 1, 1]).
size mismatch for keypoint_head.final_layer.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([133]).
Is what I am doing above correct? Awaiting your response.
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'

This checkpoint seems to be a model trained on the COCO dataset, not on the COCO-WholeBody dataset. Please choose another appropriate checkpoint file. You can find one here.
Please find my changes in the config. I gave the COCO-WholeBody checkpoint only, but I get this:
_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/theodore.py']
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet_w48_coco_wholebody_384x288-6e061c6a_20200922.pth'
resume_from = None

data_root = 'TheodorePlusV2Dataset'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TheodorePlusV2Dataset',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        #ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/old_annotations/person_keypoints_scenario2.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        #img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TheodorePlusV2Dataset',
        #ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        #img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}))
I get a size mismatch error and AP = 0.0. The model and loaded state dict do not match exactly:

size mismatch for keypoint_head.final_layer.weight: copying a param with shape torch.Size([133, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([17, 48, 1, 1]).
size mismatch for keypoint_head.final_layer.bias: copying a param with shape torch.Size([133]) from checkpoint, the shape in current model is torch.Size([17]).

[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 736/736, 19.5 task/s, elapsed: 38s, ETA: 0s
Loading and preparing results...
DONE (t=0.03s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type keypoints
DONE (t=0.12s).
Accumulating evaluation results...
DONE (t=0.01s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.001
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.025
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000

AP: 2.3954008304056212e-05
AP (L): 0.0
AP (M): 0.0002155860747365059
AP .5: 7.984669434685404e-05
AP .75: 0.0
AR: 0.0004081632653061224
AR (L): 0.0
AR (M): 0.025
AR .5: 0.0013605442176870747
AR .75: 0.0
When I gave the checkpoint for COCO, it gave results with AP. I even tried testing the ResNet50 model with the checkpoint for the COCO dataset; it ran and gave results, but when I run it with the COCO-WholeBody config it gives the size mismatch error. I even checked the number of keypoints. Is it a problem if we give a checkpoint for COCO while in the config the type is TopDownCocoWholeBodyDataset? Will there be a difference in the AP results?
Could you please provide the config for your model? It seems the model you are using outputs 17 channels, but the checkpoint you are using outputs 133 channels:

size mismatch for keypoint_head.final_layer.weight: copying a param with shape torch.Size([133, 48, 1, 1]) from checkpoint, the shape in current model is torch.Size([17, 48, 1, 1]).

Please use the model that matches your expected output. If you would like to test on a dataset like the COCO-WholeBody dataset (which has 133 keypoints and thus needs to output 133 channels), please use a checkpoint like this:

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet_w48_coco_wholebody_384x288-6e061c6a_20200922.pth'

If you would like to test on a dataset like the COCO dataset (which has 17 keypoints and thus needs to output 17 channels), please use a checkpoint like this:

load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet/hrnet_w48_coco_384x288-314c8528_20200708.pth'
Please find the config below
_base_ = ['/home/yukti/mmpose/mmpose/configs/_base_/datasets/theodore.py']
log_level = 'INFO'
load_from = 'https://download.openmmlab.com/mmpose/top_down/hrnet_w48_coco_wholebody_384x288-6e061c6a_20200922.pth'
resume_from = None
dist_params = dict(backend='nccl')
workflow = [('train', 1)]
checkpoint_config = dict(interval=10)
evaluation = dict(interval=10, metric='mAP', save_best='AP')

optimizer = dict(type='Adam', lr=5e-4)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup=None,
    # warmup_iters=500,
    # warmup_ratio=0.001,
    step=[170, 200])
total_epochs = 210
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])

channel_cfg = dict(
    num_output_channels=17,
    dataset_joints=17,
    dataset_channel=[list(range(17))],
    inference_channel=list(range(17)))

model = dict(
    type='TopDown',
    pretrained='https://download.openmmlab.com/mmpose/pretrain_models/hrnet_w48-8ef0771d.pth',
    backbone=dict(
        type='HRNet',
        in_channels=3,
        extra=dict(
            stage1=dict(num_modules=1, num_branches=1, block='BOTTLENECK', num_blocks=(4, ), num_channels=(64, )),
            stage2=dict(num_modules=1, num_branches=2, block='BASIC', num_blocks=(4, 4), num_channels=(48, 96)),
            stage3=dict(num_modules=4, num_branches=3, block='BASIC', num_blocks=(4, 4, 4), num_channels=(48, 96, 192)),
            stage4=dict(num_modules=3, num_branches=4, block='BASIC', num_blocks=(4, 4, 4, 4), num_channels=(48, 96, 192, 384)))),
    keypoint_head=dict(
        type='TopdownHeatmapSimpleHead',
        in_channels=48,
        out_channels=channel_cfg['num_output_channels'],
        num_deconv_layers=0,
        extra=dict(final_conv_kernel=1),
        loss_keypoint=dict(type='JointsMSELoss', use_target_weight=True)),
    train_cfg=dict(),
    test_cfg=dict(flip_test=True, post_process='default', shift_heatmap=True, modulate_kernel=11))

data_cfg = dict(
    image_size=[288, 384],
    heatmap_size=[72, 96],
    num_output_channels=channel_cfg['num_output_channels'],
    num_joints=channel_cfg['dataset_joints'],
    dataset_channel=channel_cfg['dataset_channel'],
    inference_channel=channel_cfg['inference_channel'],
    soft_nms=False,
    nms_thr=1.0,
    oks_thr=0.9,
    vis_thr=0.2,
    use_gt_bbox=False,
    det_bbox_thr=0.0,
    #'COCO_val2017_detections_AP_H_56_person.json',
    bbox_file='/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_bboxes_scenario1.json',
    #bbox_file='/mnt/dst_datasets/own_omni_dataset/FES_keypoints/old_annotations/person_bboxes_scenario2.json',
)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownRandomFlip', flip_prob=0.5),
    dict(type='TopDownHalfBodyTransform', num_joints_half_body=8, prob_half_body=0.3),
    dict(type='TopDownGetRandomScaleRotation', rot_factor=40, scale_factor=0.5),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    dict(type='TopDownGenerateTarget', sigma=3),
    dict(
        type='Collect',
        keys=['img', 'target', 'target_weight'],
        meta_keys=[
            'image_file', 'joints_3d', 'joints_3d_visible', 'center', 'scale',
            'rotation', 'bbox_score', 'flip_pairs'
        ]),
]

val_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TopDownAffine'),
    dict(type='ToTensor'),
    dict(type='NormalizeTensor', mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    dict(
        type='Collect',
        keys=['img'],
        meta_keys=['image_file', 'center', 'scale', 'rotation', 'bbox_score', 'flip_pairs']),
]
test_pipeline = val_pipeline

data_root = 'TheodorePlusV2Dataset'
data = dict(
    samples_per_gpu=32,
    workers_per_gpu=2,
    val_dataloader=dict(samples_per_gpu=32),
    test_dataloader=dict(samples_per_gpu=32),
    train=dict(
        type='TopDownCocoWholeBodyDataset',
        ann_file=f'{data_root}/annotations/coco_wholebody_train_v1.0.json',
        img_prefix=f'{data_root}/train2017/',
        data_cfg=data_cfg,
        pipeline=train_pipeline,
        dataset_info={{_base_.dataset_info}}),
    val=dict(
        type='TheodorePlusV2Dataset',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        #ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/old_annotations/person_keypoints_scenario2.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        #img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=val_pipeline,
        dataset_info={{_base_.dataset_info}}),
    test=dict(
        type='TheodorePlusV2Dataset',
        #ann_file=f'{data_root}/annotations/coco_wholebody_val_v1.0.json',
        ann_file=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/coco_annotations_final_corrected_2022/person_keypoints_scenario1.json',
        img_prefix=f'/mnt/dst_datasets/own_omni_dataset/FES_keypoints/scenario1/JPEGImages/',
        #img_prefix=f'{data_root}/val2017/',
        data_cfg=data_cfg,
        pipeline=test_pipeline,
        dataset_info={{_base_.dataset_info}}),
)
Also, I would like to ask: if lr = 5e-4 and I reduce it by a factor of 10, then lr = 5e-5, right? Which of these two learning rates is best?
Please change this parameter according to your dataset:

channel_cfg = dict(
    num_output_channels=17,
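For illustration only: if the expected output were instead the 133 keypoints of COCO-WholeBody (as in the size-mismatch message above), the corresponding sketch would be:

channel_cfg = dict(
    num_output_channels=133,
    dataset_joints=133,
    dataset_channel=[list(range(133))],
    inference_channel=list(range(133)))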
Also, if lr = 5e-4 and I reduce it by a factor of 10, then lr = 5e-5? Which is best out of these two learning rates?

I am afraid that these hyperparameters may need to be tuned on your own dataset.
Hello Team,

I was training the HRNet model and trying to improve its accuracy; since I have trained the model many times, it may lead to overfitting. I would like to know if there is a possibility to augment the data with random noise in MMPose. Where should I look in the mmpose code, and how can we do this? Please suggest!