stefanopini / simple-HRNet

Multi-person Human Pose Estimation with HRNet in Pytorch
GNU General Public License v3.0

Training on custom datasets #56

Closed by amin-asdzdh 4 years ago

amin-asdzdh commented 4 years ago

Thanks for the great work.

I want to train the network on a custom dataset to predict 7 keypoints. Any guidelines on how I can prepare the dataset class for this task?

And, other than changing the number of output channels in the last layer, is there anything else that needs to be modified?

Many thanks

stefanopini commented 4 years ago

Hi!

The dataset class can be an implementation of the generic HumanPoseEstimationDataset class. You can use COCODataset as a guide while creating your custom dataset.

Basically, if you want to use the train_coco.py script, you need a __getitem__ method that returns the tuple (image, target, target_weight, joints_data) and an evaluate_accuracy method that returns accs, avg_acc, cnt, joints_preds, joints_target.

However, I suggest starting from the dataset and the training script of COCO and removing the unneeded parts. Let me know if you encounter any difficulties.
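To make that interface concrete, here is a minimal sketch of such a dataset class. It is not the actual simple-HRNet code: the class name, the annotation format (a list of dicts with 'image' and 'joints' entries), and the simplified heatmap generator are all assumptions, meant only to illustrate the tuple returned by __getitem__.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class CustomKeypointDataset(Dataset):
    """Hypothetical 7-keypoint dataset matching the interface described above.

    `samples` is assumed to be a list of dicts with an 'image' array (H x W x 3,
    uint8) and a 'joints' array of shape [nof_joints, 3] holding (x, y,
    visibility), in the style of COCO annotations. See datasets/COCO.py in
    simple-HRNet for the complete reference implementation.
    """

    def __init__(self, samples, nof_joints=7, heatmap_size=(64, 48), sigma=2):
        self.samples = samples
        self.nof_joints = nof_joints
        self.heatmap_size = heatmap_size  # (height, width)
        self.sigma = sigma

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        sample = self.samples[index]
        image = torch.from_numpy(sample['image']).permute(2, 0, 1).float() / 255.

        joints = sample['joints'][:, :2].astype(np.float32)            # [nof_joints, 2]
        joints_vis = (sample['joints'][:, 2:] > 0).astype(np.float32)  # [nof_joints, 1]

        target, target_weight = self._generate_target(joints, joints_vis, image.shape[1:])

        joints_data = {
            'joints': joints,
            'joints_visibility': joints_vis,
            # any extra metadata (image id, bbox, center, scale, ...) goes here
        }
        return image, target, target_weight, joints_data

    def _generate_target(self, joints, joints_vis, image_size):
        """Build one Gaussian heatmap per joint; zero heatmaps for hidden joints."""
        h, w = self.heatmap_size
        target = np.zeros((self.nof_joints, h, w), dtype=np.float32)
        target_weight = joints_vis.copy()  # [nof_joints, 1], binary in this sketch

        # scale joint coordinates from image space to heatmap space
        scale = np.array([w / image_size[1], h / image_size[0]], dtype=np.float32)
        xs = np.arange(w, dtype=np.float32)
        ys = np.arange(h, dtype=np.float32)[:, None]
        for j in range(self.nof_joints):
            if target_weight[j] == 0:
                continue
            x, y = joints[j] * scale
            # unnormalized 2D Gaussian centered on the joint
            target[j] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * self.sigma ** 2))

        return torch.from_numpy(target), torch.from_numpy(target_weight)
```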

amin-asdzdh commented 4 years ago

Thanks for the reply, I'll try that.

From my understanding, target_weight should be a binary list, defining the visibility of the joints? Could you please tell me how joints_data is different from target?

Thanks again

stefanopini commented 4 years ago

target_weight should be a binary list, defining the visibility of the joints

It depends. In the simplest case it is a binary visibility list, but you can also use it differently. For instance, for training on COCO the binary joint_visibility can be multiplied by a per-joint factor to increase/decrease the importance of some joints (see self.joints_weight in COCODataset).
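As a small illustration of that weighting (the factor values below are made up for a 7-joint setup, mirroring what self.joints_weight does for the 17 COCO joints):

```python
import numpy as np

# Hypothetical per-joint importance factors for 7 keypoints: values > 1 increase
# a joint's contribution to the loss, values < 1 decrease it (cf. self.joints_weight
# in COCODataset, defined there for the 17 COCO joints).
joints_weight = np.array([1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0],
                         dtype=np.float32).reshape(7, 1)

joints_vis = np.ones((7, 1), dtype=np.float32)  # binary visibility flags
joints_vis[3] = 0.0                             # e.g. joint 3 is not annotated

target_weight = joints_vis * joints_weight      # weighted visibility, shape [7, 1]
```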

Could you please tell me how joints_data is different from target?

Sure! target contains the joint heatmaps, while the joint coordinates, the joint visibility, and other metadata can be stored in joints_data (which is a dictionary).
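In terms of shapes, for a 7-joint setup that means roughly the following (the dictionary keys here are illustrative, not a fixed schema):

```python
import numpy as np

joints = np.zeros((7, 2), dtype=np.float32)      # (x, y) per joint, image space
joints_vis = np.ones((7, 1), dtype=np.float32)   # binary visibility

# target would be a [7, heatmap_h, heatmap_w] stack of per-joint heatmaps,
# while joints_data carries the raw coordinates plus any metadata needed later:
joints_data = {
    'joints': joints,
    'joints_visibility': joints_vis,
    'image_id': 0,                               # illustrative extra metadata
}
```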

amin-asdzdh commented 4 years ago

Thanks for the explanation, much appreciated.

Could you explain what is going on in this part of COCO.py, in the COCOTrain class?

https://github.com/stefanopini/simple-HRNet/blob/12d8a76cea5c7fe2651894461c37b4830560a5fb/training/COCO.py#L154-L170

Does this relate to bounding boxes of people in the image?

Thank you

amin-asdzdh commented 4 years ago

Could you please explain why joints and joints_vis in the _generate_target method have 3 columns?

https://github.com/stefanopini/simple-HRNet/blob/12d8a76cea5c7fe2651894461c37b4830560a5fb/datasets/COCO.py#L460-L465

stefanopini commented 4 years ago

Could you please explain why joints and joints_vis in the _generate_target method have 3 columns?

https://github.com/stefanopini/simple-HRNet/blob/12d8a76cea5c7fe2651894461c37b4830560a5fb/datasets/COCO.py#L460-L465

This is a typo. The correct value is [nof_joints, 2]. Thank you for spotting it!

Thanks for the explanation, much appreciated.

Could you explain what is going on in this part of COCO.py, in the COCOTrain class?

https://github.com/stefanopini/simple-HRNet/blob/12d8a76cea5c7fe2651894461c37b4830560a5fb/training/COCO.py#L154-L170

Does this relate to bounding boxes of people in the image?

Thank you

As far as I remember, this code is specific to the COCO dataset. The get_final_preds function retrieves the joint positions from the heatmaps (using the function get_max_preds) and then undoes some of the pre-processing applied to the input image, converting the predictions back to the original COCO format. I think it is related to the differences between the original size of the bounding box, the input size, and the output size. The all_preds and all_boxes arrays store these results, which are later compared with the COCO annotations by the evaluate_overall_accuracy method.
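To illustrate the idea, here is a rough sketch, not the actual simple-HRNet code: the function names below are made up, the real get_max_preds also applies a sub-pixel refinement step, and the real un-cropping uses center/scale affine transforms rather than the plain bounding box assumed here.

```python
import numpy as np

def get_max_preds_sketch(heatmaps):
    """Sketch of what get_max_preds does: argmax over each joint heatmap.

    heatmaps: [batch, nof_joints, h, w]
    returns:  coords  [batch, nof_joints, 2] as (x, y) in heatmap space,
              maxvals [batch, nof_joints, 1] as confidence scores.
    """
    b, j, h, w = heatmaps.shape
    flat = heatmaps.reshape(b, j, -1)
    idx = flat.argmax(axis=2)
    maxvals = flat.max(axis=2, keepdims=True)
    coords = np.stack([idx % w, idx // w], axis=2).astype(np.float32)
    coords[(maxvals <= 0.).repeat(2, axis=2)] = -1.  # mark undetected joints
    return coords, maxvals

def heatmap_to_image_coords(coords, bbox, heatmap_size=(64, 48)):
    """Map heatmap-space coordinates back to the original image.

    bbox is a hypothetical (x1, y1, x2, y2) person box used to crop the input;
    the mapping here is a plain scale-and-shift.
    """
    x1, y1, x2, y2 = bbox
    scale = np.array([(x2 - x1) / heatmap_size[1], (y2 - y1) / heatmap_size[0]])
    return coords * scale + np.array([x1, y1])
```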

amin-asdzdh commented 4 years ago

Thanks for the help, much appreciated!