Closed: amin-asdzdh closed this issue 4 years ago
Hi!
The dataset class can be an implementation of the generic HumanPoseEstimationDataset class. You can check COCODataset as a guide while creating your custom dataset.
Basically, if you want to use the train_coco.py script, you need a __getitem__ method that returns a tuple containing image, target, target_weight, joints_data, and an evaluate_accuracy method that returns accs, avg_acc, cnt, joints_preds, joints_target.
However, I suggest you start from the COCO dataset and training script and remove the unneeded parts. Let me know if you encounter any difficulties.
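As a sketch, a custom dataset exposing that interface might look like the following (class, field and key names are illustrative, not the repo's exact ones; in the actual code it would subclass HumanPoseEstimationDataset):

```python
import numpy as np

class CustomKeypointDataset:
    """Illustrative 7-keypoint dataset exposing the interface expected by
    train_coco.py (in the real repo this would subclass
    HumanPoseEstimationDataset)."""

    def __init__(self, nof_joints=7, image_size=(256, 192), heatmap_size=(64, 48)):
        self.nof_joints = nof_joints
        self.image_size = image_size
        self.heatmap_size = heatmap_size
        self.samples = []  # fill with (image_path, annotation) pairs

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        # Real code would load the image and render per-joint heatmaps;
        # zero placeholders are used here just to show the expected shapes.
        image = np.zeros((3, *self.image_size), dtype=np.float32)
        target = np.zeros((self.nof_joints, *self.heatmap_size), dtype=np.float32)
        target_weight = np.ones((self.nof_joints, 1), dtype=np.float32)
        joints_data = {
            "joints": np.zeros((self.nof_joints, 2), dtype=np.float32),
            "joints_visibility": np.ones((self.nof_joints, 1), dtype=np.float32),
        }
        return image, target, target_weight, joints_data
```

The evaluate_accuracy method would then compare predicted and ground-truth joint positions per batch, in the spirit of the COCO version.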
Thanks for the reply, I'll try that.
From my understanding, target_weight should be a binary list defining the visibility of the joints? Could you please tell me how joints_data is different from target?
Thanks again
Regarding "target_weight should be a binary list defining the visibility of the joints":

It depends. In the simplest case it is a binary visibility list, but you can also use it differently. For instance, for training on COCO the binary joint visibility can be multiplied by a per-joint factor to increase or decrease the importance of some joints (see self.joints_weight in COCODataset).
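A minimal sketch of that weighting (the factor values below are made up for illustration; only self.joints_weight is a real attribute name from COCODataset):

```python
import numpy as np

# Binary per-joint visibility (1 = annotated/visible, 0 = not), shape (nof_joints, 1).
joints_vis = np.array([[1], [1], [0], [1], [1], [0], [1]], dtype=np.float32)

# Hypothetical per-joint importance factors, analogous to self.joints_weight
# in COCODataset (these specific values are invented for the example).
joints_weight = np.array([[1.0], [1.0], [1.2], [1.2], [1.5], [1.5], [1.0]],
                         dtype=np.float32)

# Combined weight: invisible joints keep weight 0, visible ones are scaled.
target_weight = joints_vis * joints_weight
```

The loss for each joint is then scaled by its entry in target_weight, so unannotated joints contribute nothing and "important" joints contribute more.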
Regarding "Could you please tell me how joints_data is different from target?":

Sure! target contains the joint heatmaps, while the joint coordinates, the joint visibility and other metadata can be saved in joints_data (which is a dictionary).
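To make the distinction concrete, here is a sketch of how one heatmap in target could be rendered and how joints_data might be filled (function and dictionary keys are illustrative, not the repo's exact names; the repo's _generate_target does something similar):

```python
import numpy as np

def gaussian_heatmap(joint_xy, heatmap_size=(64, 48), sigma=2.0):
    """Render one joint as a 2D Gaussian peak, the per-joint content of `target`."""
    h, w = heatmap_size
    ys, xs = np.mgrid[0:h, 0:w]
    x, y = joint_xy
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

# `target`: one heatmap per joint.
joints = np.array([[10.0, 20.0], [30.0, 15.0]])  # (x, y) in heatmap coordinates
target = np.stack([gaussian_heatmap(j) for j in joints])

# `joints_data`: raw coordinates, visibility and any other metadata, as a dict.
joints_data = {
    "joints": joints,
    "joints_visibility": np.ones((len(joints), 1)),
}
```

So target is what the network regresses against, while joints_data carries the original annotations alongside it.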
Thanks for the explanation, much appreciated.
Could you explain what is going on in this part of COCO.py, in the COCOTrain class? Does this relate to the bounding boxes of people in the image?
Thank you
Could you please explain why joints and joints_vis in the _generate_target method have 3 columns?
Regarding "Could you please explain why joints and joints_vis in the _generate_target method have 3 columns?":

This is a typo. The correct shape is [nof_joints, 2]. Thank you for spotting it!
Regarding "Could you explain what is going on in this part of COCO.py, in the COCOTrain class? Does this relate to the bounding boxes of people in the image?":
As far as I remember, this is code specific to the COCO dataset.
The get_final_preds function retrieves the joint positions from the heatmaps (using the get_max_preds function), then undoes some pre-processing applied to the input image to convert the predictions back to the original COCO format. I think it is related to the differences between the original size of the bounding box, the input size and the output size.
The all_preds and all_boxes arrays store these results, which are then compared with the COCO annotations via the evaluate_overall_accuracy method.
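As a sketch of the decoding step described above (not the repo's exact get_max_preds, but the same argmax idea):

```python
import numpy as np

def max_preds(heatmaps):
    """Decode heatmaps of shape (batch, nof_joints, h, w) into (x, y) predictions
    by taking the argmax of each joint heatmap, in the spirit of get_max_preds."""
    n, k, h, w = heatmaps.shape
    flat = heatmaps.reshape(n, k, -1)
    idx = flat.argmax(axis=2)          # flat index of each peak
    maxvals = flat.max(axis=2)         # peak confidence per joint
    preds = np.stack([idx % w, idx // w], axis=2).astype(np.float32)  # (x, y)
    return preds, maxvals
```

get_final_preds would then rescale these heatmap-space coordinates back through the crop/resize transform to the original image, which is where the bounding-box sizes come in.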
Thanks for the help, much appreciated!
Thanks for the great work.
I want to train the network on a custom dataset to predict 7 keypoints. Any guidelines on how I can prepare the dataset class for this task?
And, other than changing the number of output channels in the last layer, is there anything else that needs to be modified?
Many thanks
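For the last-layer change mentioned above, a hedged PyTorch sketch (channel counts and names are illustrative, not the repo's exact ones): HRNet-style networks end with a 1x1 convolution that maps the feature channels to one heatmap per joint, so predicting 7 keypoints instead of COCO's 17 means giving that convolution 7 output channels.

```python
import torch
import torch.nn as nn

in_channels = 32   # width of the final feature maps (model-dependent, assumed here)
nof_joints = 7     # custom number of keypoints

# Hypothetical replacement for the network's final layer.
final_layer = nn.Conv2d(in_channels, nof_joints, kernel_size=1)

features = torch.randn(1, in_channels, 64, 48)   # fake backbone output
heatmaps = final_layer(features)                 # one heatmap per joint
```

Besides this, the dataset class and the nof_joints-dependent parts of the training script (loss masking, accuracy evaluation) need to match the new joint count.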