princeton-vl / pose-ae-train

Training code for "Associative Embedding: End-to-End Learning for Joint Detection and Grouping"
BSD 3-Clause "New" or "Revised" License

Generating ground truth Tag References #55

Open ADulian opened 2 years ago

ADulian commented 2 years ago

Hello,

I have a question about how keypoint references are generated for the tag loss. In the `__call__` function at https://github.com/princeton-vl/pose-ae-train/blob/master/data/coco_pose/dp.py#L43, the reference value for each tag map is computed as

`visible_nodes[i][tot] = (idx * output_res * output_res + y * output_res + x, 1)`

Initially I thought the reference value should reflect the ground-truth position of the keypoint in a flattened map, but that does not seem to be the case. This is a bit confusing, because the predicted tag value is retrieved from the flattened tag map using that reference value. So why are the reference values computed as above, and why do they not reflect the actual position of the keypoint? I was under the impression that, when computing the loss, you would want to retrieve the predicted tag value at the exact location of the keypoint.
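To make the question concrete, here is a minimal sketch of what that expression evaluates to. It rests on an assumption that is not confirmed by the repository: that the tag maps are stored as an array of shape `(num_joints, output_res, output_res)` and flattened in row-major order. The helper `flat_tag_index` and the toy sizes are hypothetical and only serve the illustration.

```python
import numpy as np


def flat_tag_index(idx, y, x, output_res):
    """Same arithmetic as the expression quoted from dp.py (hypothetical helper)."""
    return idx * output_res * output_res + y * output_res + x


# Toy sizes chosen for illustration only, not the repository's settings.
num_joints, output_res = 4, 8

# Assumed layout: one output_res x output_res tag map per joint type.
tags = np.arange(num_joints * output_res * output_res, dtype=np.float32)
tags = tags.reshape(num_joints, output_res, output_res)

idx, y, x = 2, 5, 3  # joint type, row, column of a keypoint
ref = flat_tag_index(idx, y, x, output_res)

# Under the assumed layout, the reference is the row-major flat index of (idx, y, x).
same_as_ravel = ref == np.ravel_multi_index((idx, y, x), tags.shape)

# Indexing the flattened maps with ref returns the value stored at (idx, y, x).
same_value = tags.reshape(-1)[ref] == tags[idx, y, x]

# The (joint, row, column) triple can be recovered from the reference.
recovered = tuple(int(v) for v in np.unravel_index(ref, tags.shape))
```

If that layout assumption holds, the reference is not a position within a single `output_res * output_res` map but an offset into all joint maps stacked together, with `idx * output_res * output_res` selecting the joint's map and `y * output_res + x` selecting the pixel inside it.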

Cheers,