visible_nodes[i][tot] = (idx * output_res * output_res + y * output_res + x, 1), initially I thought that the reference value should reflect the ground truth position of the keypoint in a flattened map, however, it seems that this is not true. This seems bit confusing as the predicted tag value from a flattened tag map is actually retrieved using that reference value so it makes me wonder why are reference tag values computed as per above? Why do those value not reflect the actual position of the keypoint? I was under the impression that during the computation of the loss you would want to retrieve predicted tag value at the exact location of where the keypoint is.
Hello,

I have a question about how the keypoint references for the tag loss are generated. In the `__call__` function in https://github.com/princeton-vl/pose-ae-train/blob/master/data/coco_pose/dp.py#L43, the reference value for each tag map is computed as

`visible_nodes[i][tot] = (idx * output_res * output_res + y * output_res + x, 1)`

Initially I thought that the reference value should reflect the ground-truth position of the keypoint in a flattened map; however, this does not seem to be the case. This is a bit confusing, because the predicted tag value is retrieved from the flattened tag map using that reference value, which makes me wonder why the reference values are computed this way. Why do these values not reflect the actual position of the keypoint? I was under the impression that, when computing the loss, you would want to retrieve the predicted tag value at the exact location of the keypoint.

Cheers,