princeton-vl / pose-ae-train

Training code for "Associative Embedding: End-to-End Learning for Joint Detection and Grouping"
BSD 3-Clause "New" or "Revised" License

The tags I got have vastly different values from the paper. How can I get the tag values shown in the paper? #34

Open ArchNew opened 5 years ago

ArchNew commented 5 years ago

This is just a small experiment on the source code, and I may have generated the tag values incorrectly, so please point out my errors. Thanks! I experimented on the image with image_id 262145, the one shown in Figure 4 of the paper. The model I'm using was downloaded from the link given by this project. I output the tag information from the multiperson function in test.py. That function applies flip augmentation, so I added the two tag values together (one from the original image, one from the flipped image). Below are the resulting tag values at all ground-truth keypoints:

[1.260961, 1.3058516, 0, 1.2563938, 0, 1.2727276, 1.3023847, 1.2349962, 1.2527169, 1.2959831, 1.2270368, 1.3404604, 1.7944227, 1.3047302, 1.3632604, 1.2319796, 1.2049041]
[0.5672045, 0.5553081, 0, 0.5342066, 0, 0.62853205, 0.57086825, 0.6060959, 0, 0, 0, 0.44562602, 0.5234276, 0.58281267, 0.5969374, 0.3715278, 0]
[0, 0, 0, 1.2847638, 1.2794614, 1.3218927, 1.3253775, 1.390451, 1.4409075, 0, 0, 1.4085245, 1.4261274, 0, 0, 0, 0]
[0, 0, 0, 1.216414, 1.2051077, 1.1880434, 1.214577, 1.2288826, 1.2320774, 0, 0, 1.1940722, 1.2197711, 0, 0, 0, 0]
[0.71830493, 0.77619904, 0, 0.69648445, 0, 0.8377314, 0, 0.7768618, 0, 0.83856195, 0, 0.85833573, 0, 0, 0, 0, 0]
[0.29368687, 0.29288244, 0.2850952, 0, 0.3033538, 0.35456848, 0.31444836, 0, 0.27472353, 0, 0.32352066, 0.3529215, 0.31695318, 0, 0.2855525, 0, 0]
[0.38019133, 0.39934206, 0, 0.32967758, 0, 0.3586445, 0.40299988, 0.2993703, 0.304348, 0.5085192, 0.27281237, 0.3268919, 0, 0.3919878, 0, 0.38223648, 0]
[0, 0, 0, 0.5740683, 0.57219386, 0.5397749, 0.58586, 0.5749419, 0.58152056, 0.4648273, 0.2812636, 0.5862839, 0.55253434, 0, 0, 0, 0]

There are 8 people annotated with keypoints. A value of 0 means that the keypoint is not annotated. Clearly, the tag values for the same person vary by less than 0.1 for most keypoints. The max variation is less than 0.3. I can see why each person's keypoints fall on an almost straight line. Unfortunately, the difference between two people can also be less than 0.1, which means that not all people can be distinguished by their tag values. At least in the data I generated, two people could fall on the same line. The most confusing part is that the tag values I generated are between 0.2 and 1.8, not -6 to 10 as in Figure 4 of the paper. Where did I go wrong?
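To make the comparison above easier to reproduce, here is a minimal sketch that computes each person's mean tag, the spread (max minus min) within a person, and which pairs of people have mean tags closer than a threshold. It uses only the values listed in this post. The helper names and the 0.1 threshold are my own choices for illustration, not anything from the repository.

```python
# Tag values copied from the post above; 0 marks an unannotated keypoint.
tags = [
    [1.260961, 1.3058516, 0, 1.2563938, 0, 1.2727276, 1.3023847, 1.2349962, 1.2527169, 1.2959831, 1.2270368, 1.3404604, 1.7944227, 1.3047302, 1.3632604, 1.2319796, 1.2049041],
    [0.5672045, 0.5553081, 0, 0.5342066, 0, 0.62853205, 0.57086825, 0.6060959, 0, 0, 0, 0.44562602, 0.5234276, 0.58281267, 0.5969374, 0.3715278, 0],
    [0, 0, 0, 1.2847638, 1.2794614, 1.3218927, 1.3253775, 1.390451, 1.4409075, 0, 0, 1.4085245, 1.4261274, 0, 0, 0, 0],
    [0, 0, 0, 1.216414, 1.2051077, 1.1880434, 1.214577, 1.2288826, 1.2320774, 0, 0, 1.1940722, 1.2197711, 0, 0, 0, 0],
    [0.71830493, 0.77619904, 0, 0.69648445, 0, 0.8377314, 0, 0.7768618, 0, 0.83856195, 0, 0.85833573, 0, 0, 0, 0, 0],
    [0.29368687, 0.29288244, 0.2850952, 0, 0.3033538, 0.35456848, 0.31444836, 0, 0.27472353, 0, 0.32352066, 0.3529215, 0.31695318, 0, 0.2855525, 0, 0],
    [0.38019133, 0.39934206, 0, 0.32967758, 0, 0.3586445, 0.40299988, 0.2993703, 0.304348, 0.5085192, 0.27281237, 0.3268919, 0, 0.3919878, 0, 0.38223648, 0],
    [0, 0, 0, 0.5740683, 0.57219386, 0.5397749, 0.58586, 0.5749419, 0.58152056, 0.4648273, 0.2812636, 0.5862839, 0.55253434, 0, 0, 0, 0],
]

# Hypothetical threshold, chosen only to match the 0.1 figure discussed above.
THRESHOLD = 0.1


def valid_tags(row):
    """Return the tag values of annotated keypoints (non-zero entries)."""
    return [t for t in row if t != 0]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Per-person mean tag and within-person spread (max minus min).
person_means = [mean(valid_tags(row)) for row in tags]
person_spreads = [max(valid_tags(row)) - min(valid_tags(row)) for row in tags]

# Pairs of people (0-based indices) whose mean tags differ by less than THRESHOLD.
close_pairs = [
    (i, j)
    for i in range(len(tags))
    for j in range(i + 1, len(tags))
    if abs(person_means[i] - person_means[j]) < THRESHOLD
]

for i, (m, s) in enumerate(zip(person_means, person_spreads)):
    print(f"person {i}: mean={m:.3f} spread={s:.3f}")
print("pairs with mean tags closer than", THRESHOLD, ":", close_pairs)
```

Comparing mean tags is only a rough check. Whether two people are actually confused depends on how the grouping step compares tags, which this sketch does not reproduce.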

genius9527 commented 5 years ago

@ArchNew How do you output the tag?