The system's results are also computed with cocoapi.
I know, that's why I'm confused. I ran valid.py (on my own dataset) and the result is:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.687
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.876
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.754
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.410
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.725
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.714
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.885
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.773
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.444
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.753
Then I took the JSON it generated and ran it through cocoapi; the result is:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.509
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.914
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.572
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.031
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.375
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.526
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.093
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.505
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.557
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.053
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.443
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.575
I took a peek at the results file: in addition to image_id, category_id, keypoints and score, it also contains 'center' and 'scale'. Should I use 'center' and 'scale' to perform some kind of transformation in order to reproduce the numbers valid.py gave, or is the gap just the difference in maxDets?
For keypoint detection, maxDets is 20. Why does your cocoapi result use 100?
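The maxDets=1/10/100 breakdown you posted is what cocoapi prints for the box/segmentation metrics, so it looks like the evaluation was not created with the keypoint iouType. A minimal sketch of what I mean (the file paths are placeholders, not the repo's actual names): constructing COCOeval with iouType='keypoints' switches it to OKS and to maxDets=[20], which matches what valid.py reports.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# placeholder paths -- substitute your own ground-truth and results files
coco_gt = COCO('person_keypoints_val.json')
coco_dt = coco_gt.loadRes('keypoints_val_results.json')

# iouType='keypoints' uses OKS and maxDets=[20], matching valid.py's summary;
# the default bbox/segm settings report maxDets=1/10/100 instead
coco_eval = COCOeval(coco_gt, coco_dt, iouType='keypoints')
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```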
Thanks! I'm a rookie in research and still have a lot to learn.
@ArchNew Hi, I've also used valid.py on my own dataset, but the result seems abnormal:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.004
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = -1.000
=> coco eval results saved to output/coco/pose_resnet_50/256x192_d256x3_adam_lr1e-3/results/keypoints_val2018_results.pkl

| Arch | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L) |
|---|---|---|---|---|---|---|---|---|---|---|
| 256x192_pose_resnet_50_d256d256d256 | 0.000 | 0.000 | 0.000 | -1.000 | -1.000 | 0.000 | 0.004 | 0.000 | -1.000 | -1.000 |
I kept my dataset in the same format as the COCO dataset, except for segmentation and area (they are set to None). But the validation result is wrong, so I suspect there is a problem with my data format. How is yours? Thanks.
I've found the bug~
I've also met this problem. Did you solve it? Please help me, thank you.
Accumulating evaluation results... DONE (t=0.04s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.75 | area= all | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets= 20 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets= 20 ] = 0.000
I suspect your understanding of the COCO keypoint evaluation metric (OKS) is not sufficient. The segmentation "area" plays a key role in the evaluation: if you set "area" to 0, the results will be wrong. For Simple Baselines, since the PoseTrack dataset has no segmentation, their solution is to use the bounding-box area in place of the segmentation area. Running the evaluation directly through cocoapi without that fix is strongly discouraged.
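If you still want to evaluate through cocoapi directly, one workaround (a sketch under that assumption, not the repo's own code; the path is a placeholder) is to fill the missing 'area' of each ground-truth annotation from its bounding box before loading it, since OKS normalizes keypoint distances by the object area:

```python
import json

# placeholder path -- your ground-truth annotation file in COCO format
with open('person_keypoints_val.json') as f:
    gt = json.load(f)

# OKS divides the squared keypoint distances by the object area, so an area
# of 0/None makes every prediction score as a miss; fall back to bbox area
for ann in gt['annotations']:
    if not ann.get('area'):
        x, y, w, h = ann['bbox']   # COCO bbox format: [x, y, width, height]
        ann['area'] = w * h

with open('person_keypoints_val_fixed.json', 'w') as f:
    json.dump(gt, f)
```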
> I've found the bug~
Hi, what's the bug?
> I've found the bug~
@cs-heibao, I have been trying to make this work but to no avail. May I know what you fixed? It would really help.
I did the validation via valid.py and then put the JSON file it generated through cocoapi. The two gave rather different results: cocoapi's numbers are more than 20% lower than valid.py's. Do you use a different coordinate system? If so, could you kindly tell us the conversion formula between your system and cocoapi's? Thanks!