Closed · levan92 closed this issue 2 years ago
Hi, unfortunately I lost track of those files. We used this repo to predict the keypoints:
https://github.com/princeton-vl/pose-ae-demo
I remember we made a few modifications in this function:
But unfortunately I can't recall the details.
I see, okay let me try. I'll let you know again if I need more clarifications. Thank you!
@liruilong940607 I refer to your previous reply back in 2019: https://github.com/liruilong940607/Pose2Seg/issues/13#issuecomment-506909473 Can I confirm that you use Pose-AE's predictions without using their refine function, which is used in Line 128?
I think so -- I can't remember the exact changes I made, but their refine function seems like something I would have avoided using.
Got it, thank you for your prompt replies @liruilong940607!
Just to confirm, the steps for evaluation without GT keypoints will be:
1) Run ae-pose on OCHuman to get predicted keypoints
2) Convert predicted keypoints to coco json format ('segmentation', 'bboxes' fields will be blank)
3) Run Pose2Seg's test.py with the predicted keypoints (converted to coco json format)
4) Evaluate the output predicted segmentation masks against the OCHuman val/test groundtruth jsons via cocoapi to get the evaluation scores
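For the last step, my understanding is that this reduces to a standard cocoapi `segm` evaluation. A minimal sketch (the file paths and the layout of the results file are my assumptions, not the exact scripts from this repo):

```python
def evaluate_masks(gt_json, dt_json):
    """Score predicted segmentation masks against OCHuman GT via cocoapi."""
    # Imported lazily so pycocotools is only needed when this is actually run.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO(gt_json)             # OCHuman GT in coco json format
    coco_dt = coco_gt.loadRes(dt_json)  # [{image_id, category_id, segmentation, score}, ...]
    coco_eval = COCOeval(coco_gt, coco_dt, iouType='segm')  # mask IoU, not bbox
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()               # prints AP, AP50, AP75, ...
    return coco_eval.stats              # stats[0] is AP @ IoU=0.50:0.95
```

Note that `loadRes` also accepts an in-memory list of result dicts, so the predicted masks don't need to go through an intermediate json file.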
Can I also confirm if you did multi-scale evaluation for the pose estimation step via AE?
Using the pre-trained model provided in this repo, I am still not able to reproduce the 22.2/23.8 AP on OCHuman val/test reported in the Pose2Seg paper for Pose2Seg without GT kpts. The best I'm getting is 15.0 AP on OCHuman val. I am using the same ae-pose model to get the predicted keypoints, and I optimise over varying score thresholds on the keypoint predictions before feeding them into the Pose2Seg model.
There shouldn't be a problem with the pose predictions from ae-pose, because with these predictions I am getting good keypoint-task scores of 32.4/32.1 AP on OCHuman val/test, which is higher than the 28.5/30.3 AP reported in section 5.1 of the Pose2Seg paper.
You can find the pose predictions I generated here: OCHuman Val & Test.
- Conversion script that converts the pose prediction json files to coco json format: code
- Modified test.py that takes in the predicted keypoints and outputs a pickle file containing the predicted segmentation masks: code
- Modified datasets/CocoDatasetInfo.py, changed slightly so that it can take in empty segm/bbox fields without throwing errors and only processes the keypoints: code
- Modified eval.py that takes in the pickle file of predicted segmentation masks and evaluates them against the OCHuman GTs: code
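For reference, the conversion step is essentially the following (a minimal, hypothetical sketch -- the input field names `image_id`, `keypoints`, `score` are assumptions about the AE output layout, not the actual script above):

```python
def ae_to_coco(ae_predictions, category_id=1):
    """Convert AE-style keypoint predictions into coco-style annotations.

    Each input entry is assumed to look like
    {'image_id': int, 'keypoints': [x1, y1, v1, ...], 'score': float}.
    'segmentation' and 'bbox' are left blank, since Pose2Seg only needs kpts.
    """
    annotations = []
    for ann_id, pred in enumerate(ae_predictions, start=1):
        kpts = pred['keypoints']
        annotations.append({
            'id': ann_id,
            'image_id': pred['image_id'],
            'category_id': category_id,
            'keypoints': kpts,
            'num_keypoints': sum(1 for v in kpts[2::3] if v > 0),
            'score': pred['score'],
            'segmentation': [],  # blank: predictions carry no masks
            'bbox': [],          # blank: Pose2Seg derives its own ROI
            'iscrowd': 0,
            'area': 0,
        })
    return annotations
```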
I did a "sanity check" as well, by feeding the OCHuman GT keypoints as "predictions" through the conversion script and then through the modified test.py and eval.py pipeline, and the final scores came out the same as when testing with the original test.py on the OCHuman val GT coco json. So I think the conversion and test/eval logic should be sound. The only part I'm not too sure about is the handling of the confidence scores for the pose predictions. I've tested different score thresholds, but the max AP I get of 15.0 is still far from the 22.2 AP reported in the paper.
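For context, my threshold sweep is essentially this (a sketch; `run_pose2seg_and_eval` is a hypothetical stand-in for the modified test.py + eval.py pipeline, and the per-person `score` field is an assumption):

```python
def filter_by_score(predictions, thresh):
    """Keep only person instances whose pose score clears the threshold."""
    return [p for p in predictions if p['score'] >= thresh]


def sweep_thresholds(predictions, run_pose2seg_and_eval, thresholds):
    """Return (best_ap, best_thresh) over a grid of score thresholds."""
    best_ap, best_thresh = -1.0, None
    for t in thresholds:
        ap = run_pose2seg_and_eval(filter_by_score(predictions, t))
        if ap > best_ap:
            best_ap, best_thresh = ap, t
    return best_ap, best_thresh
```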
Please let me know if you can see what went wrong. Thank you!
@liruilong940607 I reference your reply here: https://github.com/liruilong940607/OCHumanApi/issues/2#issuecomment-480785001. I'm trying to understand why you say that it does not affect evaluation.
It seems like there are some images in the val/test set that are not exhaustively labelled. That is, a pose estimation model might correctly predict keypoints, but those keypoints and the corresponding segm mask might not appear in the GT labels, so the network will be penalised for predicting these segmentation masks as false positives, even though they correspond to actual persons in the image.
During evaluation we only evaluate images that have everything annotated. See here
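For anyone else replicating this, the filtering can be sketched as below. This is my own minimal version, not the repo's code, and it assumes "everything annotated" means every person instance in a kept image carries both keypoints and a mask:

```python
def fully_annotated_image_ids(coco_dict):
    """Image ids where every person annotation has both kpts and a segm mask."""
    by_image = {}
    for ann in coco_dict['annotations']:
        by_image.setdefault(ann['image_id'], []).append(ann)
    keep = []
    for img in coco_dict['images']:
        anns = by_image.get(img['id'], [])
        # Keep the image only if it has annotations and none are blank.
        if anns and all(a.get('keypoints') and a.get('segmentation') for a in anns):
            keep.append(img['id'])
    return keep
```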
For the replication: I suspect the "better" kpt results you got (32.4/32.1 AP vs. 28.5/30.3 AP in the paper) might be the reason -- the same reason we had to disable the refinement process from the pose-ae-train repo. There are some tricks that make the kpt score higher but actually introduce wrong skeletons/kpts; for example, filling in missing skeletons with random skeletons is not penalized by the coco eval metric at all (and overall gives you a better kpt score).
Again, I'm sorry that I can't recall the details of what I changed in that repo. I think the main gap is there.
Okay, yeah I had that hunch too. I evaluated on the subset of images that has everything annotated, and the results are close to the paper now. Thank you!
Hi @liruilong940607, we are trying to reproduce the results of Pose2Seg.
For the test results without GT keypoints (2nd row of Tables 2a and 2b in your paper), i.e., using another pose estimator to predict the keypoints instead, I understand that you used Associative Embedding (https://github.com/princeton-vl/pose-ae-train) to generate the predicted keypoints first.
Are you able to share the same json file that contains the predicted keypoints from AE so we can reproduce the results?
Thank you!