pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Keypoint RCNN visibility flag for keypoints #5872

Open mbadal1996 opened 2 years ago

mbadal1996 commented 2 years ago

🚀 The feature

Hello All,

This is my first time posting a request here, so I apologize for any errors on my part. Also, sorry for the long post below.

The purpose of this post is to request an improvement/correction to the visibility-flag behavior of Keypoint RCNN. Based on my results, and those of other users I have come across on various forums and sites, Keypoint RCNN always predicts a flag value of v=1 for every keypoint, regardless of the flag value in the training data (even v=0), and predicts coordinates for those keypoints as well. In other words, the model does not appear to actually learn the flag value. My understanding is that the flag is supposed to be learned and to follow the COCO convention (v=0 'not labeled'; v=1 'labeled but not visible, i.e. occluded'; v=2 'labeled and visible'), but it does not.
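For context, here is a minimal sketch of the training target I am describing (values are made up for illustration, and the API shown assumes a recent torchvision release where keypoints are passed as [x, y, v] triplets per instance):

```python
import torch
import torchvision

# Minimal sketch of a KeypointRCNN training target with COCO-style visibility
# flags (v=0 not labeled, v=1 labeled but occluded, v=2 labeled and visible).
# num_classes / num_keypoints are example values, not a recommendation.
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(
    weights=None, num_classes=2, num_keypoints=4
)
model.train()

image = torch.rand(3, 480, 640)
target = {
    "boxes": torch.tensor([[100.0, 120.0, 300.0, 400.0]]),
    "labels": torch.tensor([1]),
    # shape [num_instances, num_keypoints, 3] -> (x, y, v)
    "keypoints": torch.tensor([[
        [150.0, 200.0, 2.0],  # visible
        [180.0, 220.0, 1.0],  # occluded
        [  0.0,   0.0, 0.0],  # not labeled -> currently dropped from the loss
        [250.0, 300.0, 2.0],  # visible
    ]]),
}

losses = model([image], [target])
print(losses.keys())  # includes 'loss_keypoint'
```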

Motivation, pitch

Given the usefulness of the visibility flags, being able to predict them accurately and use that information at inference time to distinguish occluded from visible keypoints would be an important addition to the model's capability. My understanding is that this is already supposed to be the case, but both the documentation and the model behavior are lacking here. I have found the overall performance of Keypoint RCNN to be very good, and I have successfully fine-tuned it on my custom (multiclass) dataset, with good results for predicting the class, bounding box, and keypoints. It would be very helpful to also be able to distinguish between keypoints using the visibility flag.

Alternatives

No response

Additional context

My hope in writing here is to request and encourage an update to the model that addresses this issue. Failing that, I would appreciate help in tracking down the source code where Keypoint RCNN converts all flags to v=1 and where the flags are handled during training, so that I can modify this behavior myself; at present the model does not seem to learn the flag values at all. In my use case, I would like Keypoint RCNN either to predict the correct flag (e.g. v=0) so that I can use it downstream, or at least to predict a coordinate of (0.0, 0.0) (or some other fixed value) for keypoints with v=0. The need is to be able to distinguish between visible and occluded keypoints; even just two learned flags that work as expected (v=0 and v=1) would be very useful. Any suggestions or guidance would be great. Thanks for taking the time to reply.

cc @datumbox @YosuaMichael

datumbox commented 2 years ago

@mbadal1996 Thanks for raising this. Welcome to our community!

I haven't worked a lot on the specific model but I confirm that your assessment is correct. The KeypointRCNN model doesn't produce predictions for the visibility flag and during inference it's always set to 1. This is because all keypoints with v=0 are excluded from training.

I'll let @fmassa provide additional context, but my understanding is that ignoring keypoints that are not visible was a deliberate choice in the TorchVision implementation.
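To make the behaviour concrete, here is a quick check (a sketch, assuming a recent torchvision release) showing that the visibility column of the predictions is hard-coded to 1. If I remember the code correctly, the relevant pieces live in torchvision/models/detection/roi_heads.py: keypoints_to_heatmap masks out keypoints with v=0 when building the training heatmaps, and heatmaps_to_keypoints fills the visibility column of every decoded keypoint with 1.

```python
import torch
import torchvision

# Run the COCO-pretrained KeypointRCNN on a random image and inspect the
# visibility column of the predicted keypoints: it is always 1.
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

with torch.no_grad():
    pred = model([torch.rand(3, 480, 640)])[0]

if pred["keypoints"].numel():
    # pred["keypoints"] has shape [num_detections, 17, 3]; column 2 is the flag
    print(pred["keypoints"][..., 2].unique())  # tensor([1.])
    # per-keypoint heatmap scores are exposed separately and can act as a
    # confidence proxy, since the flag itself carries no information
    print(pred["keypoints_scores"].shape)      # [num_detections, 17]
```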

mbadal1996 commented 2 years ago

Hi @datumbox and @fmassa,

Thanks very much for the friendly welcome, and for confirming the behavior and providing more insight. My experiments also confirm that keypoints with v=0 are excluded from training (i.e. excluded from the loss calculation). After training, the model still predicts coordinates for those (let's say occluded) keypoints at inference time, with poor accuracy as one would expect, but it reports a flag of v=1.

It would be wonderful if the model could instead either predict a flag of v=0 (which would imply some learning of the flag itself) or predict a coordinate of (0.0, 0.0), or some other fixed value, for such keypoints. In the end, I am just hoping for some signal that a particular predicted keypoint is occluded, so that I can avoid drawing or using it. In my use case, occluded keypoints should essentially be ignored, since they do not provide helpful information, but I need some way to tell that a given keypoint is indeed ignorable.

If your team could implement this, or guide me in implementing it, that would be great. Thank you for pointing out where in the source code some of this happens. I have seen several other places on the web where people have run into the same behavior, so I think a fix would help others as well. In fact, any fix could potentially be extended to Detectron2, which I believe behaves the same way, though I do not have first-hand experience with it. Thank you so much for considering my request; I appreciate your time and look forward to hearing from you.
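In case it helps anyone who lands here, the workaround I am experimenting with in the meantime is to post-process the predictions and treat the per-keypoint scores as a proxy for visibility. The helper below is my own sketch (not part of torchvision), and the threshold is arbitrary and dataset dependent:

```python
import torch

def mark_low_confidence_keypoints(pred, score_threshold=5.0):
    """Hypothetical post-processing helper (not part of torchvision).

    Sets the visibility flag of a predicted keypoint to 0 when its heatmap
    score falls below a threshold, so downstream code can skip drawing it.
    The threshold must be tuned per dataset.
    """
    keypoints = pred["keypoints"].clone()  # [N, K, 3] -> (x, y, v)
    scores = pred["keypoints_scores"]      # [N, K]
    low = scores < score_threshold
    keypoints[low, 2] = 0.0                # mark as "occluded"/ignorable
    keypoints[low, :2] = 0.0               # optionally also zero the coords
    return keypoints

# usage (after running the model in eval mode):
# pred = model([image])[0]
# pred["keypoints"] = mark_low_confidence_keypoints(pred)
```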

Sincerely, mbadal1996

Lokesh-26 commented 2 weeks ago

Hi @mbadal1996, @datumbox and @fmassa,

Could you please advise whether there is any update regarding the visibility flag for keypoints in the KeypointRCNN model? It would be greatly beneficial if incorrectly predicted keypoints were assigned a visibility flag of 0.