paninski-lab / lightning-pose

Accelerated pose estimation and tracking using semi-supervised convolutional networks.
MIT License

Overconfident pseudo-tracking when the animal exits the field of view #183

Open DavidGill159 opened 3 months ago

DavidGill159 commented 3 months ago

Hi, I have trained multiple LP models now: single-camera supervised, semi-supervised, ctx, multi-camera supervised, and multi-camera ctx. One issue that persists across all of them is LP's insistence on tracking landmarks that are not merely occluded but completely out of the field of view. DeepLabCut deals with this quite well by assigning low confidence values to landmarks in such frames; those low-confidence landmarks are then not plotted during video inference. LP does not seem to behave this way. How do you propose I go about improving this? The issue causes problems when triangulating landmarks across different camera views. E.g., in the attached frame (during rearing), the snout likelihood is 0.003842 for DLC but 0.906632 for LP.

danbider commented 3 months ago

@DavidGill159 -- the way we calculate the confidence score is different from DeepLabCut, and our scores will by definition be higher. With our scaling, we advise using 0.95 as the threshold for low-confidence keypoints. Can you please try this and tell us if it helps?
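As a minimal sketch of applying that threshold to a prediction file, assuming the DLC-style CSV layout that Lightning Pose writes (three header rows: scorer / bodypart / coordinate; the file name here is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical file name; assumes the DLC-style prediction CSV layout
# with a three-level column header (scorer / bodypart / coordinate).
df = pd.read_csv("predictions.csv", header=[0, 1, 2], index_col=0)
scorer = df.columns.get_level_values(0)[0]

threshold = 0.95  # cutoff suggested above for Lightning Pose confidences

# NaN-out x/y wherever a keypoint's likelihood falls below the threshold,
# so downstream plotting and triangulation code can skip those frames.
for bodypart in df.columns.get_level_values(1).unique():
    low_conf = df[(scorer, bodypart, "likelihood")] < threshold
    df.loc[low_conf, [(scorer, bodypart, "x"), (scorer, bodypart, "y")]] = np.nan
```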

DavidGill159 commented 3 months ago

Hi Dan thanks for getting back to me! I just tried that, but I'm afraid the video prediction still looks the same.

themattinthehatt commented 3 months ago

@DavidGill159 this is an issue that we're aware of, and we're trying out several fixes. In the meantime, a couple of points to keep in mind:

themattinthehatt commented 1 week ago

@DavidGill159 wanted to check back in on this - we have a solution that may or may not work for you depending on the nature of your labeled data; see the FAQ "Why does the network produce high confidence values..."

This requires some labeled frames where certain keypoints are occluded so that the network can learn to handle them properly. This approach definitely works in the supervised case, though we have not thoroughly tested what happens when unsupervised losses are also used. In that case, occluded keypoints in the video data create a tension between the network attempting to output a uniform heatmap (to indicate occlusion) and the unsupervised loss trying to localize the keypoint in order to minimize the PCA loss(es).
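For intuition on that tension (a toy illustration, not Lightning Pose's actual confidence computation), the peak height of a normalized heatmap is a common confidence proxy; a uniform map drives that peak down toward 1/(H*W), while any loss that rewards a single well-localized argmax pushes the network back toward a peaked map:

```python
import numpy as np

H, W = 64, 64
yy, xx = np.mgrid[0:H, 0:W]

# Sharply peaked Gaussian heatmap: the network commits to one location.
peaked = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 2.0 ** 2))
peaked /= peaked.sum()

# Uniform heatmap: the network spreads mass everywhere to signal occlusion.
uniform = np.full((H, W), 1.0 / (H * W))

# With peak height as the confidence proxy, the two differ by orders
# of magnitude.
print(peaked.max())   # ~0.04
print(uniform.max())  # ~0.00024 (= 1 / (64 * 64))
```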

If you end up giving this a try please report back and let me know how it works! @YitingChang this might also be something you're interested in