DavidGill159 opened 3 months ago
@DavidGill159 -- the way we calculate the confidence score is different from DeepLabCut, and our scores will by definition be higher. With our scaling, we advise using a threshold of 0.95 to filter out low-confidence keypoints. Can you please try this and tell us if it helps?
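In practice, the 0.95 threshold can be applied by masking low-confidence coordinates before plotting or downstream analysis. A minimal sketch, assuming the predictions use the DLC-style CSV layout (three header rows: scorer / bodyparts / coords) -- check your file's actual layout:

```python
import numpy as np
import pandas as pd

def mask_low_confidence(pred_csv, threshold=0.95):
    """Set keypoint x/y coordinates to NaN wherever the likelihood
    falls below `threshold`.

    Assumes a DLC-style prediction CSV with a three-row header
    (scorer / bodyparts / coords); adjust if your layout differs.
    """
    df = pd.read_csv(pred_csv, header=[0, 1, 2], index_col=0)
    # collect unique (scorer, bodypart) pairs from the column MultiIndex
    for scorer, bodypart in {(s, b) for s, b, _ in df.columns}:
        low = df[(scorer, bodypart, "likelihood")] < threshold
        df.loc[low, (scorer, bodypart, "x")] = np.nan
        df.loc[low, (scorer, bodypart, "y")] = np.nan
    return df
```

Masked (NaN) keypoints can then simply be skipped when rendering the labeled video.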
Hi Dan, thanks for getting back to me! I just tried that, but I'm afraid the video prediction still looks the same.
@DavidGill159 this is an issue that we're aware of and are trying out several fixes for. In the meantime, a couple of points to keep in mind:
check the PCA singleview error for your video (stored at video_preds/<video_name>_pca_singleview_error.csv in your model folder); this error should be very high for a frame like the one you've shared, indicating that there is an issue with the prediction.

@DavidGill159 wanted to check back in on this - we have a solution that may or may not work for you depending on the nature of your labeled data; see the FAQ "Why does the network produce high confidence values..."
This requires some labeled frames where certain keypoints are occluded so that the network can learn to handle them properly. This approach definitely works in the supervised case, though we have not thoroughly tested what happens when using unsupervised losses. In that case, occluded keypoints in the video data create a tension between the network attempting to output a uniform heatmap to indicate occlusion, and the unsupervised loss trying to localize the keypoint in order to minimize the PCA loss(es).
If you end up giving this a try, please report back and let me know how it works! @YitingChang this might also be something you're interested in.
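The PCA singleview error check suggested earlier can also be scripted, e.g. to list the frames where a keypoint's prediction is suspect. A sketch, assuming the error CSV has one row per frame and one column per keypoint (verify against your file):

```python
import pandas as pd

def flag_high_error_frames(error_csv, keypoint, cutoff):
    """Return the frame indices whose PCA singleview error for
    `keypoint` exceeds `cutoff`.

    Assumes one row per frame and one column per keypoint in the
    error CSV; the cutoff value is dataset-dependent.
    """
    err = pd.read_csv(error_csv, index_col=0)
    return err.index[err[keypoint] > cutoff].tolist()
```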
Hi, I have trained multiple LP models now: single-camera supervised, semi-supervised, ctx, multi-camera supervised, and multi-camera ctx. One issue that persists across all models is LP's insistence on tracking landmarks that are not occluded but completely out of the field of view. DeepLabCut deals with this quite well by assigning low confidence values to landmarks in these frames; such low-confidence landmarks are then not plotted during video inference. This does not seem to be reflected in LP. How do you propose I go about improving this? The issue poses problems when triangulating the landmarks across different camera views. E.g., in the attached frame (during rearing), the snout likelihood for DLC is 0.003842, while for LP it is 0.906632.
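For the triangulation problem described above, one workaround until confidences are fixed is to triangulate a keypoint only when every camera view sees it with high confidence. A minimal sketch (the array shape and 0.95 cutoff are illustrative assumptions, not part of the LP API):

```python
import numpy as np

def triangulation_mask(likelihoods, threshold=0.95):
    """Boolean mask of (frame, keypoint) pairs that are safe to
    triangulate: every camera view must report a likelihood of at
    least `threshold`.

    `likelihoods` is assumed to have shape
    (n_views, n_frames, n_keypoints).
    """
    return np.all(np.asarray(likelihoods) >= threshold, axis=0)
```

Entries where the mask is False can be left as NaN in the 3D output rather than triangulated from an out-of-view detection.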