sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0
299 stars 145 forks

Poor classification results despite copying annotations from successfully trained models #401

Open xktz89 opened 4 hours ago

xktz89 commented 4 hours ago

Describe the bug I am training several models to detect grooming behavior. I trained three different DeepLabCut models on the same set of videos at low, medium, or high resolution, then analyzed the same set of experimental videos (at low, medium, or high resolution) to train three separate SimBA models. I annotated the behaviors for the low-resolution videos and successfully trained a model with good precision, recall, and f1-score. I then created a new project for the medium-resolution videos, went through the same training steps as before (setting new video parameters, skipping outlier correction, extracting features), and manually copied my previous annotations from the low-resolution targets_inserted .csv files (see Steps below). This classifier also had good precision, recall, and f1-score. However, when I tried the same procedure for the same videos at high resolution, the classification report looked very strange.

| Class | Precision | Recall | f1-score | Support |
| --- | --- | --- | --- | --- |
| Not_grooming | 0.999233 | 1 | 0.999616 | 242179 |
| grooming | 0 | 0 | 0 | 186 |

I already checked labeled videos from DeepLabCut and confirmed that the high-resolution DLC model is correctly tracking body parts. I also made sure I am importing the correct high-resolution videos and tracking files. The previous classifiers had the same balance of grooming vs. not-grooming examples. I have used identical settings and training parameters. I have tried training the classifier again and still get 0 precision, recall, and f1-score. I am unsure which parameters to modify to improve performance, or frankly why this classifier is performing so poorly despite using the exact same annotations and only improving the video resolution. What might account for this?

To Reproduce Steps to reproduce the behavior:

  1. Trained classifier to detect grooming.
  2. Created new project using same videos but at higher resolution (imported higher resolution video files AND tracking files from DLC model trained on higher resolution videos).
  3. Set video parameters to new distance (since there are more pixels and distance is different than in the same lower resolution videos).
  4. Skipped outlier correction and extracted features (same as with low-resolution classifier).
  5. Created new video annotations for low-resolution videos using "Label Behavior" tab, then manually pasted "groom" column with annotations from low-resolution "targets_inserted" .csv files into new project's "targets_inserted" .csv files.
  6. Trained classifier at default settings, and created classification report, feature importance bar graph, and calculated SHAP scores for 200 targets present and 200 targets absent.
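
The manual paste in step 5 could also be scripted, which makes it easier to rule out a misaligned or truncated copy. This is only a minimal sketch, not part of SimBA: `copy_annotation_column` is a hypothetical helper, the `groom` column name is taken from step 5, and it assumes the first column of each targets_inserted file is the frame index.

```python
import pandas as pd


def copy_annotation_column(src_csv, dst_csv, out_csv, column="groom"):
    """Copy one annotation column from src_csv into dst_csv by frame position.

    Returns the number of frames labelled 1 in the copied column.
    """
    # Assumption: the first column of each file holds the frame index.
    src = pd.read_csv(src_csv, index_col=0)
    dst = pd.read_csv(dst_csv, index_col=0)

    # If the two videos have different frame counts, a paste would
    # silently shift or truncate the labels, so stop here instead.
    if len(src) != len(dst):
        raise ValueError(
            f"Frame count mismatch: {len(src)} rows in source, "
            f"{len(dst)} rows in destination"
        )

    # Copy by position so that differing index labels cannot misalign rows.
    dst[column] = src[column].to_numpy()
    dst.to_csv(out_csv)

    return int(dst[column].sum())
```

Comparing the returned count against the number of grooming frames in the low-resolution file is a quick check that the copy went through as intended.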

Expected behavior I would expect good or at least comparable classification performance as the lower resolution models.

Desktop (please complete the following information):

Additional context Let me know if this all makes sense. Thank you in advance for any help or insights as to why this may be happening!

sronilsson commented 4 hours ago

Hi @xktz89!

Sounds like you have thoroughly investigated this and I don't have an immediate answer. One thing that looks suspicious:

| Class | Precision | Recall | f1-score | Support |
| --- | --- | --- | --- | --- |
| Not_grooming | 0.999233 | 1 | 0.999616 | 242179 |
| grooming | 0 | 0 | 0 | 186 |

This table suggests that the test set contains 186 grooming frames and more than 242k non-grooming frames, meaning that only about 0.08% of frames (186 of 242,365) contain grooming in the project_folder/csv/targets_inserted files, which is very little data. Could anything have gone wrong when you copied and pasted the data?
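
The class balance can be computed directly from the support counts, and the same check can be run on the annotation files themselves. The sketch below is illustrative only: `positive_fraction` and `summarize_annotations` are hypothetical helpers (not SimBA functions), and the `groom` column name and index-in-first-column layout are assumptions about the targets_inserted files.

```python
import glob
import os

import pandas as pd


def positive_fraction(n_positive, n_negative):
    """Fraction of frames labelled with the behavior."""
    return n_positive / (n_positive + n_negative)


def summarize_annotations(targets_dir, column="groom"):
    """Return {file name: (positive frames, total frames)} for each CSV."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(targets_dir, "*.csv"))):
        # Assumption: the first column of each file holds the frame index.
        df = pd.read_csv(path, index_col=0)
        summary[os.path.basename(path)] = (int(df[column].sum()), len(df))
    return summary


# Support values from the classification report above.
print(f"{positive_fraction(186, 242179):.4%}")  # prints 0.0767%
```

Running `summarize_annotations` on the targets_inserted folder of both the low-resolution and high-resolution projects and comparing the per-file counts should show whether any annotations were lost in the copy.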

If it is not a huge amount of data, could you share the project with me so I can take a look? (It helps if you omit most of the video files, which can take up a lot of space.)