sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

Issue while analyzing machine-results #206

Closed: n-zeng closed this issue 1 year ago

n-zeng commented 1 year ago

Describe the bug When analyzing the machine-results CSVs (C:\Users\name\Desktop\name\project_folder\csv\machine_results), I keep seeing all 0s in the column that should contain a 0 or 1 depending on whether the behavior is actually occurring (which I believe is derived from the targets-inserted / BORIS ethogram files). I'm also getting unexpectedly low values in the behavior probability columns.
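
To see how the predictions look across a whole file rather than by scrolling, a minimal pandas sketch like the one below can help. The file name `Video1.csv` and the classifier name `Attack` are placeholders, and the `Probability_<classifier>` column name is assumed from how SimBA typically labels machine_results columns:

```python
import pandas as pd

# Placeholder path and classifier name -- substitute your own CSV and classifier.
df = pd.read_csv(r"C:\Users\name\Desktop\name\project_folder\csv\machine_results\Video1.csv")
clf_name = "Attack"  # hypothetical classifier name

# Predicted 0/1 column for the classifier, and its probability column
# (machine_results files typically name it "Probability_<classifier>").
print(df[clf_name].value_counts(dropna=False))
print(df[f"Probability_{clf_name}"].describe())
```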

To Reproduce

I was testing a previously working model on new data, so let me know if I did something incorrectly.

Steps to reproduce the behavior:

  1. Go through SimBA normally until 'Run Machine Model'.
  2. In model settings, I used .sav files from a different project folder (I tried selecting the files directly from the other folder, and I also tried copying them into the current project folder; neither worked). A way to sanity-check the model against the new data is sketched after this list.
  3. Open the machine-results CSV files (path above) and scroll to the data at the far right.
  4. See error.
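
Since the .sav model comes from a different project (step 2), one quick sanity check is whether the model expects the same number of feature columns that the new project produces. This is a hedged sketch, assuming the .sav file is a pickled scikit-learn classifier; both paths are placeholders:

```python
import pickle
import pandas as pd

# Placeholder paths -- point these at your own .sav model and a features_extracted CSV.
MODEL_PATH = r"C:\other_project\models\generated_models\Attack.sav"
FEATURES_CSV = r"C:\Users\name\Desktop\name\project_folder\csv\features_extracted\Video1.csv"

with open(MODEL_PATH, "rb") as f:
    clf = pickle.load(f)

features = pd.read_csv(FEATURES_CSV)

# n_features_in_ is set by scikit-learn at fit time (newer versions only).
print("model was trained on:", getattr(clf, "n_features_in_", "unknown"), "features")
print("new data provides   :", features.shape[1], "columns (may include index/frame columns)")
```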

Expected behavior I expected the 0/1 columns to match the data in my ethograms, and the probability columns to reflect that.
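
If the same video also exists as a targets_inserted CSV containing the human ethogram column, a small hedged sketch like this can quantify the mismatch; the file names, the targets_inserted path, and the classifier name `Attack` are placeholders:

```python
import pandas as pd

clf_name = "Attack"  # hypothetical classifier name
pred = pd.read_csv(r"C:\Users\name\Desktop\name\project_folder\csv\machine_results\Video1.csv")
truth = pd.read_csv(r"C:\Users\name\Desktop\name\project_folder\csv\targets_inserted\Video1.csv")

n = min(len(pred), len(truth))  # guard against small length differences
agreement = (pred[clf_name].iloc[:n] == truth[clf_name].iloc[:n]).mean()

print(f"annotated behavior-present frames: {int(truth[clf_name].sum())}")
print(f"predicted behavior-present frames: {int(pred[clf_name].sum())}")
print(f"frame-by-frame agreement: {agreement:.1%}")
```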

Additional context Unfortunately GitHub isn't letting me upload a screenshot right now; please let me know if anything needs to be clarified.

sronilsson commented 1 year ago

Hi n-zeng! If I understand correctly, your machine learning model is severely under-classifying the presence of your behavior: every frame is scored as behavior-absent, and the probabilities that the behavior is present are also very low for all frames.

One way this could happen is if you fed a lot of behavior-absent annotations into your classifier together with a very small number of behavior-present annotations. How many annotated behavior-present vs. behavior-absent frames did you use to create the model?
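
If it helps, here is a minimal sketch for tallying how many annotated frames fell into each class, assuming the standard project_folder\csv\targets_inserted layout and a placeholder classifier name:

```python
from pathlib import Path
import pandas as pd

clf_name = "Attack"  # placeholder classifier name
folder = Path(r"C:\Users\name\Desktop\name\project_folder\csv\targets_inserted")

present, absent = 0, 0
for csv_path in folder.glob("*.csv"):
    labels = pd.read_csv(csv_path, usecols=[clf_name])[clf_name]
    present += int((labels == 1).sum())
    absent += int((labels == 0).sum())

print(f"behavior-present frames: {present}")
print(f"behavior-absent frames:  {absent}")
```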

If the annotations are imbalanced towards behavior-absent, this will bias the model towards classifying most frames as behavior-absent. One way around this is to balance your annotations by taking an equal or similar number of behavior-absent and behavior-present frames when you create your classifier. Check out the random undersampling option in the machine model settings for how to balance the data.
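
As a rough illustration of what random undersampling does (not SimBA's actual implementation), here is a pandas sketch that keeps every behavior-present frame and samples a matching number of behavior-absent frames; `df` and `clf_name` stand in for your annotated data and classifier column:

```python
import pandas as pd

def random_undersample(df: pd.DataFrame, clf_name: str,
                       ratio: float = 1.0, seed: int = 0) -> pd.DataFrame:
    """Keep all behavior-present rows; sample `ratio` times that many behavior-absent rows."""
    present = df[df[clf_name] == 1]
    absent = df[df[clf_name] == 0]
    n_absent = min(len(absent), int(len(present) * ratio))
    return pd.concat([present, absent.sample(n=n_absent, random_state=seed)]).sort_index()

# Example usage with a hypothetical annotations DataFrame:
# balanced = random_undersample(annotations_df, "Attack", ratio=1.0)
```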