sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

"Feature number mismatch error" when analyzing new videos with a machine model #351

Open xktz89 opened 3 months ago

xktz89 commented 3 months ago

Describe the bug When trying to use a machine model on new videos, I am met with the following error message:

SIMBA FEATURE NUMBER MISMATCH ERROR: Mismatch in the number of features in input file C:/Users/xktz/Desktop/DLC_SimBA_videos/simba/project_folder/csv/features_extracted/animal54_09_02_24_res60fps.csv, and what is expected by the model groom. The model expects 35 features. The data contains 31 features. 🚨

The same message appears when I attempt to validate model on a single video.

To Reproduce Steps to reproduce the behavior:

  1. Train machine model for specific behavior
  2. Import new videos and CSV files from DeepLabCut
  3. Configure video parameters
  4. Skip outlier correction
  5. Extract features
  6. Run machine model

Expected behavior After importing new videos and their CSVs from DeepLabCut, I fixed the video parameters, skipped outlier correction, and extracted features. For all steps, I used default settings and followed the guidelines on Github. I expect that I should be able to "Run Machine Model" using my newly trained behavior classifier named "groom."

The machine model for "groom" was trained using default settings for training a single model. During model training, the output was as follows:

SIMBA COMPLETE: Hyper-parameter config saved (9 saved in project_folder/configs folder). 🚀
Reading in 14 annotated files...
Dataset size: 74.592MB / 0.074592GB
Number of features in dataset: 35
Number of groom frames in dataset: 556.0 (0.22%)
Training and evaluating model...
Fitting shake model...
SIMBA COMPLETE: Classifier groom saved in models/generated_models directory (elapsed time: 1125.2436s) 🚀
SIMBA COMPLETE: Evaluation files are in models/generated_models/model_evaluations folders 🚀

This indicates that the number of features in the dataset is 35. What does the number of features refer to? Why do my new videos have only 31 features after extraction? I suspect I have misclicked something but was unable to find a solution in the documentation or on this forum. I had an earlier model working fine on new videos and was able to get summaries of grooming behavior in each video, but I worry I have changed something by mistake.

Thank you very much in advance for any insights and help! This is all very new to me, so please let me know if I can provide additional information.

Desktop (please complete the following information):

Conda simbanenv

sronilsson commented 3 months ago

Hi @xktz89! If you search the SimBA GitHub issues or the SimBA gitter channel for a keyword such as “mismatch”, you will find discussions of this issue here and there.

I’ll describe what is happening:

When you trained the grooming model, SimBA grabbed all the CSV files inside the project_folder/csv/targets_inserted directory. Each of these files contains three “types” of columns: the first columns will be your body-part locations with x, y, and p-values. The very last columns will be your hand annotations, e.g., groom columns with zeros and ones. Every column in between is one of your “features”: these are values representing the movement of your animals. SimBA will use these in-between columns to build your machine learning model.

Before training your groom model, SimBA looks at the body-part names of your project, and the names of your classifiers in your project, and removes these columns so we only have features left. If you look inside the project_folder/csv/targets_inserted directory and open one of the CSV files, you should see 35 of these in-between columns.
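To make the column bookkeeping above concrete, here is a minimal sketch of the idea. It is not SimBA's actual code: the function name, the `_x`/`_y`/`_p` column naming, and the example feature names are all assumptions made for illustration. It shows how a leftover annotation column that is no longer declared as a classifier ends up being counted as a feature.

```python
def feature_columns(header, body_parts, classifiers):
    """Return the columns left after dropping body-part and annotation columns.

    Assumes (hypothetically) that each body part has columns named
    <body_part>_x, <body_part>_y and <body_part>_p.
    """
    drop = {f"{bp}_{suffix}" for bp in body_parts for suffix in ("x", "y", "p")}
    drop.update(classifiers)
    return [col for col in header if col not in drop]


# Hypothetical header of a targets_inserted file.
header = ["Nose_x", "Nose_y", "Nose_p", "movement_mean", "distance_sum", "groom", "rear"]

# Only "groom" is declared as a classifier, so the stale "rear"
# annotation column is treated as a feature.
print(feature_columns(header, ["Nose"], ["groom"]))
# ['movement_mean', 'distance_sum', 'rear']
```

If "rear" were still declared as a classifier in the project, it would be dropped as well and only the two real features would remain.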

Next, you have new video files that you want to run the classifier on. These files are located in the project_folder/csv/features_extracted directory. SimBA opens each one of these CSV files, beginning with animal54_09_02_24_res60fps.csv, removes the body-part columns, and keeps the feature columns (we don't have any annotations to remove this time). It also opens the classifier that you previously trained. Before doing anything else, SimBA checks that animal54_09_02_24_res60fps.csv has the same number of columns left as the files that the classifier was built on. At this point it complains: it does not. The file has only 31 columns, and the classifier expects 35! That is 4 columns fewer than expected, and SimBA doesn’t know what to do about that.

So where did these 4 extra columns come from, and what are they? Did the project_folder/csv/targets_inserted files contain any extra annotation columns (additional behaviors) that were not declared as annotations? Were any body-parts added to or removed from your project between training and the new videos?

You can always share one file each from project_folder/csv/targets_inserted and project_folder/csv/features_extracted and I can take a look and help with the digging.
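For anyone who wants to do this digging themselves, a rough sketch using only the Python standard library follows. The helper names are hypothetical and this is not part of SimBA; it simply compares the header rows of one file from each directory and lists the columns that exist only in the training file. The declared annotation columns (e.g., groom) are expected to show up in that list; anything else is a candidate for the unexplained extra columns.

```python
import csv


def read_header(path):
    """Return the first (header) row of a CSV file as a list of column names."""
    with open(path, newline="") as handle:
        return next(csv.reader(handle))


def columns_only_in_training(training_csv, new_csv):
    """List columns present in the training file but missing from the new file."""
    new_columns = set(read_header(new_csv))
    return [col for col in read_header(training_csv) if col not in new_columns]


# Example call (the paths are placeholders, adjust them to your project):
# extra = columns_only_in_training(
#     "project_folder/csv/targets_inserted/some_training_video.csv",
#     "project_folder/csv/features_extracted/animal54_09_02_24_res60fps.csv",
# )
# print(extra)
```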

xktz89 commented 3 months ago

Hi @sronilsson! Thank you so much for your thorough reply! This was really helpful and will be good to keep in mind going forward.

You are correct, I previously labeled additional behaviors that were later removed from the project. I managed to fix the problem by creating a new project for a single behavior classifier (groom) then manually copying the column of hand annotations for groom from my old /targets_inserted and pasting them into the new /targets_inserted.

Thanks again!

sronilsson commented 3 months ago

Excellent, thanks for letting us know! There should be some scripts lying about for removing or moving columns between files in different directories. Let me know if it happens again: if you have a lot of files to process, moving columns manually can become a chore.
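As an illustration of what such a column-moving step could look like, here is a small standard-library sketch. It is not one of the SimBA scripts mentioned above; the function name and its arguments are hypothetical. It assumes the source and target files describe the same video and therefore have the same number of rows in the same order.

```python
import csv


def copy_column(source_csv, target_csv, output_csv, column):
    """Copy one named column from source_csv into target_csv, writing output_csv.

    Rows are matched by position, so both files must have the same row count.
    """
    with open(source_csv, newline="") as handle:
        values = [row[column] for row in csv.DictReader(handle)]

    with open(target_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = list(reader.fieldnames)
        rows = list(reader)

    if len(rows) != len(values):
        raise ValueError(
            f"Row count mismatch: {len(values)} in source, {len(rows)} in target"
        )

    # Append the column at the end if the target does not have it yet.
    if column not in fieldnames:
        fieldnames.append(column)
    for row, value in zip(rows, values):
        row[column] = value

    with open(output_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Writing to a separate output path rather than overwriting the target keeps the original files intact in case something goes wrong; looping over every file in a directory would cover the many-files case.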