sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

Using Random forest behavioral classifiers from OSF repository #367

Open wallawhitecoat opened 2 weeks ago

wallawhitecoat commented 2 weeks ago

Hi, I'm trying to run resident-intruder classifiers on one imported video (ten minutes long, 60 fps, over 36,000 frames). I've been able to successfully import my H5 tracking data from SLEAP. My skeleton is SLEAP-based, with 11 body parts per animal (user-defined skeleton).

I have four questions. 1. Is it possible to run the behavioral classifiers (anogenital_sniff, attack, lateral_threat, pursuit, and tail_rattle) that I found on the Random forest behavioral classifiers OSF repository (https://osf.io/3mc7g/) on my data if the body-part config is user-defined? If yes, am I correct in assuming that I can use pseudo-labelling for this?

If the answer to this is no, I am fine with remedying this by deleting unnecessary nodes.

  2. Is this the feature extractor (from the SimBA GitHub) that I should be using? It seems to freeze once I run it. [feature_extractor_user_defined.zip](https://github.com/user-attachments/files/15779886/feature_extractor_user_defined.zip) image

  3. Downloading CSV files from SLEAP puts both tracks (for resident and intruder) in a single column, but it seems that to train the model, SimBA wants each node in its own column (e.g. "Resident_forelegL1_1_y"). I would like to know how to remedy this. Example below: How SLEAP CSV files are exported and labeled

  4. Should I be using this metadata file? BtWGaNP_meta.csv https://github.com/sgoldenlab/simba/blob/master/misc/BtWGaNP_meta.csv
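For reference, the reshaping I'm after in question 3 would look roughly like this in pandas. This is just my own sketch of the idea, and the column names (`frame`, `track`, `bodypart`) are illustrative, not SLEAP's exact export schema:

```python
import pandas as pd

# Hypothetical long-format export: one row per (frame, track, body part).
long_df = pd.DataFrame({
    "frame":    [0, 0, 1, 1],
    "track":    ["Resident", "Intruder", "Resident", "Intruder"],
    "bodypart": ["Nose", "Nose", "Nose", "Nose"],
    "x":        [10.0, 50.0, 11.0, 51.0],
    "y":        [20.0, 60.0, 21.0, 61.0],
})

# Pivot to one row per frame and one column per track/body-part/coordinate,
# roughly matching SimBA's wide header style (e.g. "Resident_Nose_x").
wide = long_df.pivot_table(index="frame", columns=["track", "bodypart"], values=["x", "y"])
wide.columns = [f"{track}_{part}_{coord}" for coord, track, part in wide.columns]
wide = wide.sort_index(axis=1)
print(list(wide.columns))
# ['Intruder_Nose_x', 'Intruder_Nose_y', 'Resident_Nose_x', 'Resident_Nose_y']
```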

Here are my outlier correction settings in case they help: image

Any help is appreciated, thank you!

Windows 10, version 22H2; Python 3.10.14; miniconda3 24.4.0

sronilsson commented 2 weeks ago

Hi @wallawhitecoat!

  1. To use the models on OSF, you need to track the same body parts that were used to create the model. This ensures that SimBA computes the same features that the model expects. If you have a different body-part configuration, SimBA will compute different features (and a different number of features), and the model won't know what to do with the additional / missing features and will throw an error. To use these models, use this setting when creating your project:
Screenshot 2024-06-11 at 8 51 24 AM
  2. One possibility is that it "freezes" because the example script expects data columns that don't exist in your data, and it errors out. If it errors out in a way that I haven't yet been able to anticipate, no error will be printed in the main SimBA interface; instead, an error message from the standard Python library will be printed in the terminal which you used to launch SimBA. If you look in the terminal that you used to launch SimBA, do you see any error message printed?

  3. On the difference between the SLEAP data and the format in SimBA, where the SLEAP data is transposed: when you import your data into SimBA, the data lands in project_folder/csv/input_csv within your SimBA project. After you perform outlier correction (or skip it), the corrected data is copied to the project_folder/csv/outlier_corrected_location and project_folder/csv/outlier_corrected_movement_location directories of your SimBA project, with the corrected headers appended. If you want the transposed data, you can look in those folders, as SimBA performs the transpose during import.

  4. Yes, you can use THIS hyperparameter meta file to create models to start. However, you may have to play with it, in particular the under-sampling ratios, to get a model that performs best in your setup.
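On point 1, a quick sanity check can confirm whether your feature table has the column count a downloaded .sav model was trained on. This is only a sketch: the stub object below stands in for a real unpickled classifier, and `n_features_in_` assumes the .sav is a fitted scikit-learn estimator:

```python
from types import SimpleNamespace

def check_feature_alignment(model, feature_columns):
    """Compare the number of features a fitted scikit-learn model expects
    against the feature columns computed for a video. Returns the difference
    (0 means the counts line up; positive means you have surplus columns)."""
    expected = getattr(model, "n_features_in_", None)  # set by sklearn at fit time
    if expected is None:
        raise ValueError("Model does not report how many features it was trained on")
    return len(feature_columns) - expected

# Stand-in for a fitted classifier; in practice you would pickle.load() the
# downloaded .sav file and read the header of a features_extracted CSV.
stub_model = SimpleNamespace(n_features_in_=490)
print(check_feature_alignment(stub_model, ["f%d" % i for i in range(498)]))  # -> 8
```

A positive result means there are extra columns to drop before prediction; a negative one means features the model expects are missing.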

wallawhitecoat commented 2 weeks ago

Can I still use the tail_rattle classifier if the tail_end is not labeled?

sronilsson commented 2 weeks ago

No, sorry - the tail-end classifier is legacy. We had some trouble with this: it was very difficult to get the tail end tracked reliably even after extensive training and labelling (you may have more luck, though). It was often confused with the bedding material, and we saw a lot of ID swaps, with the tail ends of the two animals being confused with each other.

wallawhitecoat commented 2 weeks ago

Yes, I did experience that, even after training on close to 2,000 frames. Could we use the tail_rattle classifier if the configuration looks like this? SLEAP settings

Also, the reason I ask is that in this YouTube video I don't see a tail end, but tail_rattle is still being quantified. https://www.youtube.com/watch?v=bqWteWIxzGM

sronilsson commented 2 weeks ago

Yes - that should work. But, just for full disclosure: I created the tail rattle classifier in Seattle around 2019. Everyone who worked on this back then, including maintaining the OSF repository, has other interests and jobs today. I don't remember exactly which features the tail rattle classifier expects, and no one is available for me to ask. If you hit errors, though, I can start to dig through old data and figure it out, and we could solve it together.

A reason why it is not visualized, even though the classifications may be OK-ish, is that for that video we wanted to visualize the polygonal hull bounding box of the animals in white. We probably omitted the tail end to make that white polygon around the animal look neater.

wallawhitecoat commented 6 days ago

Ok, I took your advice - back after re-training and labeling with this skeleton.

  1. I'm able to run the feature extraction, but is a file supposed to appear in project_folder -> csv -> features_extracted? Because I do not see anything there. Feature Extraction image

  2. Additionally, I skipped the outlier correction, and the SLEAP data was successfully transposed into outlier_corrected_movement_location. I guess I mostly don't understand what completing an outlier correction will do for my data, or what settings I should run it with.

  3. My last question revolves around how I should label. Do I just label all frames using the regular "label behavior" method? Or, if I'm using the .sav classifiers, should I use pseudo-labelling and/or advanced labeling?

wallawhitecoat commented 6 days ago

Additional note on question 3: I noticed that if I attempt to Validate on a Single Video using the .sav files, I get this error. Is that because I did not perform my feature extraction correctly? image image

When I attempt to run the machine model with these settings, I get this error message, which also points to the features_extracted directory. models image

sronilsson commented 5 days ago

Hi @wallawhitecoat -

1) Yes, it appears to go astray at the point of feature extraction, with no files being created. From the screenshot, it looks like you have ticked the box to run your own feature extraction script, as in the image below - is this correct?

image

If you have ticked this box, could you share with me the .py file you used in the Script path file-browse box? From your screenshot, it looks like the script runs appropriately, but the final file might be saved in the wrong location, or not saved at all.

2) For outlier correction, I've tried to write up some explanations HERE and HERE. In short, what it tries to do is fix big pose-estimation inaccuracies based on user-defined (heuristic) rules. For example: a body part can't move more than N x the length of the animal in a single frame, or a body part can't be located more than N x the length of the animal from all the other body parts of the animal. If a body part fails those rules, it is placed at its most recent reliable location. If you have very good pose-estimation tracking, you shouldn't have to apply these rules, though.

3) I've reached out for some help to answer this one - I think @goodwinnastacia can help.
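As a rough illustration of the movement rule in point 2, the logic looks something like the sketch below. This is not SimBA's actual implementation, and all the numbers are made up:

```python
import math

def correct_movement_outliers(xy, animal_length, criterion):
    """If a body part 'jumps' more than criterion * animal_length between
    consecutive frames, keep its most recent reliable location instead.
    xy is a list of (x, y) tuples, one per frame."""
    max_jump = criterion * animal_length
    corrected = [xy[0]]
    last_reliable = xy[0]
    for point in xy[1:]:
        if math.dist(point, last_reliable) > max_jump:
            corrected.append(last_reliable)   # implausible jump: hold last position
        else:
            corrected.append(point)
            last_reliable = point
    return corrected

# A tracked nose with one implausible jump at frame 2:
track = [(10, 10), (12, 11), (400, 500), (13, 12)]
print(correct_movement_outliers(track, animal_length=80, criterion=1.5))
# [(10, 10), (12, 11), (12, 11), (13, 12)]
```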

wallawhitecoat commented 4 days ago

Thank you! I was able to perform a feature extraction just using the "extract features" button, rather than the user-defined settings. I was then able to use the .sav files after deleting the following features - the .sav files only recognize 490 features, not 498. I also had to add a prefix to the pose-estimation locations (e.g. "track_1_Ear_left_1_x") image
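For anyone hitting the same thing, the cleanup I did can be sketched roughly like this in pandas. The dropped column name here is a placeholder, not one of the actual 8 features I removed:

```python
import pandas as pd

def align_features(df, extra_cols, pose_cols, prefix="track_1_"):
    """Drop feature columns the .sav model was not trained on, and add a
    track prefix to the pose-estimation location columns
    (e.g. 'Ear_left_1_x' -> 'track_1_Ear_left_1_x')."""
    df = df.drop(columns=[c for c in extra_cols if c in df.columns])
    return df.rename(columns={c: prefix + c for c in pose_cols})

# Toy frame with one surplus feature and two pose columns (names are made up):
features = pd.DataFrame(columns=["Ear_left_1_x", "Ear_left_1_y", "surplus_feature"])
aligned = align_features(features, ["surplus_feature"], ["Ear_left_1_x", "Ear_left_1_y"])
print(list(aligned.columns))
# ['track_1_Ear_left_1_x', 'track_1_Ear_left_1_y']
```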

However, after running the machine model and generating the Gantt graphs and video, I found that some classifiers were too sensitive and others not sensitive enough. I've tried multiple discrimination thresholds; so far, 0.1 is too low and quantifies behavior when it doesn't happen, and 0.7 is too high and doesn't quantify anything. My pose-estimation tracking is pretty good, so I'm not exactly sure what the issue is, but thank you for the resource - I will follow up with them. Thank you again for helping me troubleshoot! labels v2001_final_image c image

sronilsson commented 3 days ago

Hi @wallawhitecoat - thanks for sharing, and thanks for figuring out the additional features that had to be removed. I wasn't sure if that would come up, and it was in the back of my mind.

For the predictions - could we confirm that it is not the tracking that has been disrupted somewhere along the process? Can you run the classification visualization in the menu below and share the video?

image
goodwinnastacia commented 3 days ago

3. My last question resolves around how I should label. Do I just label all frames using the regular "label behavior" method? Or if I'm using the sav. classifiers should I use pseudo-labelling and/or advanced labeling?

Hi, I would always start by labeling a few full videos using the label behavior method, just so that you have a good representation of what tail rattle ISN'T, since you have to teach the algorithm what both positive and negative frames look like. I would then create short clips with tail rattle present and label those with the label behavior method. Once you've got some videos labeled this way, you can take a look at using pseudo-labeling.

For the tail rattle classifier we have posted, our tracking wasn't good enough for good behavioral classification because we only tracked the tail base and tail end. Your tracking should be better!

wallawhitecoat commented 2 days ago

Of course - thank you for the advice. I performed the visualize-quantifications step (with the body-part visualization threshold set to 0.0); here are the results (no classifiers shown). It's a YouTube link because the video was too large to post (even zipped). Below you can see the settings I used and my rationale.
https://www.youtube.com/watch?v=THQ2TnCWfsw

I decided to try these settings for running the machine labelling because they are detailed in that paper, and I just set the tail_rattle classifier threshold to 0.5 (even if it isn't good enough). I'm still not sure what to set my minimum bout length to, though. image image
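If I understand the minimum bout length correctly, it is given in milliseconds and converted to a frame count via the video's fps, with shorter runs of positive classifications discarded. My own sketch of that idea (not SimBA's actual code):

```python
def apply_min_bout_length(predictions, fps, min_bout_ms):
    """Zero out runs of positive (1) classifications shorter than the
    minimum bout length, converted from milliseconds to frames."""
    min_frames = max(1, round(fps * min_bout_ms / 1000))
    out = predictions[:]
    i = 0
    while i < len(out):
        if out[i] == 1:
            j = i
            while j < len(out) and out[j] == 1:
                j += 1                        # find the end of this bout
            if j - i < min_frames:            # bout too short: discard it
                out[i:j] = [0] * (j - i)
            i = j
        else:
            i += 1
    return out

# At 60 fps, a 100 ms minimum bout is 6 frames: the 3-frame bout is dropped.
preds = [0, 1, 1, 1, 0, 0] + [1] * 8 + [0]
print(apply_min_bout_length(preds, fps=60, min_bout_ms=100))
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```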

As for labelling full videos: the reason I wanted to use the existing classifiers is that they were recently published in that paper, so I thought users would be able to apply them to their own data. (I found this in the paper: "To perform supervised behavioral classification, users can download pre-made classifiers from our OSF repository, request classifiers from collaborators or create classifiers by annotating new videos in the scoring interface".)

How would the existing classifiers/.sav files fit into my training if I manually annotate all the behaviors on my own? Would a wise course of action be to get machine results using the existing classifiers first, and then do pseudo-labeling?

Thank you again for your continued help on this :)

goodwinnastacia commented 2 days ago

So there are two potential ways to do this:

1. Download our videos, track them in SLEAP using your 11-point model, and then append our ground-truth annotations. I think this is advantageous because the extra tail point is going to help with your tail rattle classifications.
2. Download our targets-inserted files off of OSF, add them to your project folder -> csv -> targets_inserted folder, and download the video info for those projects and add it to your own video info log. This will use our original DLC tracking for our videos, with the less-than-ideal tail-end tracking, and could potentially make your classifications a bit worse at first. The advantages of this option are that you're diversifying your tracking (so your models should ultimately be more robust), and you don't have to run more videos through pose estimation.

Let me know which you'd like to move forward with and I can help you out with it.

It does look like there's a glitch on OSF right now and our files aren't showing up, but I should have that fixed by the end of the weekend.

wallawhitecoat commented 2 days ago

I think the second method would work out well - not having a perfect tail_rattle classifier is not a deal-breaker for my team. If we could get attack and anogenital sniff to work well, that would be satisfactory. Also, no worries - thank you for fixing it!