sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

Customize outlier correction #281

Open DorianBattivelli opened 1 year ago

DorianBattivelli commented 1 year ago

Hello,

(I'm using SimBA 1.71.6 on Windows 10 with ma-DLC data.)

I'm facing a case where the central body-part (bp) is reliably tracked but the tail base is not, because it is hidden about half of the time. Consequently, running outlier correction dramatically changes the tracking outcome:

Without outlier correction: C57-B7-Urine-MS_US

Here the problem is that I have a lot of straight lines, which certainly influence the final total locomotion (can you please confirm this?).

So I tried to apply outlier correction, but the criteria are too stringent:

With outlier correction (location = 2 ; movement = 2): C57-B7-Urine-US_MS

I suspect this could be due to the tail base label missing on a lot of frames. Could this be true? How should I deal with this? Alternatively, I could try another pair of body parts...

What seems to be the best strategy here, in your opinion?

FYI: in both scenarios I used "nearest bp" interpolation and Savitzky-Golay smoothing = 200. Note that my videos are either 5 or 10 fps.

Thank you for the support, Best,

sronilsson commented 1 year ago

Hi @DorianBattivelli! Yes, as you say: the straight lines indicate that the central bp disappears from the tracking data. With "nearest" interpolation, it reappears some distance away from where it was last reliably seen, and a straight line is drawn between those two locations. The outlier correction is, as you say, (possibly) too stringent, primarily because the missing tail base tracking causes SimBA to remove a lot of movements that are actually true.

There are a few ways to fix this:

(i) Rather than using "nearest" interpolation, you can use quadratic or linear interpolation. As the image below shows, you won't get those big jumps, but smoother lines filling in the missing points.
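To make the point about stringency concrete, here is a rough sketch of a movement-outlier rule. It assumes (this is an illustration, not SimBA's actual code) that a body-part is flagged when it moves more than `criterion` times the mean distance between two reference body-parts from one frame to the next. The function names and the coordinates are hypothetical. If the tail base is often mislocated close to the center, the mean reference distance shrinks, the threshold shrinks with it, and genuine movements start to be flagged.

```python
import math

def mean_reference_length(bp_1, bp_2):
    """Mean Euclidean distance between two reference body-parts across frames."""
    return sum(math.dist(a, b) for a, b in zip(bp_1, bp_2)) / len(bp_1)

def movement_outlier_frames(track, reference_length, criterion):
    """Frames where a body-part moved more than criterion * reference_length
    since the previous frame (assumed rule, for illustration only)."""
    threshold = criterion * reference_length
    return [i for i in range(1, len(track))
            if math.dist(track[i - 1], track[i]) > threshold]

# Hypothetical coordinates (pixels), four frames.
center = [(0, 0), (10, 0), (20, 0), (30, 0)]
tail_good = [(-10, 0), (0, 0), (10, 0), (20, 0)]   # tail base always 10 px behind
tail_bad = [(-10, 0), (10, 0), (10, 0), (30, 0)]   # collapses onto center in half the frames
moving_bp = [(0, 0), (15, 0), (30, 0), (45, 0)]    # a true movement of 15 px per frame

good_ref = mean_reference_length(center, tail_good)  # 10.0
bad_ref = mean_reference_length(center, tail_bad)    # 5.0

flag_good = movement_outlier_frames(moving_bp, good_ref, criterion=2)
flag_bad = movement_outlier_frames(moving_bp, bad_ref, criterion=2)
print(flag_good, flag_bad)
```

With the well-tracked tail base nothing is flagged, while the shortened reference distance flags every frame of the same true movement.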

image

(ii) If you want to try custom outlier correction, you can, but I have not been able to put these functions in the GUI, so all I can offer are Jupyter notebook examples.

This notebook allows you to perform the outlier corrections individually: first it runs movement outlier correction, then location outlier correction. You just have to comment out the outlier correction that you don't want to perform in this cell:

image

If you want to perform more advanced outlier correction, i.e., apply different rules to different body-parts or animals, and again have the option to skip different outlier correction steps, there is THIS notebook example.

I wrote these notebooks recently in response to THIS issue; you can check it for some background.
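To illustrate option (i), here is a minimal sketch comparing nearest-value filling with linear interpolation on a single coordinate with a gap. It is plain Python written for illustration, not SimBA's interpolation code; the helper names and the example values are hypothetical, and `None` stands for a frame where the body-part was not tracked.

```python
def fill_nearest(track):
    """Fill missing frames with the value of the nearest tracked frame."""
    known = [i for i, v in enumerate(track) if v is not None]
    return [track[min(known, key=lambda k: abs(k - i))] for i in range(len(track))]

def fill_linear(track):
    """Fill missing frames by linear interpolation between the surrounding
    tracked frames (leading or trailing gaps are left untouched)."""
    known = [i for i, v in enumerate(track) if v is not None]
    filled = list(track)
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            filled[i] = track[left] + frac * (track[right] - track[left])
    return filled

def max_step(track):
    """Largest frame-to-frame displacement in a filled track."""
    return max(abs(b - a) for a, b in zip(track, track[1:]))

x = [0.0, None, None, None, 40.0, 42.0]  # hypothetical x-coordinates in pixels
nearest = fill_nearest(x)
linear = fill_linear(x)
print(nearest, max_step(nearest))
print(linear, max_step(linear))
```

Nearest filling keeps the point in place and then moves it in a single large jump, whereas linear interpolation spreads the same displacement over the missing frames.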

sronilsson commented 1 year ago

@DorianBattivelli did any of this help with the straight lines?

DorianBattivelli commented 1 year ago

Sorry for the delay. Quadratic interpolation did not solve the issue, and I have not yet tried Jupyter because I'm not familiar with it. Instead, I tried different values for outlier correction in the GUI, and I think I have reached quite satisfying results. If I get a chance to try the Jupyter customization, I'll let you know how it goes.

Thank you, Best,

sronilsson commented 1 year ago

👍🏻 Sounds good! An alternative is to fix the issue at the source by getting the tracking model to produce fewer missing or incorrect values, but as you say it may be overkill here.

DorianBattivelli commented 1 year ago

Another question: where can I find the different parameters I used to process the data (smoothing, interpolation, outlier correction criteria, etc.)? I often run many trials to tune these settings, and sometimes I get confused about the values I used to generate my data. Can I find this information in some file of the SimBA project?

sronilsson commented 1 year ago

That's a good point. At the moment it doesn't store a log of the different methods and parameters that were executed and at which times, so you'd have to keep tabs some other way. I will insert a session log that keeps track of it.
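Until such a log exists, one way to keep tabs is a small notes file that you append to yourself after each processing run. The sketch below is a hypothetical helper, unrelated to SimBA's own code: the function names, the file name, and the JSON-lines format are all assumptions made for illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def log_processing_step(log_path, step, **params):
    """Append one JSON line describing a processing step and its parameters."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "step": step,
        "params": params,
    }
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry

def read_processing_log(log_path):
    """Return all logged steps, oldest first."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]

# Demo in a temporary folder; in practice, point this at a file you keep
# next to your project.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "processing_notes.jsonl"
    log_processing_step(notes, "interpolation", method="nearest")
    log_processing_step(notes, "outlier_correction",
                        movement_criterion=2, location_criterion=2)
    history = read_processing_log(notes)

for entry in history:
    print(entry["time"], entry["step"], entry["params"])
```

Appending one line per run keeps the file readable in a text editor and makes it easy to see which values produced which dataset.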

DorianBattivelli commented 1 year ago

Thanks, it would be very helpful!

Another point I still have not managed to solve: how to deal with extra videos? When I add extra videos / h5 files to an already existing project, unfortunately all processing (smoothing, interpolation and outlier correction) is applied to all the videos (including the ones already analysed), which makes the process very long (especially when adding only a couple of new videos to a project containing 30 videos).

I tried to temporarily remove the videos and corresponding h5 files of the already analysed items from their folders, but then SimBA returns an error, presumably because it cannot find the items listed in the info_video_csv file.

Thank you for the support, Best,

sronilsson commented 1 year ago

@DorianBattivelli - sorry I missed this last msg. When clicking RUN OUTLIER CORRECTION, the code first looks inside the project_folder/csv/input_csv directory, performs movement outlier correction on all files in that directory, and stores the results in the project_folder/csv/outlier_corrected_movement directory. Then it looks inside the project_folder/csv/outlier_corrected_movement folder, performs location outlier correction on those files, and stores the results inside the project_folder/csv/outlier_corrected_movement_location directory. If, say, you move files out of project_folder/csv/input_csv but not out of the project_folder/csv/outlier_corrected_movement directory, you could get the errors you are seeing.

There is an archive function described HERE; have you tried it?
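If you do move files by hand, a quick check like the following can reveal files that exist in one stage directory but not in the others. It is a sketch that only uses the three directory names mentioned above; the helper names are hypothetical, and it assumes the data files are saved as .csv.

```python
import tempfile
from pathlib import Path

STAGES = (
    "input_csv",
    "outlier_corrected_movement",
    "outlier_corrected_movement_location",
)

def files_per_stage(project_folder):
    """Map each stage directory to the set of file names (without extension) in it."""
    csv_dir = Path(project_folder) / "csv"
    return {stage: {p.stem for p in (csv_dir / stage).glob("*.csv")} for stage in STAGES}

def mismatched_files(project_folder):
    """For each file missing from at least one stage, list the stages it is missing from."""
    stages = files_per_stage(project_folder)
    all_names = set().union(*stages.values())
    report = {}
    for name in sorted(all_names):
        missing = [stage for stage in STAGES if name not in stages[stage]]
        if missing:
            report[name] = missing
    return report

# Demo on a throwaway folder: video_1 was moved out of input_csv only.
with tempfile.TemporaryDirectory() as tmp:
    layout = {
        "input_csv": ["video_2"],
        "outlier_corrected_movement": ["video_1", "video_2"],
        "outlier_corrected_movement_location": ["video_1", "video_2"],
    }
    for stage, names in layout.items():
        stage_dir = Path(tmp) / "csv" / stage
        stage_dir.mkdir(parents=True)
        for name in names:
            (stage_dir / f"{name}.csv").touch()
    report = mismatched_files(tmp)

print(report)
```

An empty report means the three directories hold the same set of files.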

sronilsson commented 1 year ago

@DorianBattivelli - also, I added a logger to the projects that stores information about the methods you run and at which times. If you update SimBA and perform any function (say, outlier correction), there will be a text file at project_folder/logs/project_log.log that can look a bit like the attached file.

It can tell you when you performed outlier corrections and which criteria and body-parts you used. E.g., these lines tell me the latest outlier correction criteria and body-parts I used:

2023-09-01T15:46:56Z|OutlierCorrecterMovement||CLASS_INIT||Criterion: 1.0, Body-parts {'Animal_1': {'bp_1': 'Ear_left_1', 'bp_2': 'Ear_right_1'}, 'Animal_2': {'bp_1': 'Ear_left_2', 'bp_2': 'Right_ear_2'}}
2023-09-01T15:46:57Z|OutlierCorrecterMovement.stdout_success||complete||Log for corrected "movement outliers" saved in project_folder/logs
2023-09-01T15:46:57Z|OutlierCorrecterLocation||CLASS_INIT||Criterion: 2.0, Body-parts {'Animal_1': {'bp_1': 'Ear_left_1', 'bp_2': 'Ear_right_1'}, 'Animal_2': {'bp_1': 'Ear_left_2', 'bp_2': 'Right_ear_2'}}
2023-09-01T15:47:23Z|OutlierCorrecterLocation.stdout_success||complete||Log for corrected "location outliers" saved in project_folder/logs
2023-09-01T15:47:23Z|SimbaProjectPopUp.stdout_success||complete||Outlier corrected files located in "project_folder/csv/outlier_corrected_movement_location" directory
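As a sketch of how such lines could be read back programmatically, the snippet below splits each line on the `|` separators seen in the excerpt above and pulls out the most recent criterion per class. The field layout is inferred from these example lines only, so treat it as an assumption rather than a documented format; the function names are hypothetical.

```python
import re

LOG_TEXT = """
2023-09-01T15:46:56Z|OutlierCorrecterMovement||CLASS_INIT||Criterion: 1.0, Body-parts {'Animal_1': {'bp_1': 'Ear_left_1', 'bp_2': 'Ear_right_1'}, 'Animal_2': {'bp_1': 'Ear_left_2', 'bp_2': 'Right_ear_2'}}
2023-09-01T15:46:57Z|OutlierCorrecterMovement.stdout_success||complete||Log for corrected "movement outliers" saved in project_folder/logs
2023-09-01T15:46:57Z|OutlierCorrecterLocation||CLASS_INIT||Criterion: 2.0, Body-parts {'Animal_1': {'bp_1': 'Ear_left_1', 'bp_2': 'Ear_right_1'}, 'Animal_2': {'bp_1': 'Ear_left_2', 'bp_2': 'Right_ear_2'}}
"""

def parse_log_line(line):
    """Split one log line into timestamp, source, event and message fields."""
    timestamp, source, _, event, _, message = line.split("|", 5)
    return {"time": timestamp, "source": source, "event": event, "message": message}

def latest_criteria(lines):
    """Return the most recent criterion logged by each class at CLASS_INIT."""
    criteria = {}
    for line in lines:
        record = parse_log_line(line)
        match = re.search(r"Criterion:\s*([0-9.]+)", record["message"])
        if record["event"] == "CLASS_INIT" and match:
            criteria[record["source"]] = float(match.group(1))
    return criteria

log_lines = LOG_TEXT.strip().splitlines()
criteria = latest_criteria(log_lines)
print(criteria)
```

Because later lines overwrite earlier ones, running this over a whole log file would report the last criterion used for each correction step.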

Please let me know if you find it useful or if something is missing!

project_log.log

DorianBattivelli commented 1 year ago

Proceeding as explained here solved the issue, thank you! Thanks also for the update with the logger :)