sgoldenlab / simba

SimBA (Simple Behavioral Analysis), a pipeline and GUI for developing supervised behavioral classifiers
https://simba-uw-tf-dev.readthedocs.io/
GNU General Public License v3.0

Kleinberg behavior classification doesn't analyze the whole csv #170

Open carlitomu opened 2 years ago

carlitomu commented 2 years ago

Hi, I'm trying to smooth my behavioral data with the Kleinberg behavior classification, but it doesn't work properly.

The video has approximately 120000 frames, but the Kleinberg behavior classification stops recognizing classified bouts at frame 44896.

I then removed the first 44896 rows from the machine results CSV file (70), and again the Kleinberg behavior classification failed to analyze the whole CSV (this time it stopped at frame 70287).

Any suggestions to get a complete analysis?

Thanks, Carlo.

sgoldenlab commented 2 years ago

Hey @carlitomu, thanks for reporting; let me see if I can recreate it. Does it produce any error message, either in the terminal window or in the main SimBA window?

carlitomu commented 2 years ago

No, there are no error messages in either the terminal window or the main SimBA window. Thanks!

sgoldenlab commented 2 years ago

@carlitomu Would you mind sending me the CSV (or parquet) file located in `project_folder/csv/machine_results` so I can try to recreate the issue? I can't immediately recreate it with my own files, and it would be quicker with yours. It should be the 120k-row file that you put through the Kleinberg method. Could you share it through a Google Drive link with goldenneurolab@gmail.com?

carlitomu commented 2 years ago

Ok, I've just shared it.

sgoldenlab commented 2 years ago

Thanks, got it. I will let you know.

sgoldenlab commented 2 years ago

One more question so I understand the issue properly: the csv file is 120611 rows to begin with. When pushed through Kleinberg, the output csv still has 120611 rows, but the late behavioral bouts are removed and have been changed from 1 to 0?

carlitomu commented 2 years ago

Yes, you understand perfectly.
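To check this symptom on other files, you can compare the classifier column before and after smoothing. The sketch below is hypothetical: `compare_bouts` is not part of SimBA, and it assumes you have already loaded both versions of one classifier column as equal-length lists of 0/1 integers. It counts the frames flipped from 1 to 0 and reports the last frame labelled 1 in each version.

```python
from typing import List, Optional, Tuple


def _last_positive(values: List[int]) -> Optional[int]:
    """Index of the last frame labelled 1, or None if there is none."""
    for i in range(len(values) - 1, -1, -1):
        if values[i] == 1:
            return i
    return None


def compare_bouts(before: List[int], after: List[int]) -> Tuple[int, Optional[int], Optional[int]]:
    """Return (frames flipped 1 -> 0, last positive frame before, last positive frame after).

    Hypothetical helper: `before` is the raw classifier column and `after` is the
    Kleinberg-smoothed column for the same video.
    """
    if len(before) != len(after):
        raise ValueError("row counts differ between the two files")
    flipped = sum(1 for b, a in zip(before, after) if b == 1 and a == 0)
    return flipped, _last_positive(before), _last_positive(after)
```

If the last positive frame after smoothing sits far earlier than the last positive frame before smoothing while the row counts match, you are looking at the same truncation described in this thread.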

sgoldenlab commented 2 years ago

Yeah, I can replicate what you are seeing. My instinct is that the roughly 20k frames between frame 40k and frame 60k, with no behavior happening, put the Markov chain into a very strong "off" state that is difficult to re-engage from. However, playing with the hyper-parameters does not seem to change much. I am not sure if that is the cause, but it could be tested by removing those 20k frames and seeing whether it makes any difference.
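The "off state" idea can be explored on a toy input. The sketch below is a minimal two-state version of Kleinberg's burst-detection automaton, solved with the Viterbi algorithm over the gaps between frames where the behavior is present. The baseline rate is n/T and the burst rate is s·n/T. Moving up into the burst state costs γ·ln n, and dropping back down is free. It is an illustration under stated assumptions, not SimBA's implementation: the function name `kleinberg_two_state` and the defaults `s=2.0`, `gamma=1.0` are hypothetical, and SimBA's smoother may use more than two states.

```python
import math
from typing import List


def kleinberg_two_state(offsets: List[int], s: float = 2.0, gamma: float = 1.0) -> List[int]:
    """Label each gap between consecutive events as 0 (baseline) or 1 (burst).

    Hypothetical sketch of a two-state Kleinberg automaton. `offsets` are the
    strictly increasing frame indices where the behavior is present.
    """
    if len(offsets) < 2:
        return []
    gaps = [b - a for a, b in zip(offsets, offsets[1:])]
    if any(g <= 0 for g in gaps):
        raise ValueError("offsets must be strictly increasing")
    n = len(gaps)
    total = offsets[-1] - offsets[0]
    rates = [n / total, s * n / total]  # baseline and burst event rates
    up_cost = gamma * math.log(n)       # cost of entering the burst state

    def emit(state: int, gap: int) -> float:
        # Negative log-likelihood of an exponential gap under this state's rate.
        a = rates[state]
        return a * gap - math.log(a)

    # The automaton starts in the baseline state, so bursting at gap 0 pays up_cost.
    cost = [emit(0, gaps[0]), up_cost + emit(1, gaps[0])]
    back = []  # back[i] = best previous state for (state 0, state 1) at gap i + 1
    for g in gaps[1:]:
        prev0 = 0 if cost[0] <= cost[1] else 1            # moving down is free
        prev1 = 0 if cost[0] + up_cost < cost[1] else 1   # moving up pays up_cost
        cost = [
            min(cost[0], cost[1]) + emit(0, g),
            min(cost[0] + up_cost, cost[1]) + emit(1, g),
        ]
        back.append((prev0, prev1))

    # Trace the cheapest path backwards from the final gap.
    state = 0 if cost[0] <= cost[1] else 1
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    path.reverse()
    return path
```

In this simplified model the cost of re-entering the burst state is the constant γ·ln n, however long the preceding quiet gap was. On the toy input `list(range(0, 10)) + list(range(1000, 1010))`, the long gap is labelled 0 and both runs of closely spaced events are labelled 1. A single long gap therefore does not, by itself, lock this toy model in the "off" state.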

carlitomu commented 2 years ago

Thanks for the suggestion, but it doesn't help. I removed the interaction-free frames between 40k and 60k, but the output file is the same as before.

I also tried removing the first 40k frames. This gives a different output file, with different behaviors detected, but again the analysis is incomplete.

sronilsson commented 2 years ago

Thanks @carlitomu (still sgoldenlab; my handle switches across computers). I will need to check this and try to replicate it with other, longer videos when I have time at the end of the week, and then get back to you. The data is read in exactly the way it should be, and none of the later frames are dropped, which was also one of my suspicions.

You may have done it already, but when you drop the interaction-free frames between 40k and 60k, make sure you also update the index column so it reads continuously from 0 to the final frame number with no missing integer values (i.e., you should still have rows with indexes between 40k and 60k; it is the later index values that will disappear).
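The re-indexing step can be done with Python's standard `csv` module. This is a hypothetical helper (`drop_frames_and_reindex` is not part of SimBA), and it assumes that the first column of the machine_results CSV holds the frame index and that the frames to remove are those with `start <= index < end`.

```python
import csv


def drop_frames_and_reindex(in_path: str, out_path: str, start: int, end: int) -> int:
    """Drop frames with start <= index < end, then renumber the index column 0..N-1.

    Hypothetical helper; assumes column 0 of the CSV is the frame index.
    Returns the number of data rows written.
    """
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Keep every row whose original frame index falls outside the dropped range.
        kept = [row for row in reader if not (start <= int(float(row[0])) < end)]

    # Rewrite the index so it runs continuously with no missing integers.
    for new_idx, row in enumerate(kept):
        row[0] = str(new_idx)

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(kept)
    return len(kept)
```

For example, `drop_frames_and_reindex("in.csv", "out.csv", 40000, 60000)` would remove 20000 frames, and the output index would still cover the 40000 to 60000 range, with the highest index values disappearing instead.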

sgoldenlab commented 2 years ago

Hi @carlitomu, I've tested a few things and can't get my head around it. Other files of equal or greater length (but typically with more frequent behaviors) process fine, and there are no issues with the data types. At the moment I am at a loss, and I take it that this is expected behavior given how your classified behaviors are distributed throughout the session.