hantman-lab / animal-soup

Hantman Lab automated behavioral classification system.
GNU General Public License v3.0

start sequence train/inference #34

Closed clewis7 closed 1 year ago

clewis7 commented 1 year ago

everything seems to be working...yay!

remaining issues:

still to-do:

kushalkolar commented 1 year ago

@clewis7

storing the features from feature extraction in the dataframe makes re-loading the dataframe into memory really slow

What about storing this data in a dir, one for each session? Can we assume session names are unique? This becomes like mesmerize's "batch dir" structure.

I would propose:

If the only thing that really has to be stored is the features, then you can just have a single dir where each session has an hdf5 file.

feature extraction and sequence inference are done on series, not on the entire dataframe; in the series extensions there is no access to the dataframe...need to be able to save the dataframe to disk after a series extension is run
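To illustrate the limitation described above, here is a minimal sketch using pandas' `register_series_accessor`. The accessor name `behavior_demo`, the column names, and the `extract_features` method are all hypothetical and are not animal-soup's actual API. The accessor only ever receives the row (a Series), so whatever it computes has to be returned to the caller or written somewhere else; it cannot reach the parent dataframe.

```python
import pandas as pd


# Hypothetical accessor name, for illustration only (not animal-soup's real extension).
@pd.api.extensions.register_series_accessor("behavior_demo")
class BehaviorDemoExtension:
    def __init__(self, series: pd.Series):
        # pandas hands the accessor only the Series itself;
        # there is no handle back to the dataframe the row came from.
        self._series = series

    def extract_features(self) -> list:
        # Stand-in for real feature extraction: doubles the values
        # stored in the hypothetical "frames" column.
        return [2 * x for x in self._series["frames"]]


df = pd.DataFrame({"session_id": ["s1", "s2"], "frames": [[1, 2, 3], [4, 5]]})

row = df.iloc[0]  # a single dataframe row is a Series
features = row.behavior_demo.extract_features()

# df is unchanged here: the result must be assigned back from outside
# the accessor, or written to a file by the accessor itself.
```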

If you implement the above solution where hdf5 files store the extracted features for each session, then you don't need to store any of this in the dataframe :D . Might as well also store the ethograms in another hdf5, one hdf5 file for each session, with all of that session's trial ethograms in the one file.
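A rough sketch of the one-hdf5-file-per-session idea, using `h5py`. It assumes session names are unique (the open question above), and the directory layout, function names, and trial keys are made up for illustration rather than taken from animal-soup or mesmerize.

```python
from pathlib import Path
import tempfile

import h5py
import numpy as np


def save_features(features_dir: Path, session_id: str, trial_id: str,
                  features: np.ndarray) -> Path:
    """Write one trial's features into the HDF5 file for its session."""
    features_dir.mkdir(parents=True, exist_ok=True)
    path = features_dir / f"{session_id}.hdf5"  # assumes unique session names
    # mode "a": create the file if it is missing, otherwise open read/write
    with h5py.File(path, "a") as f:
        if trial_id in f:
            del f[trial_id]  # replace features left over from an earlier run
        f.create_dataset(trial_id, data=features)
    return path


def load_features(features_dir: Path, session_id: str, trial_id: str) -> np.ndarray:
    """Read a single trial's features without touching any other session."""
    with h5py.File(features_dir / f"{session_id}.hdf5", "r") as f:
        return f[trial_id][()]


# Demo with made-up data in a temporary directory
demo_dir = Path(tempfile.mkdtemp()) / "features"
demo_features = np.arange(12, dtype=np.float32).reshape(3, 4)
saved_path = save_features(demo_dir, "session_a", "trial_001", demo_features)
loaded = load_features(demo_dir, "session_a", "trial_001")
```

With this layout a series extension can write its own output to disk, so the dataframe only needs to hold the session and trial identifiers. Ethograms could go in a parallel directory with the same per-session naming.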

The alternative to all this is to use Polars instead of pandas, which does support dataframe and series extensions: https://pola-rs.github.io/polars/py-polars/html/reference/api.html

But Polars could take a while to set up; I've never used it, and we already have experience with the above proposed solutions from mesmerize.

clewis7 commented 1 year ago

something is still going on with the model checkpoints I made with DEG for slow/medium/fast

if I use the checkpoints from one of the cross-validation runs that I did, it works beautifully...not sure what is going on, will have to investigate tmw. For now the outputs are being saved to an outputs file per session

clewis7 commented 1 year ago

actually, maybe it is doing okay and my thresholds are just not right...