start sequence train/inference

clewis7 commented 1 year ago

everything seems to be working...yay!

remaining issues:

[ ] storing the features from feature extraction in the dataframe makes re-loading the dataframe into memory really slow
[ ] feature extraction and sequence inference are done on a series, not on the entire dataframe; the series extensions have no access to the dataframe...need to be able to save the dataframe to disk after a series extension is run

still to-do:

[ ] implement feature extraction and sequence inference as one single dataframe series extension (df.iloc[ix].behavior.infer(mode='fast'))
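
A minimal sketch of what a single combined series extension could look like, using pandas' `register_series_accessor`. The accessor name `behavior` and the `infer(mode=...)` call come from the to-do above; the helper methods `_extract_features` and `_infer_sequence`, the column names, and the returned dict are hypothetical placeholders, not the actual animal-soup implementation.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("behavior")
class BehaviorSeriesExtension:
    """Sketch of a series extension that runs both steps in one call."""

    VALID_MODES = ("slow", "medium", "fast")

    def __init__(self, series: pd.Series):
        # `series` is one dataframe row, e.g. the result of df.iloc[ix]
        self._series = series

    def _extract_features(self, mode: str) -> str:
        # hypothetical placeholder: real code would run the feature extractor
        return f"features[{self._series['trial_id']}, {mode}]"

    def _infer_sequence(self, features: str, mode: str) -> str:
        # hypothetical placeholder: real code would run the sequence model
        return f"ethogram from {features}"

    def infer(self, mode: str = "fast") -> dict:
        if mode not in self.VALID_MODES:
            raise ValueError(f"mode must be one of {self.VALID_MODES}")
        # feature extraction followed immediately by sequence inference
        features = self._extract_features(mode)
        ethogram = self._infer_sequence(features, mode)
        return {"mode": mode, "features": features, "ethogram": ethogram}


# usage: a one-row dataframe with hypothetical column names
df = pd.DataFrame({"session_id": ["s1"], "trial_id": ["t1"]})
result = df.iloc[0].behavior.infer(mode="fast")
```

Note that the accessor only ever sees the row it was called on, so anything it produces has to be returned or written somewhere other than the parent dataframe.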

kushalkolar commented 1 year ago

@clewis7

storing the features from feature extraction in the dataframe become really slow to re-load the dataframe to memory

What about storing this data in a dir, one for each session? Can we assume session names are unique? This becomes like mesmerize's "batch dir" structure.

I would propose:

one dir for each session
hdf5 file to store the features for every trial in that session

If the only thing that really has to be stored is the features, then you can just have a single dir where each session has an hdf5 file.

feature extraction and sequence inference is done on series not on the entire dataframe, in the series extensions there is no access to the dataframe...need to be able to save the dataframe to disk after a series extension is run

If you implement the above solution where hdf5 files store the extracted features for each session, then you don't need to store any of this in the dataframe :D. Might as well also store the ethograms in another hdf5 file: one hdf5 file per session, with all of that session's trial ethograms in it.
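
A rough sketch of the one-hdf5-file-per-session layout proposed above, using `h5py`. The function names, the `<session_id>.h5` file naming, and the use of the trial id as the dataset key are assumptions for illustration only; the same pattern would apply to a second per-session file for ethograms.

```python
from pathlib import Path

import h5py
import numpy as np


def save_trial_array(root_dir, session_id: str, trial_id: str, data: np.ndarray) -> Path:
    """Store one trial's array (features or ethogram) in the session's hdf5 file."""
    root = Path(root_dir)
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{session_id}.h5"  # assumes session names are unique
    with h5py.File(str(path), "a") as f:  # "a": create the file or append to it
        if trial_id in f:
            del f[trial_id]  # overwrite results from an earlier run
        f.create_dataset(trial_id, data=data, compression="gzip")
    return path


def load_trial_array(root_dir, session_id: str, trial_id: str) -> np.ndarray:
    """Read back a single trial without loading the rest of the session."""
    path = Path(root_dir) / f"{session_id}.h5"
    with h5py.File(str(path), "r") as f:
        return f[trial_id][()]
```

With this layout the dataframe only needs to hold the session and trial identifiers, so a series extension can write its results straight to disk without needing access to the parent dataframe.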

The alternative to all this is to use Polars instead of pandas; it does support dataframe and series extensions: https://pola-rs.github.io/polars/py-polars/html/reference/api.html

But polars could take a while to set up, I've never used it and we already have experience with the above proposed solutions from mesmerize.

clewis7 commented 1 year ago

[ ] storing things in hdf5 files, one per session (spatial and flow features, ethograms)

clewis7 commented 1 year ago

something is still going on with the model checkpoints I made with DEG for slow/medium/fast

if I use the checkpoints from one of the cross-validation runs that I did, it works beautifully...not sure what is going on, will have to investigate tomorrow. For now the outputs are being saved to an outputs file per session

clewis7 commented 1 year ago

actually, maybe it is doing okay and my thresholds are just not right...

hantman-lab / animal-soup

start sequence train/inference #34