clewis7 closed this issue 1 year ago
@clewis7
Storing the features from feature extraction in the dataframe makes reloading the dataframe into memory really slow.
What about storing this data in a dir, one for each session? Can we assume session names are unique? This would be like mesmerize's "batch dir" structure.
I would propose:
If the only thing that really has to be stored is the features, then you can just have a single dir with one hdf5 file per session.
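A minimal sketch of the one-hdf5-file-per-session layout using h5py, assuming session names are unique (the open question above). The helper names `save_features`/`load_features`, the `{session_name}.hdf5` file naming, and storing one dataset per trial are hypothetical choices for illustration, not the project's actual API.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import h5py
import numpy as np


def session_file(batch_dir: str | Path, session_name: str) -> Path:
    """Hypothetical layout: one HDF5 file per session (assumes unique session names)."""
    return Path(batch_dir) / f"{session_name}.hdf5"


def save_features(batch_dir: str | Path, session_name: str, trial_name: str,
                  features: np.ndarray) -> Path:
    """Write one trial's features as its own dataset in the session's file."""
    path = session_file(batch_dir, session_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a": read/write, creating the file if it does not exist yet.
    with h5py.File(path, "a") as f:
        if trial_name in f:
            del f[trial_name]  # overwrite a previous extraction of this trial
        f.create_dataset(trial_name, data=features)
    return path


def load_features(batch_dir: str | Path, session_name: str, trial_name: str) -> np.ndarray:
    """Read back a single trial without loading any other trial's data."""
    with h5py.File(session_file(batch_dir, session_name), "r") as f:
        return f[trial_name][()]  # [()] copies the dataset into a numpy array


# Usage: round-trip one trial through a throwaway batch dir.
with tempfile.TemporaryDirectory() as tmp:
    feats = np.arange(12, dtype=np.float32).reshape(3, 4)
    written = save_features(tmp, "session_01", "trial_001", feats)
    loaded = load_features(tmp, "session_01", "trial_001")
```

With this layout the dataframe only needs to hold the session name (or the file path), and reloading the dataframe no longer pulls every feature array into memory.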
Feature extraction and sequence inference are done on a series, not on the entire dataframe, and inside the series extensions there is no access to the parent dataframe... we need to be able to save the dataframe to disk after a series extension is run.
If you implement the solution above, where hdf5 files store the extracted features for each session, then you don't need to store any of this in the dataframe :D. You might as well store the ethograms in hdf5 too: one file per session, with the ethograms for all of that session's trials in it.
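A sketch of how this sidesteps the missing-dataframe problem: a pandas series accessor registered as `behavior` (the name used in `df.iloc[ix].behavior.infer(mode='fast')`) only sees its own row, so it writes results straight to the session's outputs file instead of back into the dataframe. The columns `batch_dir`, `session_name` and `trial_name`, the `{session_name}_outputs.hdf5` naming, and the placeholder ethogram standing in for real model inference are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np
import pandas as pd


@pd.api.extensions.register_series_accessor("behavior")
class BehaviorSeriesAccessor:
    """Hypothetical accessor for one row (pd.Series) of the experiment dataframe.

    The row has no reference to its parent dataframe, so outputs are
    persisted to a per-session hdf5 file rather than to a dataframe column.
    """

    def __init__(self, row: pd.Series):
        self._row = row

    def _outputs_path(self) -> Path:
        # Assumed columns: "batch_dir" and "session_name".
        return Path(self._row["batch_dir"]) / f"{self._row['session_name']}_outputs.hdf5"

    def infer(self, mode: str = "fast") -> Path:
        # Placeholder: a real version would load the checkpoint for `mode`
        # and run it on this trial's features. Here: an all-zero ethogram.
        ethogram = np.zeros((3, 10), dtype=np.int8)  # behaviors x frames
        path = self._outputs_path()
        key = self._row["trial_name"]
        # All trials of a session share one outputs file, one dataset per trial.
        with h5py.File(path, "a") as f:
            if key in f:
                del f[key]
            f.create_dataset(key, data=ethogram)
            f[key].attrs["mode"] = mode
        return path


# Usage: two trials of one session end up in the same outputs file.
with tempfile.TemporaryDirectory() as tmp:
    df = pd.DataFrame({
        "batch_dir": [tmp, tmp],
        "session_name": ["session_01", "session_01"],
        "trial_name": ["trial_001", "trial_002"],
    })
    for ix in range(len(df)):
        out_path = df.iloc[ix].behavior.infer(mode="fast")
    with h5py.File(out_path, "r") as f:
        stored_trials = sorted(f.keys())
        stored_mode = f["trial_001"].attrs["mode"]
```

Nothing is written back to `df`, so there is no need for the accessor to reach the dataframe; saving the dataframe itself only has to happen when its columns (paths, names) change.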
The alternative to all this is to use Polars instead of pandas; it does support dataframe and series extensions: https://pola-rs.github.io/polars/py-polars/html/reference/api.html
But Polars could take a while to set up: I've never used it, and we already have experience with the solutions proposed above from mesmerize.
Something is still going on with the model checkpoints I made with DEG for slow/medium/fast.
If I use the checkpoints from one of my cross-validation runs, it works beautifully... not sure what is going on, will have to investigate tomorrow. For now, the outputs are being saved to one outputs file per session.
Actually, maybe it is doing okay and my thresholds are just not right...
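To illustrate why a threshold alone can make a model look broken: assuming the model emits a per-frame probability for each behavior (an assumption, the thread doesn't say what the outputs look like), the ethogram depends directly on the cutoff. `probs_to_ethogram` and the toy numbers below are hypothetical, not real model output.

```python
import numpy as np


def probs_to_ethogram(probs: np.ndarray, threshold: float) -> np.ndarray:
    """Binarize per-frame probabilities (behaviors x frames) into an ethogram.

    A frame is labeled with a behavior only if its probability is strictly
    greater than `threshold`.
    """
    return (probs > threshold).astype(np.int8)


# Made-up probabilities for 2 behaviors over 6 frames.
probs = np.array([
    [0.10, 0.40, 0.55, 0.62, 0.30, 0.05],
    [0.70, 0.20, 0.45, 0.48, 0.80, 0.90],
])

# Same model output, different cutoffs: count labeled frames per behavior.
frames_per_threshold = {
    t: probs_to_ethogram(probs, t).sum(axis=1).tolist() for t in (0.3, 0.5, 0.7)
}
# -> {0.3: [3, 5], 0.5: [2, 3], 0.7: [0, 2]}
```

Checking a few cutoffs like this against a hand-labeled trial is a quick way to tell a bad checkpoint from a bad threshold.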
Everything seems to be working... yay!
remaining issues:
still to-do:
df.iloc[ix].behavior.infer(mode='fast')