themattinthehatt / behavenet

Toolbox for analyzing behavioral videos and neural activity
https://behavenet.readthedocs.io/
MIT License

fitting a multisession fails if data and save directories are different #10

Closed johnmbarrett closed 4 years ago

johnmbarrett commented 4 years ago

Steps to reproduce:

1. Run

from behavenet import setup
setup()

setting the base data and base results directories to different folders.

2. Fit an AE model to multiple experimental sessions by setting at least one of expt, animal, or session to "all" in the data config. Example data and configs are attached; rename john.zip to john.7z and extract the data to your data folder, then extract the configs to your .behavenet folder and run:

python behavenet/fitting/ae_grid_search.py --data_config ~/.behavenet/john_seed_handling_params.json --model_config ~/.behavenet/ae_model.json --training_config ~/.behavenet/ae_training.json --compute_config ~/.behavenet/ae_compute.json
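For reference, a multisession data config of the kind described in step 2 might look something like this (field names follow behavenet's usual data-config layout; the values are illustrative and a real config contains additional fields, e.g. frame dimensions, so this is a sketch rather than the attached john_seed_handling_params.json):

```json
{
  "lab": "john",
  "expt": "seed_handling",
  "animal": "all",
  "session": "all"
}
```

Setting any of expt, animal, or session to "all" is what triggers the multisession code path.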

Expected behaviour: behavenet fits an AE model to all your data.

Actual behaviour: behavenet fails with the following error:

using data from following sessions:
Z:\LACIE\DATA\John\Videos\seed handling\behavenet\results\john\seed_handling\ae\conv
constructing data generator...Caught exception in worker thread Unable to open file (unable to open file: name = 'Z:\LACIE\DATA\John\Videos\seed handling\behavenet\data\john\seed_handling\ae\conv\data.hdf5', errno = 2, error message = 'No such file or directory', flags = 40, o_flags = 0)
Traceback (most recent call last):
  File "C:\Users\LSPS2\virtualenvs\behavenet\lib\site-packages\test_tube\argparse_hopt.py", line 37, in optimize_parallel_gpu_private
    results = train_function(trial_params, gpu_id_set)
  File "C:\Users\LSPS2\Documents\Python\behavenet\behavenet\fitting\ae_grid_search.py", line 44, in main
    data_generator = build_data_generator(hparams, sess_ids)
  File "c:\users\lsps2\documents\python\behavenet\behavenet\fitting\utils.py", line 805, in build_data_generator
    train_frac=hparams['train_frac'])
  File "c:\users\lsps2\documents\python\behavenet\behavenet\data\data_generator.py", line 501, in __init__
    device=device, as_numpy=self.as_numpy))
  File "c:\users\lsps2\documents\python\behavenet\behavenet\data\data_generator.py", line 195, in __init__
    with h5py.File(data_file, 'r', libver='latest', swmr=True) as f:
  File "C:\Users\LSPS2\virtualenvs\behavenet\lib\site-packages\h5py\_hl\files.py", line 394, in __init__
    swmr=swmr)
  File "C:\Users\LSPS2\virtualenvs\behavenet\lib\site-packages\h5py\_hl\files.py", line 170, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'Z:\LACIE\DATA\John\Videos\seed handling\behavenet\data\john\seed_handling\ae\conv\data.hdf5', errno = 2, error message = 'No such file or directory', flags = 40, o_flags = 0)

Additional comments: when fitting a multisession, behavenet appears to look for the data in the results directory (this does not happen when fitting a single session, which works fine with different data and save directories). I don't recall exactly where in the code this happens, as it was a few days ago that I ran into it. As a workaround, it is possible to fit a multisession by setting the data and results directories to the same folder.
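A minimal sketch of that workaround, assuming setup() writes its directory choices to a JSON file at ~/.behavenet/directories with data_dir and save_dir keys (both the location and the key names are assumptions here; check the file setup() actually wrote). The paths are illustrative, taken from the error log above:

```json
{
  "data_dir": "Z:\\LACIE\\DATA\\John\\Videos\\seed handling\\behavenet",
  "save_dir": "Z:\\LACIE\\DATA\\John\\Videos\\seed handling\\behavenet"
}
```

With both entries pointing at the same folder, the multisession lookup and the data files end up under the same root, so the missing-file error does not occur.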

themattinthehatt commented 4 years ago

@johnmbarrett there is now a (somewhat clunky) fix for this. I added another field to the data json, all_source, which should be set to either "data" or "save" and defines where the code searches for sessions when constructing a multisession. If all_source="data" then the behaviour is what you hoped for: all available sessions in the data directory are found and used. To retain compatibility with older analysis code, the default is all_source="save".
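Concretely, the data json would gain one line (the all_source name comes from the comment above; the other fields are illustrative and real configs contain more):

```json
{
  "lab": "john",
  "expt": "seed_handling",
  "animal": "all",
  "session": "all",
  "all_source": "data"
}
```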

The updates are in the develop branch, hopefully we'll push everything out to master in the not-too-distant future.