SophieHerbst closed this issue 5 months ago
We could probably at least get rid of `-icafit_epo.fif`. Instead of saving it during the first step, we could recreate it on the fly in the two steps that use it. It'd be slower, so we have to decide whether we want that tradeoff to save some hard drive space (I think it's probably worth it).
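The trade-off here — write the artifact to disk once, or recompute it in every step that needs it — can be sketched generically. The function name, cache path, and pickling are all illustrative, not pipeline API:

```python
import pickle
from pathlib import Path


def get_icafit_epochs(cache_path, compute_fn, *, cache=True):
    """Return the ICA-fitting epochs, either from disk or recomputed.

    `compute_fn` stands in for the (expensive) epoching step.
    With cache=False, nothing is written to disk, at the cost of
    recomputing in every step that needs the data.
    """
    cache_path = Path(cache_path)
    if cache and cache_path.exists():
        # Fast path: reuse the artifact written by an earlier step.
        return pickle.loads(cache_path.read_bytes())
    result = compute_fn()  # slower, but no intermediate file on disk
    if cache:
        cache_path.write_bytes(pickle.dumps(result))
    return result
```

The choice between the two branches is exactly the speed-vs-storage tradeoff described above.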
For the others it's not as easy. The idea behind MNE-BIDS-Pipeline is that each step has a defined set of M inputs and N outputs. If you want to know whether a step needs to be rerun, you check the existence and validity (age/hash) of both the M inputs and the N outputs; if something is wrong, you recompute. So the files `_epo.fif` -> `_proc-ica_epo.fif` -> `_proc-clean_epo.fif` get created in three separate steps (epoching, apply ICA, and peak-to-peak rejection, respectively). Treating the first two as transient/unnecessary would mean changing how caching works, and I'm not sure how easy that would be.
Also, in some cases we need those intermediate files. For example, we have a config param that lets you decide whether to use `_epo.fif` or `_proc-clean_epo.fif` in decoding. So keeping intermediate files around can help.
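Illustratively, such a switch in `config.py` might look like the fragment below. The parameter name is made up to show the idea; check the pipeline documentation for the real option:

```python
# Hypothetical config.py fragment -- the parameter name is invented
# for illustration and is not the pipeline's actual option.
decoding_use_clean_epochs = True  # False -> decode on `_epo.fif` instead
```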
Finally, one important reason to keep intermediate files is that they're sometimes invaluable for debugging. Say you get a `_proc-clean_epo.fif` that looks bad for some reason. By working backward through `_proc-ica_epo.fif` and then `_epo.fif`, you can identify the step where things went wrong. The report is generally good for this, but it's never as comprehensive as being able to look at the original data.
We could offer to select which output files to automatically remove once the pipeline run has completed. Like a new, final step named "cleanup" or so.
But if we do that and the user runs `mne_bids_pipeline config.py` again, it will recompute and recreate all of those files. I don't think that's good.
Some sort of final archive or cleanup function once the study is completed might be a good idea to save storage space?
To me, if you're 100% done and want to save space, this shouldn't really be MNE-BIDS-Pipeline's job; it should be easy enough to do with a custom shell (or Python) script at the end.
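For instance, such a post-hoc cleanup script could be as simple as the sketch below. The filename patterns are taken from the files discussed above; adjust them to whatever your study actually produces, and run with `dry_run=True` first:

```python
from pathlib import Path

# Patterns for the intermediate files discussed above; adjust to taste.
INTERMEDIATE_PATTERNS = ("*-icafit_epo.fif", "*_proc-ica_epo.fif")


def cleanup_derivatives(deriv_root, dry_run=True):
    """Delete intermediate epoch files under a derivatives tree.

    With dry_run=True, only report what would be removed.
    Returns the list of matched paths.
    """
    deriv_root = Path(deriv_root)
    matches = [
        p
        for pattern in INTERMEDIATE_PATTERNS
        for p in deriv_root.rglob(pattern)
    ]
    for p in matches:
        if dry_run:
            print(f"would remove {p}")
        else:
            p.unlink()
    return matches
```

Keeping this outside the pipeline means rerunning `mne_bids_pipeline` later still works as expected (it will simply regenerate what it needs).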
@SophieHerbst Do you think this can be closed for the time being?
yes @hoechenberger!
I just realized that my derivatives folders are 22 GB per participant. My epochs are very long (10 s), as needed for this particular study. Still, do we need to save all the intermediate steps, given that the automatic caching reruns things anyway if parameters change? For instance, I have 4 epoch files per participant: