poldracklab / fitlins

Fit Linear Models to BIDS Datasets
https://fitlins.readthedocs.io
Apache License 2.0

RFC: Default location for preproc directory #32

Closed effigies closed 6 years ago

effigies commented 6 years ago

Given the BIDS App CLI:

$ app <bids> <out> <level> [<option> ...]

We should settle on a default location to look for the preprocessed dataset. In the case of:

$ fitlins /data/bids/ds000XYZ /data/bids/ds000xyz/derivatives <level> [...]

The reasonable default is /data/bids/ds000xyz/derivatives/fmriprep, which is both derivatives/fmriprep relative to the BIDS root and fmriprep relative to the output directory, so the two conventions agree. When we have:

$ fitlins /data/bids/ds000XYZ /data/out/ds000xyz/derivatives <level> [...]

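The options become:

  * /data/bids/ds000XYZ/derivatives/fmriprep (derivatives/fmriprep relative to the BIDS root)
  * /data/out/ds000xyz/derivatives/fmriprep (fmriprep relative to the output directory)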

Up until now I have assumed the latter, reasoning that derivatives will tend to be kept together, even when not falling under the BIDS root directory. However, in #28, the proposed presumption is that preprocessed derivatives are special, and will be kept in the BIDS root directory even when the Fitlins derivatives to be written will go elsewhere.

It would be nice to get some intuitions from others. (I'll go ahead and tag @tyarkoni, @satra, @mgxd, @jordandekraker.)

To preempt what seems like an inevitable suggestion: I'm hesitant to make this too smart by checking first for one location and then the other. Adding the higher-priority directory to a setup that was using the lower-priority one would silently change the behavior (assuming that the preprocessed files are different). Another case to consider is a misspelled higher-priority directory (e.g. frmiprep), where we would silently fall back to the correctly spelled, lower-priority directory. But maybe that's okay?

Finally, just as a UX note, the current behavior allows relative paths as well as absolute ones. So the actual default is -p fmriprep, which is resolved relative to out_dir. The idea here is to allow alternatives like -p spm or -p feat without requiring users to add a full path. So choosing the right default will affect ease of use in that case.
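To make the relative-path behavior concrete, here is a minimal sketch of how a -p value could be resolved against out_dir. The function name `resolve_preproc_dir` is hypothetical and is not taken from the fitlins code base; it only illustrates the rule described above.

```python
from pathlib import Path


def resolve_preproc_dir(out_dir, preproc="fmriprep"):
    """Resolve a -p value (hypothetical helper, not fitlins' actual code).

    Absolute paths are returned unchanged; relative paths such as
    "fmriprep", "spm" or "feat" are resolved relative to out_dir.
    """
    preproc_path = Path(preproc)
    if preproc_path.is_absolute():
        return preproc_path
    return Path(out_dir) / preproc_path
```

With this rule, `-p spm` and an output directory of /data/out/ds000xyz/derivatives would point at /data/out/ds000xyz/derivatives/spm, while a full path given to -p is used as-is.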

adelavega commented 6 years ago

Question: What happens when you run fitlins several times on different models? Is it your expectation that the <out> dir changes? Or does it stay the same, and different outputs can coexist in the same folder?

effigies commented 6 years ago

I would generally expect each model to produce unique outputs, which could sit side-by-side, if desired. If we need to adjust the output naming structure, that's totally fine. It's pretty bare-bones at the moment.

In any event, we should also add a --derivative-label parameter. (See discussion.)

effigies commented 6 years ago

In the absence of any outside input, what would you say to checking <bids_dir>/derivatives/<preproc> and <out_dir>/<preproc>? And what priority do you think is most appropriate?
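As a rough sketch of what checking both locations might look like: the function name `find_preproc_dir` and the `prefer` argument are assumptions made for illustration, not existing fitlins options, and the priority is left configurable because it is still an open question here.

```python
from pathlib import Path


def find_preproc_dir(bids_dir, out_dir, preproc="fmriprep", prefer="out"):
    """Return the first existing candidate directory (hypothetical helper).

    Candidates are <bids_dir>/derivatives/<preproc> and <out_dir>/<preproc>.
    `prefer` ("out" or "bids") selects which one is checked first.
    """
    candidates = {
        "bids": Path(bids_dir) / "derivatives" / preproc,
        "out": Path(out_dir) / preproc,
    }
    if prefer not in candidates:
        raise ValueError(f"prefer must be one of {sorted(candidates)}")

    # Check the preferred location first, then the remaining one.
    order = [prefer] + [key for key in candidates if key != prefer]
    for key in order:
        if candidates[key].is_dir():
            return candidates[key]

    searched = ", ".join(str(candidates[key]) for key in order)
    raise FileNotFoundError(f"No preprocessed directory found; searched: {searched}")
```

Note that this reproduces the silent-fallback concern raised earlier: if the preferred directory is missing or misspelled, the other location is used without any warning.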

satra commented 6 years ago

@effigies and @adelavega - i see the following scenarios for derivatives:

  1. different versions of a given bids-derivative-generating-app

     /derivatives/app1<-version>/

     so app1 should be able to append -version if requested by the runner. otherwise it could potentially overwrite the existing output or error out depending on the app.

  2. different parameterizations of a specific version of a given bids-derivative-generating-app

     /derivatives/app1<-version>/<parameter-index/>

whether we say that each derivative should have this hierarchical structure or a flat structure (each app version parameter combo is a single root directory) is what makes the space of derivatives really large.

effigies commented 6 years ago

@satra The consensus that came out of the BIDS sprint last summer was for BIDS apps to support a generic, unconstrained label, separated from the pipeline name with an underscore. It is the responsibility of each app to provide that option, and I don't think any have yet. (I've just merged #37, but haven't released yet.)
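For illustration, the naming convention could be sketched as below. The helper `derivative_dirname` is hypothetical, and placing the pipeline name before the label is an assumption; the only detail taken from the discussion is the underscore separator.

```python
def derivative_dirname(pipeline, label=None):
    """Build a derivatives directory name (hypothetical helper).

    Assumes the form <pipeline>_<label>; with no label, the bare
    pipeline name is used.
    """
    if not label:
        return pipeline
    return f"{pipeline}_{label}"
```

Under that assumption, two models run with different labels would write to sibling directories such as fitlins_modelA and fitlins_modelB rather than overwriting each other.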