RFC: Internal config object

oesteban commented 4 years ago

I would suggest moving away from the current model where a ton of parameters is passed as function arguments deep down in the workflow creation stream - which is already creating some frictions regarding whether parameters should not have default values.

The parser would create a singleton config object (which should not be accessible to the user, nor have a load public interface) where all the settings are stored.

Workflows have then access to these values in all contexts.

The config object has also a little API to dump the exact configuration of this run to a file (that could be placed in the output folder for provenance tracking, ping @satra).

Finally, we could make this config object have a push interface for the boilerplate to replace the visitor pattern we currently have and isolate boilerplate creation, which is currently too cohesive for a second-class feature.

Opinions?

cc/ @arokem @dPys @josephmje @mattcieslak because this could be a pattern to keep across NiPreps (and dmriprep in particular).

arokem commented 4 years ago

Thanks for looping me in here. In pyAFQ we've gone pretty explicitly on this by making the input to the analysis a config file (in toml format) that can configure different sections of the analysis. See here.

mattcieslak commented 4 years ago

+1 from me on this idea. The huge number of arguments to workflow-building functions is difficult to maintain.

oesteban commented 4 years ago

@arokem I will need a strong push back to change my mind about not exposing the config file to the user (other than storing one of those files in each subject's output folder). As we are talking in #1963, fMRIPrep should try not to offer many experimental degrees of freedom to the researcher - the lesser the better.

toml seems like a good option.

dPys commented 4 years ago

Yes, +2 on this for the sake of parsimony

josephmje commented 4 years ago

I'm in favour of this. Sometimes it can be a hassle trying to remember which arguments need to be passed on through multiple workflows. This sounds like a great solution.

effigies commented 4 years ago

I'm not sure that this is going to work. The big sticking point in my mind is the fact that workflows are spread across several projects, each of which is expected to be usable outside the context of fMRIPrep. For example, sMRIPrep is its own CLI, and SDCflows should be usable as a separate tool. Niworkflows has a wide variety of workflows, and a single globally-useful configuration object is going to have a lot of extraneous entries for most applications. If we go in the other direction, building per-workflow configurations, that's not too far different from function protocols.

As much of a pain as a bunch of individually-specified (sans defaults) parameters is, it makes clear what parameters are needed to instantiate a workflow by demanding that all subworkflow parameters be provided, or else it will error. And it does not leave room for a workflow to quietly start depending on a configuration option that is not right there in the function protocol.

The main additional problem that would need to be solved is maintaining parity between the CLI defaults and the configuration object defaults.

A sketch of a proposal might make things clearer.

For some historical context, we used to have a settings dictionary that would get passed around. This was discussed in #412 and changed in #469. That also addressed the problem of defining defaults in a single place (#442).

oesteban commented 4 years ago

@effigies I've been playing around this idea. It's still a draft, but it helps sketch how this would look https://github.com/poldracklab/fmriprep/compare/master...oesteban:maint/config-file

The big sticking point in my mind is the fact that workflows are spread across several projects, each of which is expected to be usable outside the context of fMRIPrep. For example, sMRIPrep is its own CLI, and SDCflows should be usable as a separate tool. Niworkflows has a wide variety of workflows, and a single globally-useful configuration object is going to have a lot of extraneous entries for most applications. If we go in the other direction, building per-workflow configurations, that's not too far different from function protocols.

The golden rule here is that the config module should not traverse projects. Only top-level, end-user workflows would be allowed to import the config.

As much of a pain as a bunch of individually-specified (sans defaults) parameters is, it makes clear what parameters are needed to instantiate a workflow by demanding that all subworkflow parameters be provided, or else it will error.

The existence of a config module does not prescribe the existence of defaults for all parameters (i.e., creation of workflows will still fail). With more clarity on what can be set default and whatnot, I don't think this particular makes any difference w.r.t. the current implementation.

The main additional problem that would need to be solved is maintaining parity between the CLI defaults and the configuration object defaults.

I'm still trying to minimize duplicity (because my proposal has some), but they seem to get along pretty well.

Additional improvements in my draft beyond the config module are:

Pushed most of the checkpoints on inputs earlier right after parsing and used the argument parser error for them - this make the CLI more consistent, and more testable.
Started playing around the new index-file feature of pyBIDS, although I'll need to think about locking in parallel execution against the same working directory.

nipreps / fmriprep

RFC: Internal config object #1964