Workflow interface: input / output / reports

nipreps / fmriprep

fMRIPrep is a robust and easy-to-use pipeline for preprocessing of diverse fMRI data. The transparent workflow dispenses of manual intervention, thereby ensuring the reproducibility of the results.

https://fmriprep.org

Apache License 2.0

629 stars 291 forks source link

Workflow interface: input / output / reports #456

Closed effigies closed 7 years ago

effigies commented 7 years ago

By giving each workflow an inputnode, outputnode and reportnode, we can reduce the clutter within each workflow where sinking/derivatives are put on visual parity with processing nodes.

Reportlets can then be passed to an interface/workflow that constructs the full report, isolated from the processing/reportlet construction.

Similarly, it would be nice (though possibly not feasible) to reduce output sinks to at most one per sub-workflow, and ideally fewer, grabbing outputs from sub_wf.outputnode.

chrisgorgo commented 7 years ago

Could you explain a bit more what role the reportnode would play?

Please keep in mind that with the current architecture we need one datasink per reportlet, because we want as many reportlets in a partial report (when something fails). If we connect multiple reportles to one data sink and one of the interfaces creating reportlets fails none of them will be saved and all of them will be missin in the final report.

effigies commented 7 years ago

I might be missing something. Why is a datasink needed, as opposed to an interface that grabs the reportlets from their original locations in the working directory?

But to be (hopefully) a little clearer, the reportnode would collect and give named ports for each of the reportlets that a workflow generates. If the reportlet is undefined, we can skip over it, as we do now. The only thing that needs a sink here is the final report.

If this sounds like too much of a pain, NBD. I just find the reportlet derivatives to be very cluttering when I'm trying to understand a new workflow (or part of a workflow).

chrisgorgo commented 7 years ago

On Wed, Apr 19, 2017 at 11:13 AM, Chris Markiewicz <notifications@github.com

wrote:

I might be missing something. Why is a datasink needed, as opposed to an interface that grabs the reportlets from their original locations in the working directory?

We could do that, but this means we will heavily depend on the structure of the working directory not changing over time.

But to be (hopefully) a little clearer, the reportnode would collect and give named ports for each of the reportlets that a workflow generates. If the reportlet is undefined, we can skip over it, as we do now. The only thing that needs a sink here is the final report.

Ok - get it.

If this sounds like too much of a pain, NBD. I just find the reportlet derivatives to be very cluttering when I'm trying to understand a new workflow (or part of a workflow).

You mean the reportlet datasinks?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/poldracklab/fmriprep/issues/456#issuecomment-295374016, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOkp7OC0nktFL7mroRaqfJYP8cbplzCks5rxk7MgaJpZM4NB0Eo .

effigies commented 7 years ago

We could do that, but this means we will heavily depend on the structure of the working directory not changing over time.

Will it? Since the full reportlet path is what's passed to the report-constructor, it seems like this should be fairly robust.

You mean the reportlet datasinks?

Yeah.

chrisgorgo commented 7 years ago

If I understand this idea correctly it will suffer from the same problem as a single datasink.

Basically if you connect reportles from many nodes to a single report-constructor node and one of the reportlet building node fails the report-constructor will never be run (since not all of its dependencies run successfully) and the report will not be generated at all.

effigies commented 7 years ago

Yes, that's true. Hmm. Guess I didn't think this through after all.

chrisgorgo commented 7 years ago

I do agree that having reportlet datasinks inside workflows is a bit confusing and not very neat.

effigies commented 7 years ago

There might be a hacky way of doing this where, instead of a report-generator interface, we could have a workflow that's just a bunch of independent datasinks, so at least we could manage some separation. But that might be more work than it's worth...

effigies commented 7 years ago

So I've done a little fiddling in the anatomical workflows to see how much of a pain it would be to separate derivatives/reports from processing logic.

workflows-2

A few thoughts:

A separate reportnode is no more useful than an inputnode on a report workflow, so scratch that idea entirely.
Passing derivatives from the outputnode is cleaner, when available. It also gives an opportunity to keep names consistent; e.g. every time you see the binary mask in T1 space, it has the same name.
Perhaps it makes sense to require that all derivatives be passed to an outputnode, even if we're not planning on hooking them up to other workflows?
A single derivative node (e.g. for GIFTI surfaces) is probably not worth separating out of a workflow.
This is only a partial separation of concerns; the derivatives and report sinks are still in the same metaworkflow as the processing nodes.

Finally, I've only done this on the anatomical workflow. If it seems worthwhile, I can do it for the rest (though not until after #458).

effigies commented 7 years ago

Rebased this to use the new visualizations:

workflows-2

Compare:

chrisgorgo commented 7 years ago

Indeed this does clean up things substantially!

oesteban commented 7 years ago

I this issue still an issue? @effigies