jsheunis opened 4 years ago
Hi @kfinc @oesteban @satra @edickie,
In a last-minute panic I have created an abstract for our BrainHack panel session, registered as one of the "Emergent" sessions. The abstract is just OK, and, worse, I realised I failed to name-check each of you!
Please read and offer edits (as additional comments here) on this abstract and hopefully @jsheunis can ensure that the final edits are incorporated.
Some thoughts added in bold.
Open pipelines collect the best expertise, algorithms and planning into community standards, allowing wide access to top-of-the-range analysis pipelines. While some frameworks exist explicitly to allow users to build pipelines with a diverse set of tools (e.g. Nipype) others comprise specific pipelines offered as a particular best practice solution (e.g. fmriprep as an example of the larger set of evolving nipreps, HCP-Pipelines). Is there a danger of these prepared pipelines take on the hallowed role that individual software tools have previously held, such that their use becomes expected and their non-use needs to be justified? What limits do these pose on scientific questions? How do we approach continuous evaluation and validation and dissemination of such workflows? Do we have built-in procedures in these community standards and development procedures (e.g. niflows) that allow for critical evaluation?
nipreps: https://www.nipreps.org/
niflows: https://github.com/niflows
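For readers less familiar with the framework side of the abstract, here is a minimal sketch of the Nipype pattern it alludes to: chaining interfaces to different tools into one workflow. This is an illustration only, not part of the abstract; the input filename and parameters are placeholders, and it assumes FSL is installed.

```python
# Minimal sketch of building a pipeline from a diverse set of tools with Nipype:
# two FSL interfaces wired into one workflow. Filename/parameters are hypothetical.
from nipype import Node, Workflow
from nipype.interfaces import fsl

# Skull-strip a T1w image with FSL BET
skullstrip = Node(fsl.BET(in_file="sub-01_T1w.nii.gz", frac=0.5),
                  name="skullstrip")

# Smooth the brain-extracted image with an isotropic Gaussian kernel
smooth = Node(fsl.IsotropicSmooth(fwhm=4.0), name="smooth")

# Connect the two steps; Nipype handles execution order and caching
wf = Workflow(name="minimal_preproc", base_dir="work")
wf.connect(skullstrip, "out_file", smooth, "in_file")

if __name__ == "__main__":
    wf.run()
```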
Sorry, this notification was filtered by gmail to some unchecked folder - I just recovered it by chance. So what I am writing right now is not very well thought through. I'll come back later today. Just to trigger some brainstorming, I'll go ahead and post some ideas.
Why is the community steering towards these community pipelines? I think some reflection and review of the literature would be interesting. We should ask ourselves why SPM, FSL, FS, AFNI, etc. emerged in the first place, what needs they were covering, and why we needed to go beyond that. Some ideas I have about it:
When does the community pattern make sense? IMHO, largely in integration projects where software is more of a research instrument (or a commodity) than the substrate of analysis. This is related to @satra's question about what limits these pose on scientific questions - if we are talking about infrastructure, these efforts might expand (not limit) the potential scientific questions (that is the point of NiPreps, www.nipreps.org, which I think is a good starting point).
Risks? I found one of the sentences I read at some point about this emergent session kind of funny: is there a risk that one pipeline becomes the rule? Because the community is behind the pipeline, and because I see these pipelines as positive in early workflow stages (not for analysis), I think there is no risk at all of this happening. If fMRIPrep, with all fMRI researchers taking care of it, reaches a point where it really handles any data with the best combination of steps in the best order - is there a reason not to call it the standard? Obviously this doesn't work for cutting-edge methods, and at some point the pipeline will reach some end of life. But in the meantime, I can only see positive effects of that standardization (in the domain of software infrastructure). There is a concept called Software as a Medical Device (SaMD) that I find interesting here.
Life cycle - when are these SaMDs valid and recommendable?
Validation - as @satra mentioned, standardization is the first necessary step in the search for solid test-oracles that will allow us to validate and quality-assess the SaMD.
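To make the test-oracle idea concrete, here is a minimal sketch (an illustration of the concept, not an existing niflows or NiPreps API) of an oracle-style check: a measure derived by a standardized workflow is compared against a frozen reference value within a declared tolerance. The file path, key name, reference value and tolerance are all hypothetical.

```python
# Illustrative "test oracle" for a standardized workflow: compare a derived
# measure against a frozen reference within a tolerance. The reference value,
# tolerance and derivatives path are hypothetical placeholders.
import json

REFERENCE_AMYGDALA_VOL_MM3 = 1650.0   # frozen oracle value for a test subject/phantom
RELATIVE_TOLERANCE = 0.05             # e.g. accept results within 5%

def check_against_oracle(derivatives_json: str) -> bool:
    """Return True if the pipeline's output agrees with the oracle."""
    with open(derivatives_json) as f:
        measured = json.load(f)["amygdala_volume_mm3"]
    rel_err = abs(measured - REFERENCE_AMYGDALA_VOL_MM3) / REFERENCE_AMYGDALA_VOL_MM3
    return rel_err <= RELATIVE_TOLERANCE

if __name__ == "__main__":
    ok = check_against_oracle("derivatives/sub-phantom_stats.json")
    print("within tolerance" if ok else "outside tolerance")
```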
tl;dr followup to some pieces.
i generated this for a different reason, but i think it is applicable here:
this shows amygdala volumes computed by freesurfer and fsl on the same data (about 1800 cases). the intent of this figure is to show that consistency between our tools is missing even for very basic data elements. now compound these differences, say in a more complex workflow that uses said amygdala ROI in an autism study to look at genetics and fMRI integration. we are going to have a larger hyper-parameter space and evaluating implications of these different tools gets exponentially harder. therefore as a field, we have to move closer and closer towards SaMD
(as @oesteban nicely puts it) and ask how do we quantify and establish the precision and accuracy of these devices, and validate them over time.
we don't always know what the "correct" answer is, but when we do (and as a community we agree we do) it is imperative that we quickly move to establish software that performs within those verifiable limits. we should be able to say, here is a validated workflow that measures amygdala volume within 5% of tolerance as measured via X (whatever we decide is the gold/silver standard).
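As an illustration of the kind of cross-tool consistency check described above, here is a short sketch that quantifies agreement between FreeSurfer- and FSL-derived amygdala volumes across subjects. The CSV file and column names are hypothetical, as is the layout of the data.

```python
# Sketch of a cross-tool consistency check: given per-subject amygdala volumes
# from FreeSurfer and FSL, quantify their agreement. CSV/columns are hypothetical.
import pandas as pd

df = pd.read_csv("amygdala_volumes.csv")   # columns: subject, fs_vol_mm3, fsl_vol_mm3

# Pearson correlation between the two tools
r = df["fs_vol_mm3"].corr(df["fsl_vol_mm3"])

# Mean absolute relative difference, using the per-subject tool average as denominator
rel_diff = (df["fs_vol_mm3"] - df["fsl_vol_mm3"]).abs() / df[["fs_vol_mm3", "fsl_vol_mm3"]].mean(axis=1)

print(f"n = {len(df)}, r = {r:.2f}, mean |relative difference| = {rel_diff.mean():.1%}")
```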
@jsheunis any chance I can get write access? I just realised there's a typo "pipelines take on" -> "pipelines taking on", and I didn't mention who our panelists are!
It's Karolina Finc, Oscar Esteban, Satra Ghosh and Erin Dickie, right?
Fixed the typo and added the discussion participants.
@jsheunis could we change the useful links to:
nipype: https://nipype.readthedocs.io/en/latest/
nipreps: https://www.nipreps.org/
niflows: https://github.com/niflows/
Some interesting points that we can include in the discussion & I'm happy to add some thoughts:
Referring to Botvinik-Nezer et al. 2020:
Other points:
Open, community-driven, software pipelines: a retrospective and future outlook
By: Thomas Nichols (University of Oxford, Big Data Institute), Karolina Finc, Oscar Esteban, Satra Ghosh, Erin Dickie
Abstract
Open pipelines collect the best expertise, algorithms and planning into community standards, allowing wide access to top-of-the-range analysis pipelines. While some frameworks exist explicitly to allow users to build pipelines with a diverse set of tools (e.g. Nipype) others comprise specific pipelines offered as a particular best practice solution (e.g. fmriprep). Is there a danger of these prepared pipelines taking on the hallowed role that individual software tools have previously held, such that their use becomes expected and their non-use needs to be justified? How do we approach continuous evaluation of such workflows? Do we have built-in procedures in these community standards that allow for critical evaluation?
Useful Links
Public Mattermost channel for discussions prior to, during and after the session.
nipype: https://nipype.readthedocs.io/en/latest/
nipreps: https://www.nipreps.org/
niflows: https://github.com/niflows/
Tagging @nicholst @kfinc @oesteban @satra @edickie