ohbm / osr2020

Website for the Open Science Room at the OHBM 2020 meeting
https://ohbm.github.io/osr2020

Past, Present and Future of Open Science (Emergent session): Open, community-driven, software pipelines: a retrospective and future outlook #68

Open jsheunis opened 4 years ago

jsheunis commented 4 years ago

Open, community-driven, software pipelines: a retrospective and future outlook

By: Thomas Nichols (University of Oxford, Big Data Institute), Karolina Finc, Oscar Esteban, Satra Ghosh, Erin Dickie

Abstract

Open pipelines collect the best expertise, algorithms and planning into community standards, allowing wide access to top-of-the-range analysis pipelines. While some frameworks exist explicitly to allow users to build pipelines with a diverse set of tools (e.g. Nipype), others comprise specific pipelines offered as a particular best-practice solution (e.g. fMRIPrep). Is there a danger of these prepared pipelines taking on the hallowed role that individual software tools have previously held, such that their use becomes expected and their non-use needs to be justified? How do we approach continuous evaluation of such workflows? Do we have built-in procedures in these community standards that allow for critical evaluation?

Useful Links

Public Mattermost channel for discussions prior to, during and after the session.
nipype
nipreps
niflows

Tagging @nicholst @kfinc @oesteban @satra @edickie

nicholst commented 4 years ago

@jsheunis any chance I can get write access? I just realised there's a typo "pipelines take on" -> "pipelines taking on", and I didn't mention who our panelists are!

It's Karolina Finc, Oscar Esteban, Satra Ghosh and Erin Dickie, right?

nicholst commented 4 years ago

Hi @kfinc @oesteban @satra @edickie,

In a last-minute panic I have created an abstract for our BrainHack panel session, registered as one of the "Emergent" sessions. The abstract is just OK, and worse, I realised I failed to name-check each of you!

Please read and offer edits (as additional comments here) on this abstract and hopefully @jsheunis can ensure that the final edits are incorporated.

satra commented 4 years ago

Some thoughts added in bold.

Open pipelines collect the best expertise, algorithms and planning into community standards, allowing wide access to top-of-the-range analysis pipelines. While some frameworks exist explicitly to allow users to build pipelines with a diverse set of tools (e.g. Nipype) others comprise specific pipelines offered as a particular best practice solution (e.g. fmriprep **as an example of the larger set of evolving nipreps, HCP-Pipelines**). Is there a danger of these prepared pipelines take on the hallowed role that individual software tools have previously held, such that their use becomes expected and their non-use needs to be justified? **What limits do these pose on scientific questions?** How do we approach continuous evaluation **and validation and dissemination** of such workflows? Do we have built-in procedures in these community standards **and development procedures (e.g. niflows)** that allow for critical evaluation?

nipreps: https://www.nipreps.org/
niflows: https://github.com/niflows

oesteban commented 4 years ago

Sorry, this notification was filtered by Gmail into some unchecked folder, and I only recovered it by chance. So what I am writing right now is not very well thought through. I'll come back later today. Just to trigger some brainstorming, I'll go ahead and post some ideas.

satra commented 4 years ago

tl;dr followup to some pieces.

i generated this for a different reason, but i think it is applicable here:

[image: amygdala volumes computed by freesurfer and fsl on the same data]

this shows amygdala volumes computed by freesurfer and fsl on the same data (about 1800 cases). the intent of this figure is to show that consistency between our tools is missing even for very basic data elements. now compound these differences, say in a more complex workflow that uses said amygdala ROI in an autism study to look at genetics and fMRI integration. we are going to have a larger hyper-parameter space, and evaluating the implications of these different tools gets exponentially harder. therefore as a field, we have to move closer and closer towards SaMD (software as a medical device, as @oesteban nicely puts it) and ask how we quantify and establish the precision and accuracy of these devices, and validate them over time.

we don't always know what the "correct" answer is, but when we do (and as a community we agree we do) it is imperative that we quickly move to establish software that performs within those verifiable limits. we should be able to say, here is a validated workflow that measures amygdala volume within a 5% tolerance as measured via X (whatever we decide is the gold/silver standard).
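As a rough illustration of the kind of check described above, here is a minimal Python sketch. It assumes paired per-case volumes from a candidate workflow and from whatever reference measurement the community agrees on, with all reference values positive. The function names, the default 5% threshold, and the numbers are hypothetical and are not taken from any existing tool or dataset.

```python
def relative_errors(measured, reference):
    """Per-case relative error |measured - reference| / reference.

    Assumes every reference value is positive (true for volumes).
    """
    if len(measured) != len(reference):
        raise ValueError("measured and reference must have the same length")
    return [abs(m - r) / r for m, r in zip(measured, reference)]


def fraction_within_tolerance(measured, reference, tolerance=0.05):
    """Share of cases whose relative error is at most `tolerance`."""
    errors = relative_errors(measured, reference)
    return sum(e <= tolerance for e in errors) / len(errors)


# Hypothetical amygdala volumes in mm^3 (made-up numbers, illustration only).
reference_volumes = [1500.0, 1600.0, 1700.0, 1800.0]
workflow_volumes = [1530.0, 1590.0, 1850.0, 1810.0]

share = fraction_within_tolerance(workflow_volumes, reference_volumes)
print(f"{share:.0%} of cases fall within the 5% tolerance")
```

A per-case pass rate is only one possible summary. A real validation would also need to settle what the reference is and how much disagreement is acceptable, which is exactly the open question here.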

jsheunis commented 4 years ago

> @jsheunis any chance I can get write access? I just realised there's a typo "pipelines take on" -> "pipelines taking on", and I didn't mention who our panelists are!
>
> It's Karolina Finc, Oscar Esteban, Satra Ghosh and Erin Dickie, right?

Fixed the typo and added the discussion participants.

oesteban commented 4 years ago

@jsheunis could we change the useful links to:

nipype: https://nipype.readthedocs.io/en/latest/
nipreps: https://www.nipreps.org/
niflows: https://github.com/niflows/

kfinc commented 4 years ago

Some interesting points that we can include in the discussion & I'm happy to add some thoughts:

Referring to Botvinik-Nezer et al. 2020:

  1. What's the role of open community-driven pipelines in reducing analytical flexibility (“researcher degrees of freedom”)? Also referring to Poldrack et al. 2001 (Box 3 "Flexibility in functional MRI data analysis") and Carp 2012.
  2. "complex datasets should be analyzed using several analysis pipelines, and preferably by more than one research team. Achieving such ‘multiverse analysis’ on a large scale will require the development of automated statistical analysis tools" -- Can open pipelines provide support for multiverse analysis? E.g. FitLins, fMRIDenoise, PyNets.
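
To make the multiverse idea in point 2 concrete, here is a minimal Python sketch: the same data are run through every combination of a few analysis choices, and the spread of outcomes is reported. Everything in it is hypothetical. `toy_pipeline`, its two parameters, and the data are stand-ins for illustration and do not reflect the interfaces of FitLins, fMRIDenoise, or PyNets.

```python
from itertools import product
from statistics import mean


def toy_pipeline(values, window, threshold):
    """Hypothetical stand-in for an analysis pipeline.

    Smooths with a moving average of length `window`, then returns the
    fraction of smoothed samples that exceed `threshold`.
    """
    smoothed = [mean(values[i:i + window])
                for i in range(len(values) - window + 1)]
    return sum(v > threshold for v in smoothed) / len(smoothed)


# Made-up signal and analysis choices (illustration only).
data = [0.1, 0.4, 0.9, 1.2, 0.8, 0.3, 0.2]
windows = (1, 3)
thresholds = (0.5, 1.0)

# Run every combination of choices on the same data.
results = {
    (window, threshold): toy_pipeline(data, window, threshold)
    for window, threshold in product(windows, thresholds)
}

for (window, threshold), outcome in sorted(results.items()):
    print(f"window={window}, threshold={threshold}: {outcome:.2f}")
print(f"range across analyses: {min(results.values()):.2f} "
      f"to {max(results.values()):.2f}")
```

The design point is that the set of choices is enumerated explicitly and every variant is reported, rather than one variant being picked after the results are seen.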

Other points:

  1. How to start contributing to an open pipeline? -- something that was difficult for me as a beginner, but thanks to great mentoring from @oesteban and attending sprints/brainhack events it started to be less scary.
  2. How to prevent open pipelines from being applied as a black box? -- The role of accessible documentation. Here I can add some points about our recent docusprint work on improving the accessibility of fMRIPrep's documentation.