nipreps / dmriprep

dMRIPrep is a robust and easy-to-use pipeline for preprocessing of diverse dMRI data. The transparent workflow dispenses with manual intervention, thereby ensuring the reproducibility of the results.
https://www.nipreps.org/dmriprep
Apache License 2.0

Points of discussion #7

Closed dPys closed 4 years ago

dPys commented 5 years ago

I'm making a number of changes on a local fork and am wondering what others thoughts are on the following:

1) Omitting freesurfer reconstruction entirely from dmriprep for now (particularly since it creates a huge docker image, takes awhile to run, and doesn't add much to dmri preprocessing)?
2) Along the lines of (1), keeping preprocessing in native diffusion space. That is, no registration (beyond intramodal b0-b0). It seems like dmriprep already adheres to this but I'm curious to hear others' opinions.
3) Minimizing the number of external dependencies with the exception of FSL (i.e. using dipy wherever mrtrix/ANTs might be used, etc.)
4) Minimizing the number of independent workflows unless they orchestrate truly independent routines (i.e. generating a fieldmap makes sense as a separate workflow, but why eddy since that's run every time with the main workflow?)
5) Considering omission of tensor model estimation altogether since it is not technically a preprocessing step -- it is a reconstruction step.

Mainly just curious to hear whether these principles are shared by others, and if not, please feel free to voice disagreement! :-)


oesteban commented 5 years ago
  1. Omitting freesurfer reconstruction entirely from dmriprep for now (particularly since it creates a huge docker image, takes awhile to run, and doesn't add much to dmri preprocessing)?

You have the --fs-no-reconall option. Not sure the Docker image is that substantially bigger - or whether size of the image is really relevant (5GB).

2. Along the lines of (1), keeping preprocessing in native diffusion space. That is, no registration (beyond intramodal b0-b0). It seems like dmriprep already adheres to this but I'm curious to hear others' opinions.

Totally with you here in that idea of keeping processing in native diffusion space. I don't see it aligned with (1), though: prior information is still necessary and worth bringing in where needed.

3. Minimizing the number of external dependencies with the exception of FSL (i.e. using dipy wherever mrtrix/ANTs might be used, etc.)

Actually FSL is possibly the most annoying dependency for their noncommercial license... I think one of the successes of fMRIPrep was to provide implementations for the software people were using across packages. I think we should optimize with respect to results (accuracy/speed tradeoff), not minimizing the number of tools.

4. Minimizing the number of independent workflows unless they orchestrate truly independent routines (i.e. generating a fieldmap makes sense as a separate workflow, but why eddy since that's run every time with the main workflow?)

With the development of fMRIPrep we realized that modularity was crucial, and have been working on splitting out functionality. Not only to allow other tools (e.g., dMRIPrep) to pull from that (without having the monster of fMRIPrep as a dependency), but also to make benchmarking faster. If your units can easily swap several implementations, then comparing those is a lot easier.

You might want to run eddy every time, but that assumption might change as dMRIPrep evolves and we find better alternatives. At that point, you'll want to switch fast.
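The swappable-units idea above can be sketched as a small backend registry, where the workflow picks its head-motion/eddy-correction implementation at assembly time. This is a hypothetical illustration of the design pattern, not dMRIPrep's actual API; all names here (`register_hmc`, `run_eddy`, `run_dipy_hmc`) are invented for the sketch.

```python
# Hypothetical sketch: each processing unit (here, head-motion correction)
# registers itself behind a common interface, so swapping or benchmarking
# implementations is a one-keyword change rather than a pipeline rewrite.
HMC_IMPLEMENTATIONS = {}

def register_hmc(name):
    """Decorator that registers a head-motion-correction backend by name."""
    def decorator(func):
        HMC_IMPLEMENTATIONS[name] = func
        return func
    return decorator

@register_hmc("eddy")
def run_eddy(dwi_file):
    # Stand-in for an FSL eddy-based unit.
    return f"eddy-corrected {dwi_file}"

@register_hmc("dipy")
def run_dipy_hmc(dwi_file):
    # Stand-in for a pure-Python alternative.
    return f"dipy-corrected {dwi_file}"

def build_hmc_workflow(dwi_file, backend="eddy"):
    """Select the backend at workflow-assembly time."""
    return HMC_IMPLEMENTATIONS[backend](dwi_file)
```

Benchmarking then reduces to looping `build_hmc_workflow` over the registered backends and comparing outputs.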

5. Considering omission of tensor model estimation altogether since it is not technically a preprocessing step-- it is a reconstruction step.

+1000 totally agree with this one.

josephmje commented 5 years ago

My only argument for keeping tensor model estimation is that generating a colour FA map and plotting the tensor residuals can be really good QC visuals as seen here. We've found that the residual plots allow you to see slice dropouts at a quick glance. But I agree, you might just want to generate these for QC and not output the actual niftis.
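The QC quantities mentioned here are cheap to compute once a tensor fit exists. A minimal numpy sketch of the standard formulas (FA from the three eigenvalues, colour FA as the FA-scaled absolute principal eigenvector, and a per-slice residual profile where dropout slices show up as spikes); function names are illustrative, and a real pipeline would get these from a fitting library such as DIPY:

```python
import numpy as np

def fractional_anisotropy(evals):
    """FA from the three tensor eigenvalues (standard closed form)."""
    l1, l2, l3 = evals
    md = (l1 + l2 + l3) / 3.0
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return float(np.sqrt(1.5 * num / den)) if den > 0 else 0.0

def color_fa(evals, v1):
    """Colour-FA RGB triplet: |principal eigenvector| scaled by FA."""
    return fractional_anisotropy(evals) * np.abs(np.asarray(v1, dtype=float))

def slice_residual_rms(data, model_fit):
    """Per-slice RMS of (data - model_fit) for a 3D volume (x, y, z).
    Slices affected by signal dropout appear as spikes in this profile."""
    resid = np.asarray(data, dtype=float) - np.asarray(model_fit, dtype=float)
    return np.sqrt((resid ** 2).mean(axis=(0, 1)))
```

Plotting `slice_residual_rms` per volume is the quick-glance dropout check described above.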

dPys commented 5 years ago
  1. Omitting freesurfer reconstruction entirely from dmriprep for now (particularly since it creates a huge docker image, takes awhile to run, and doesn't add much to dmri preprocessing)?

You have the --fs-no-reconall option. Not sure the Docker image is that substantially bigger - or whether size of the image is really relevant (5GB).

I think with --fs-no-reconall, that should be okay. Any literature that you'd recommend on the incorporation of surface information in dmri preprocessing? I've seen it in tractography, but not in preprocessing.

  2. Along the lines of (1), keeping preprocessing in native diffusion space. That is, no registration (beyond intramodal b0-b0). It seems like dmriprep already adheres to this but I'm curious to hear others' opinions.

Totally with you here in that idea of keeping processing in native diffusion space. I don't see it aligned with (1), though: prior information is still necessary and worth bringing in where needed.

This is interesting and may speak to a use case for surfaces. Still, I'm curious-- how do you envision that surface priors might inform preprocessing? e.g. one thing that I've thought of but never seen implemented is using a t1w white-matter tissue edge map to perform boundary-based registration of diffusion volumes to one another for motion correction.

  3. Minimizing the number of external dependencies with the exception of FSL (i.e. using dipy wherever mrtrix/ANTs might be used, etc.)

Actually FSL is possibly the most annoying dependency for their noncommercial license... I think one of the successes of fMRIPrep was to provide implementations for the software people were using across packages. I think we should optimize with respect to results (accuracy/speed tradeoff), not minimizing the number of tools.

Yes, but this raises the further issue of how dmriprep differs from qsiprep, which I like for DSI preprocessing. I do agree with the primary objective of optimizing with respect to results rather than minimizing the number of tools. That being said, I see no reason to use external dependencies where pure Python equivalents are available. As a secondary aim, why not strive for a lightweight, predominantly pure-Python tool wherever viable (e.g. dipy's denoising routines in place of DWIDenoise() from mrtrix)?
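The dipy-for-mrtrix swap suggested here is plausible because both denoisers share the same core operation: patch-wise PCA with truncation of noise-dominated components. A toy numpy sketch of that core, with the caveat that real MP-PCA implementations (MRtrix's dwidenoise, DIPY's mppca) estimate the cutoff rank from the Marchenko-Pastur noise distribution rather than taking it as a fixed argument as done here for brevity:

```python
import numpy as np

def pca_denoise_patch(patch, rank):
    """Truncated-SVD denoising of one (voxels x volumes) patch.

    Toy version of the patch-level step inside MP-PCA denoisers:
    project the patch onto its top `rank` principal components and
    discard the rest, which are assumed to be noise.
    """
    u, s, vt = np.linalg.svd(np.asarray(patch, dtype=float),
                             full_matrices=False)
    s[rank:] = 0.0  # zero out the noise-dominated singular values
    return (u * s) @ vt
```

A full denoiser slides this over overlapping local windows of the 4D series and aggregates the results.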

  4. Minimizing the number of independent workflows unless they orchestrate truly independent routines (i.e. generating a fieldmap makes sense as a separate workflow, but why eddy since that's run every time with the main workflow?)

With the development of fMRIPrep we realized that modularity was crucial, and have been working on splitting out functionality. Not only to allow other tools (e.g., dMRIPrep) to pull from that (without having the monster of fMRIPrep as a dependency), but also to make benchmarking faster. If your units can easily swap several implementations, then comparing those is a lot easier.

You might want to run eddy every time, but that assumption might change as dMRIPrep evolves and we find better alternatives. At that point, you'll want to switch fast.

This is true. I do think there is a tradeoff between pipeline flexibility and development readability, but modularizing in places where there are deliberate reasons for doing so makes sense to me.

  5. Considering omission of tensor model estimation altogether since it is not technically a preprocessing step-- it is a reconstruction step.

+1000 totally agree with this one.

I love this discussion. I think it's super helpful to think through these decisions and get everyone on the same page before digging too deep into development.


dPys commented 5 years ago

My only argument for keeping tensor model estimation is that generating a colour FA map and plotting the tensor residuals can be really good QC visuals as seen here. We've found that the residual plots allow you to see slice dropouts at a quick glance. But I agree, you might just want to generate these for QC and not output the actual niftis.

I can see an argument for generating these for QC as opposed to outputting the actual niftis. One thing some folks like to do is use plots of the primary eigenvectors overlaid on the preprocessed dwi to ensure directional information corresponds to fascicle neuroanatomy. We could even implement something like this in a similar manner to the live html plots generated from fmriprep...

arokem commented 5 years ago

I can see an argument for generating these for QC as opposed to outputting the actual niftis. One thing some folks like to do is use plots of the primary eigenvectors overlaid on the preprocessed dwi to ensure directional information corresponds to fascicle neuroanatomy. We could even implement something like this in a similar manner to the live html plots generated from fmriprep...

+1. People like to look at these for sanity checking. It provides a lot of information about whether the input data was erroneous (e.g., flipped gradient directions) or something went wrong during processing.
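The per-voxel ingredient for the eigenvector overlays discussed above is just the leading eigenvector of the 3x3 diffusion tensor. A minimal numpy sketch (the sign of the eigenvector is arbitrary, which is why such overlays typically use |v1|-coloured line glyphs rather than arrows, and why flipped gradient tables show up as directions that cross rather than follow known fascicles):

```python
import numpy as np

def principal_eigenvector(tensor):
    """Leading eigenvector (largest eigenvalue) of a symmetric 3x3
    diffusion tensor. np.linalg.eigh returns eigenvalues in ascending
    order, so the last column of the eigenvector matrix is v1."""
    _, evecs = np.linalg.eigh(np.asarray(tensor, dtype=float))
    return evecs[:, -1]
```

Fed into a matplotlib quiver or line-collection plot over the b0, this is the sanity-check visual described above.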

oesteban commented 4 years ago

We've moved this conversation to the Google Doc - @dPys can I ask you to check whether all the points you brought up here are reflected in the document?

https://docs.google.com/document/d/1d2oAy5umm9FFoxJVusCJNmIajlYNrBssX182jO-2k1o/edit?usp=sharing