nipreps / fmriprep

fMRIPrep is a robust and easy-to-use pipeline for preprocessing of diverse fMRI data. The transparent workflow dispenses with manual intervention, thereby ensuring the reproducibility of the results.
https://fmriprep.org
Apache License 2.0

Transforms-only and apply-transforms modes #2207

Open effigies opened 4 years ago

effigies commented 4 years ago

For very large datasets (10k+ subjects), the cost of storing even a minimal set of derivatives for each subject can become large. Internally, we delay transforming data in order to do as much as possible in a single shot, reducing interpolations. It should therefore not be very difficult to output all transforms and few if any other derivatives with something like a --transforms-only flag. The user can then construct the needed volumes and time series on the fly, or we could provide an --apply-transforms mode to fully populate a subject directory.
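To illustrate the single-shot idea: a minimal numpy sketch (not fMRIPrep's actual implementation; the helper names `compose` and `apply_affine` are made up for illustration) of composing a chain of affines into one matrix so the data only need to be interpolated once.

```python
import numpy as np

# Hypothetical sketch: instead of resampling after each step
# (head-motion correction, coregistration, normalization), compose the
# affines into a single matrix and resample once, so the data are
# interpolated only one time.

def compose(*affines):
    """Compose 4x4 affine transforms; the first argument is applied first."""
    out = np.eye(4)
    for aff in affines:
        out = aff @ out  # later transforms multiply on the left
    return out

def apply_affine(aff, points):
    """Map an (N, 3) array of coordinates through a 4x4 affine."""
    homogeneous = np.c_[points, np.ones(len(points))]
    return (homogeneous @ aff.T)[:, :3]

# Toy chain: an HMC translation followed by a coregistration translation.
hmc = np.eye(4); hmc[:3, 3] = [1.0, 0.0, 0.0]
coreg = np.eye(4); coreg[:3, 3] = [0.0, 2.0, 0.0]

chain = compose(hmc, coreg)
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])

# Applying the composed affine once matches applying each step in turn.
stepwise = apply_affine(coreg, apply_affine(hmc, pts))
assert np.allclose(apply_affine(chain, pts), stepwise)
```

With nonlinear warps in the chain the same principle holds (map coordinates through the whole chain, interpolate once), only the composition step is more involved.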

This would be enabled by the X5 transform format, allowing us to store the head-motion-correction transforms for an entire series as a step in a chain from BOLD to template space. I'm not sure if there's an existing format that something like antsApplyTransforms could use; we currently split, apply, and merge.
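As a rough sketch of what a "transform series" step in such a chain could look like (this is purely illustrative and not the actual X5 schema): one 4x4 head-motion affine per time point stored as a (T, 4, 4) array, collapsed against a shared BOLD-to-template affine so each volume resamples with a single matrix and no split/apply/merge round-trip on disk.

```python
import numpy as np

# Illustrative layout, NOT the real X5 format: per-volume HMC affines
# as a (T, 4, 4) stack, plus one BOLD->template affine for the series.
T = 3
rng = np.random.default_rng(0)
hmc_series = np.tile(np.eye(4), (T, 1, 1))
hmc_series[:, :3, 3] = rng.normal(scale=0.5, size=(T, 3))  # small motions

bold_to_template = np.eye(4)
bold_to_template[:3, 3] = [10.0, -5.0, 2.0]

# Collapse the chain per volume: one matrix per time point, so each
# volume can be pushed into template space with one interpolation.
per_volume = bold_to_template[None] @ hmc_series   # (T, 4, 4)

assert per_volume.shape == (T, 4, 4)
```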

I list this as medium impact. I think it would be low value for moderately sized datasets, but extremely valuable for very large datasets.

cc @shotgunosine @mih for thoughts.

Shotgunosine commented 4 years ago

Yeah, I think this would be really helpful for efforts to share fmriprep derivatives in cases where people will be downloading that data and running subsequent processes themselves. If the use case is that subsequent processes will happen in the cloud, the benefit will depend on the processing requirements of the apply-transforms operation.

Off the top of my head, the only step where this might not work is slice-time correction. @effigies is that also handled with transformations that end up applied in a single step? Could also be tricky with multi-echo.

effigies commented 4 years ago

> If the use case is that subsequent processes will happen in the cloud, the benefit will depend on the processing requirements of the apply transforms operation.

It seems likely you'll want some level of caching, but

> Off the top of my head, the only step where this might not work is slice-time correction. @effigies is that also handled with transformations that end up applied in a single step?

Yeah, STC is done separately, but should be deterministic. I don't think there's any fundamental reason that STC couldn't be included as part of the X5 chain, but transforms with temporal components (apart from one transform per time point) might not be specified yet.

> Could also be tricky with multi-echo.

I suspect the combination could be represented as a voxel x echo weight matrix, which would make it not too far from a displacement field. But that's a guess based on a very qualitative understanding of ME.
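To make the guess concrete: a toy numpy sketch of a voxel-by-echo weight matrix, using the common w_e ~ TE_e * exp(-TE_e / T2*) weighting for optimal combination. The echo times and T2* map are invented for illustration; this is not fMRIPrep's or tedana's actual code.

```python
import numpy as np

# Toy voxel-by-echo weight matrix for optimally combining multi-echo
# data. Weights follow the w_e ~ TE_e * exp(-TE_e / T2star) form;
# all values below are made up for illustration.
tes = np.array([0.014, 0.028, 0.042])        # echo times in seconds
t2s = np.full((4, 4, 4), 0.030)              # toy T2* map: 30 ms everywhere

# (x, y, z, echo) weight array, normalized to sum to 1 per voxel
w = tes * np.exp(-tes / t2s[..., None])      # shape (4, 4, 4, 3)
w /= w.sum(axis=-1, keepdims=True)

# Toy echo volumes stacked along the last axis, then a weighted sum
# across echoes -- structurally similar to applying a per-voxel field.
echoes = np.stack([np.full((4, 4, 4), s) for s in (100.0, 80.0, 60.0)],
                  axis=-1)
combined = (w * echoes).sum(axis=-1)         # shape (4, 4, 4)
```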

oesteban commented 4 years ago

> Yeah, STC is done separately, but should be deterministic. I don't think there's any fundamental reason that STC couldn't be included as part of the X5 chain, but transforms with temporal components might (apart from one transform per time point) not be specified yet.

Including STC in the same shot would be really nice - but at this point, I see it as very far in the future. To include it directly in the resampling we would need a way of interpolating through time too, which is not currently available through scipy (and I honestly don't know which interpolating kernel you should use, off the top of my head).
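For reference, the temporal part on its own is simple if one accepts a plain linear kernel: a minimal numpy sketch (not fMRIPrep's STC, which delegates to AFNI's 3dTshift) that shifts each slice's samples onto a common time grid with np.interp. The open question above is which kernel a combined spatiotemporal resampler should use, not whether 1-D temporal resampling is possible.

```python
import numpy as np

# Minimal slice-time correction sketch via linear temporal
# interpolation. All numbers are toy values for illustration.
tr = 2.0
n_vols, n_slices = 10, 4
times = np.arange(n_vols) * tr                        # nominal volume times
slice_offsets = np.linspace(0, tr, n_slices, endpoint=False)

# Toy data: one time series per slice, sampled at that slice's offset.
data = np.array([np.sin(times + off) for off in slice_offsets])

corrected = np.empty_like(data)
for s, off in enumerate(slice_offsets):
    # Samples for slice s were acquired at times + off; resample them
    # onto the common grid `times`.
    corrected[s] = np.interp(times, times + off, data[s])
```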

Lestropie commented 2 years ago

Posting to register personal investment in this functionality, and hopefully to prompt an update on what would be required to contribute, given any changes to transform handling since this issue was first posted.