nipreps / fmripost-aroma

Run ICA-AROMA on fMRIPrep derivatives
https://fmripost-aroma.readthedocs.io/
Apache License 2.0
8 stars 2 forks source link

No clear way to ignore known dummy frames for ICA-AROMA, etc... #13

Closed alexlicohen closed 3 months ago

alexlicohen commented 6 years ago

We are processing the GSP 1000 dataset, which does have dummy frames in the fMRI data, but we want to use the ICA-AROMA output. It would seem more logical to remove these frames up front before MELODIC is run. Informal comparison finds distinct differences in fcMaps between leaving them in vs truncating the file before fmriprep is run. What should be the appropriate sequence of steps here?

oesteban commented 6 years ago

It seems to me like a very reasonable feature that we can add and set on by default. WDYT @chrisfilo @jdkent ?

effigies commented 6 years ago

So we already detect non-steady-state volumes. Would we be able to reuse that? And is that a sensible default behavior?

alexlicohen commented 6 years ago

My concern would be relying on the non-steady-state detector working equally well for all subjects. I think having a truncation based on that as a default would be good, but would also like a switch to explicitly (and always) drop the first 4, etc...

Ideally, we could leave the frames in, use them for registration, then drop them for the rest of processing.

effigies commented 6 years ago

That seems reasonable. My main concern in asking is whether we can avoid adding more options, when we don't need need to. Come to think of it, I wonder if there's anything in BIDS indicating the first non-dummy scan. It seems like the place to encode information like that is in the dataset, not in the command line arguments.

jdkent commented 6 years ago

Interesting problem!

My intuition was that ICA-AROMA would generate noise component(s) that were associated with the dummy frames and wouldn't (significantly) impact the other signal components, but this wouldn't be the first (nor the last) time my intuition was incorrect.

@alexlicohen When you compared the truncated versus non-truncated ICA-AROMA methods, you truncated the non-truncated data after ICA-AROMA, correct? I assume so, but I want to be explicit. I think I have some data that can play with truncating a different number of volumes and see if the components change.

As for the general process of truncation (outside the context of MELODIC/ICA-AROMA), I understand that this can be accomplished with adding columns to the design matrix with a "1" in the row that represents the volume you wish to ignore and "0"s for every other row. Given that is correct, you can effectively remove that volume without manipulating the data.

My concern for changing the number of volumes that are output by fmriprep is that it may change the interpretation of the events.tsv file if the fMRI data was constructed for a task. I like @effigies suggestion to incorporate the # of dummy volumes in the dataset itself

Thus if we decide that dummy volumes do negatively impact the generation of MELODIC components, then we could remove the volumes before MELODIC, run ICA-AROMA, and then either:

alexlicohen commented 6 years ago

Yes, we basically trimmed the file before fmriprep, or trimmed the file after the nonagress ICA-AROMA. We have also been trying truncating the non ICA-AROMA files (tsv and nii.gz) after fmriprep for comparison…

-Alex

On May 17, 2018, at 5:59 PM, James Kent notifications@github.com wrote:

Interesting problem!

My intuition was that ICA-AROMA would generate noise component(s) that were associated with the dummy frames and wouldn't (significantly) impact the other signal components, but this wouldn't be the first (nor the last) time my intuition was incorrect.

@alexlicohen https://github.com/alexlicohen When you compared the truncated versus non-truncated ICA-AROMA methods, you truncated the non-truncated data after ICA-AROMA, correct? I assume so, but I want to be explicit. I think I have some data that can play with truncating a different number of volumes and see if the components change.

As for the general process of truncation (outside the context of MELODIC/ICA-AROMA), I understand that this can be accomplished with adding columns to the design matrix with a "1" in the row that represents the volume you wish to ignore and "0"s for every other row. Given that is correct, you can effectively remove that volume without manipulating the data.

My concern for changing the number of volumes that are output by fmriprep is that it may change the interpretation of the events.tsv file if the fMRI data was constructed for a task. I like @effigies https://github.com/effigies suggestion to incorporate the # of dummy volumes in the dataset itself

Thus if we decide that dummy volumes do negatively impact the generation of MELODIC components, then we could remove the volumes before MELODIC, run ICA-AROMA, and then either:

keep the volumes removed (and hope the events.tsv accounts for this) add the unaltered dummy volumes back in and pad the aroma confounds in the confounds.tsv with the average of the component (or 0?) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/poldracklab/fmriprep/issues/1136#issuecomment-390027146, or mute the thread https://github.com/notifications/unsubscribe-auth/AIPa_xygPpDoJLzvCsbDUAKvXtpak_-kks5tzfK-gaJpZM4UDrgJ.

chrisgorgo commented 6 years ago

We could do what we are doing already for compcor - see https://github.com/poldracklab/fmriprep/issues/361

effigies commented 6 years ago

Yeah, and I think @alexlicohen was saying that would be a reasonable default (speaking of which, is there a citation for AROMA or CompCor or similar algorithms performing more reliably with dummy scans removed?), but there should be a way of specifying the known number of dummy scans. I agree that's useful, but think it makes more sense to be encoded in BIDS metadata that we read. Is there such a field, or would you have any objection to proposing such a field?

alexlicohen commented 6 years ago

Yes, we've got two fields, albeit now I see there is a typo in spec 1.1.0 (the first one should be "...discarded by the scanner (as opposed to...)

NumberOfVolumesDiscardedByScanner: Number of volumes ("dummy scans") discarded by the user (as opposed to those discarded by the user post hoc) before saving the imaging file. For example, a sequence that automatically discards the first 4 volumes before saving would have this field as 4. A sequence that doesn't discard dummy scans would have this set to 0. Please note that the onsets recorded in the event.tsv file should always refer to the beginning of  the acquisition of the first volume in the corresponding imaging file - independent of the value of  NumberOfVolumesDiscardedByScanner field. 

NumberOfVolumesDiscardedByUser: Number of volumes ("dummy scans") discarded by the user before including the file in the dataset. If possible, including all of the volumes is strongly recommended. Please note that the onsets recorded in the event.tsv file should always  refer to the beginning of the acquisition of the first volume in the corresponding imaging file -  independent of the value of NumberOfVolumesDiscardedByUser field. 

chrisgorgo commented 6 years ago

Thanks for reporting the typo - I fixed it in the current draft.

Some questions/comments: 1) Do you have any examples where non-steady state detector did work correctly? 2) Have you tried adding the NonSteadyStateOutlierXX regressors from confounds.tsv to your temporal regression/cleaning model or using it for censoring data prior to estimating connectivity? This could be the easiest way to clean up fcMaps. 3) The NumberOfVolumesDiscardedByScanner and NumberOfVolumesDiscardedByUser BIDS fields describe how many volumes were removed (past tense) from data prior bidsification. This does not prescribe how many volumes should be ignored in preprocessing.

alexlicohen commented 6 years ago

I'll look into nipreps/fmriprep#1, and we haven't tried nipreps/fmriprep#2.

For nipreps/fmriprep#3, technically, if NumberOfVolumesDiscardedByScanner is zero, then we should be dropping the first frames. For modern sequences, this would be 4, etc... and thus you would know that the scanner dropped the frames already. A separate variable would have to control how many you WANT to discard (then subtract out how many the scanner already dropped, then how many the user excised from the DICOM).

chrisgorgo commented 6 years ago

Just to make it clear - I don't oppose adding this feature, but I would like to explore if what we already provide (via non-steady state detection) is sufficient. We iterated over the heuristic a couple of times, but if it is not robust enough there is indeed a need for manual option.

alexlicohen commented 6 years ago

I will look through our tsv files, to be clear, this is what we've tried so far:

(Using a subject from the GSP which all retain the nonsteadystate frames: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/25833)

GSP -> fmriprep, use ICA-AROMA output -> cut 4 frames -> correlation maps Vs GSP -> cut 4 frames -> fmriprep, use ICA-AROMA output -> correlation maps

There were obvious differences spatially and strength-wise in the maps.

We haven't yet tried (but I'm having our RA do this now): GSP -> fmriprep, use ICA-AROMA output -> GLM with tsv confounds (including the nonsteadystate column) -> correlation maps

We have also done: GSP -> fmriprep, use regular output -> cut 4 frames -> GLM with (whole brain, csf, WM, 6motion) or (aCompCor) tsv confounds -> correlation maps Vs GSP -> cut 4 frames -> fmriprep, use regular output -> GLM with (whole brain, csf, WM, 6motion) or (aCompCor)tsv confounds -> correlation maps
I can't recall if these were different when we looked at it earlier this week, given all of the variations, I'm asking now

but not:: GSP -> fmriprep, use regular output -> GLM with tsv confounds (including the nonsteadystate column) -> correlation maps

chrisgorgo commented 6 years ago

Just to clarify. There are many potential confounds in the confounds.tsv file (including ones necessary for performing aggressive ICA denoising). For your comparison I would only include the columns related to non-steady state.

How many non-steady state volumes are detected in your data? Are any of the ICA components picking on them? Are those components classified as noise?

alexlicohen commented 6 years ago

We usually get two columns:

I.e.:

NonSteadyStateOutlier00 NonSteadyStateOutlier01 aCompCor00 aCompCor01
1 0 0.000… 0.000…
0 1 0.000… 0.000…
0 0 0.076… -0.094…
0 0 0.034… -0.173…
0 0 -0.031… -0.111…
NonSteadyStateOutlier00 NonSteadyStateOutlier01 AROMAAggrComp01 AROMAAggrComp02
1 0 11.020… 10.795…
0 1 0.387… 1.941…
0 0 -0.038… 0.411…
0 0 -0.125… 0.201…
0 0 -0.105… 0.036…
NonSteadyStateOutlier00 NonSteadyStateOutlier01 CSF WhiteMatter GlobalSignal stdDVARS FramewiseDisplacement
1 0 669.692… 61.204… 163.751…
0 1 14.917… -2.558… 2.423… 11.573… 0.304…
0 0 -13.262… -2.348… -3.984… 1.422… 0.074…
0 0 -18.783… -2.356… -5.059… 0.963… 0.073…
0 0 -18.926… -1.407… -5.939… 0.898… 0.079…
chrisgorgo commented 6 years ago

These findings are consistent with my expectations: 1) outliers are one hot encoded to avoid any assumptions of relations between them 2) outliers are removed from data used to estimate compcor (hence zeros) 3) ICA is picking up on the outliers (but it will stop if we denoise the data the same way we do for compcor)

I think the main question is if there are cases where you have evidence of more non-steady state volumes than detected in data which would warrant the manual specification. If not we can just implement the pre ICA denoising using non-steady state detector outputs.

alexlicohen commented 6 years ago

One motivation is to try and replicate the output of an older (no longer supported) pipeline using fmriprep. In that pipeline, the first 4 frames were removed. Other than duplicating the entire dataset sans the first four frames, there is not a current way to do that in fmriprep...

I agree that it likely does the detection correctly, and steady state is achieved by frame three in our dataset, but it is not consistent across subjects and I think adding the option of specifically dropping a set numbers is a valid one.

(as a general default, I agree with your idea for using the nonsteadystate flags going into the ICA processing.)

alexlicohen commented 6 years ago

If not we can just implement the pre ICA denoising using non-steady state detector outputs.

@chrisfilo Can we go ahead and add this for now for testing purposes?

Per the original AROMA paper, they did this too:

Preprocessing of the training set was carried out using tools from the FMRIB Software Library (FSL; http://www.fmrib.ox.ac.uk/fsl; Smith et al. (2004), Woolrich et al. (2009), Jenkinson et al. (2012)) and involved (1) removal of the first five volumes to allow for signal equilibration, (2) head movement correction by volume-realignment to the middle volume using MCFLIRT, (3) global 4D mean intensity normalization and (4) spatial smoothing (6 mm FWHM). Importantly, no temporal filtering was applied at this stage of processing. Next, we applied ICA to the preprocessed participant-level data, using automatic dimensionality estimation as implemented in FSL MELODIC.

So the intended use of AROMA is for data that is already all steady state...

chrisgorgo commented 6 years ago

Sure - please send a pull request.

Best, Chris

PS Apologies for brevity and potential typos. This message was composed on a phone.

On Wed, May 23, 2018, 7:43 AM Alexander Li Cohen notifications@github.com wrote:

If not we can just implement the pre ICA denoising using non-steady state detector outputs.

@chrisfilo https://github.com/chrisfilo Can we go ahead and add this for now for testing purposes?

Per the original AROMA paper, they did this too:

Preprocessing of the training set was carried out using tools from the FMRIB Software Library (FSL; http://www.fmrib.ox.ac.uk/fsl; Smith et al. (2004), Woolrich et al. (2009), Jenkinson et al. (2012)) and involved (1) removal of the first five volumes to allow for signal equilibration, (2) head movement correction by volume-realignment to the middle volume using MCFLIRT, (3) global 4D mean intensity normalization and (4) spatial smoothing (6 mm FWHM). Importantly, no temporal filtering was applied at this stage of processing. Next, we applied ICA to the preprocessed participant-level data, using automatic dimensionality estimation as implemented in FSL MELODIC.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/poldracklab/fmriprep/issues/1136#issuecomment-391373425, or mute the thread https://github.com/notifications/unsubscribe-auth/AAOkp8cYlMX9cMI-wV51jJ-7oPa-Fuqpks5t1XWLgaJpZM4UDrgJ .

alexlicohen commented 6 years ago

Chris, I wish I could, but I'm not familiar enough with the fmriprep code to know where to change this. Can you point me in the right direction, and/or is someone else available to do this?

-Alex

chrisgorgo commented 6 years ago

I think the best way would be to add a regression step just before MELODIC https://github.com/poldracklab/fmriprep/blob/5f55944ad78512f9e5a86c2c65eda23bfdba23e1/fmriprep/workflows/bold/confounds.py#L484

You can reuse some of the filtering code from compcor

https://github.com/nipy/nipype/blob/a01d4b4e127f45e9a8e514c22de95c17aeb3bf6c/nipype/algorithms/confounds.py#L558

and

https://github.com/nipy/nipype/blob/a01d4b4e127f45e9a8e514c22de95c17aeb3bf6c/nipype/algorithms/confounds.py#L1128

All of this could be controlled by a command line switch which will alternate between removing detected non-steady volumes, a fixed number or none.

I hope this helps! If you open a PR early (as a work in progress) we can guide you more accurately.

oesteban commented 5 years ago

Hi @rciric, could you have a look into this discussion and make a suggestion as regards what we can do in this regard in the context of nipreps/fmriprep#1540 ?

tsalo commented 5 months ago

This is now built into the fMRIPost-AROMA CLI.