rpomponio / neuroHarmonize

Harmonization tools for multi-site neuroimaging analysis. Implemented as a Python package. Harmonization of MRI, sMRI, dMRI, fMRI variables with support for NIfTI images. Complements the work in Neuroimage by Pomponio et al. (2019).
https://pypi.org/project/neuroHarmonize/
MIT License
79 stars, 28 forks

Correcting for site as well as the effect of covariates on NIfTI images #10

Closed parekhpravesh closed 3 years ago

parekhpravesh commented 3 years ago

Hello,

I want to use neuroHarmonize on NIfTI images to simultaneously correct for site and the effect of covariates (TIV, age, and sex). I have followed the examples on the intro page to read my data in and create a mask. To learn the model parameters, the example mentions:

my_model, nifti_array_adj = nh.harmonizationLearn(nifti_array, covars)

However, since I also want to correct for additional covariates, I used

my_model, nifti_array_adj, s_data = nh.harmonizationLearn(nifti_array, covars, return_s_data=True)

After that, I saved the model and applied it to the same set of files from which I had learned the model parameters. However, the resulting NIfTI images do not look corrected for the effect of covariates; rather, there seems to be only a small change in voxel intensities (which I assume is the site effect).

Therefore, my question is: shouldn't we be writing out s_data, since that is corrected for both site and the effect of covariates? I did try this, and the resulting NIfTI images look like the residual images typically obtained after linear regression (which seems reasonable).
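To make the distinction in this question concrete, here is a minimal sketch in plain NumPy. It does not call neuroHarmonize and does not reproduce its empirical Bayes ComBat model; it uses an ordinary least-squares fit on simulated data, and every variable name and effect size is an assumption chosen for illustration. It contrasts output where only the site term is removed (covariate effects preserved) with residual-style output where site and covariate terms are both removed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 200, 5

# Hypothetical covariates: two sites and age (illustrative values only).
site = rng.integers(0, 2, size=n_subjects)
age = rng.uniform(20, 80, size=n_subjects)

# Simulated voxel values: baseline + site offset + age effect + noise.
data = (
    100.0
    + 5.0 * site[:, None]
    - 0.3 * age[:, None]
    + rng.normal(0.0, 1.0, size=(n_subjects, n_voxels))
)

# Design matrix: intercept, site indicator, centred age.
age_c = age - age.mean()
design = np.column_stack([np.ones(n_subjects), site, age_c])
beta, *_ = np.linalg.lstsq(design, data, rcond=None)

# "Harmonized"-style output: remove only the site term, keep the age effect.
harmonized = data - np.outer(site, beta[1])

# Residual-style output: remove intercept, site, and age terms.
residual = data - design @ beta

corr_harmonized = np.corrcoef(age, harmonized[:, 0])[0, 1]
corr_residual = np.corrcoef(age, residual[:, 0])[0, 1]
print(f"age correlation, site term removed only: {corr_harmonized:.2f}")
print(f"age correlation, full residual:          {corr_residual:.2f}")
```

In this toy setting the first output still tracks age strongly, while the residual output is uncorrelated with age by construction, which matches the difference described above between the two kinds of written images.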

Thank you for your time and help!

rpomponio commented 3 years ago

Hello!

I appreciate your questions and your willingness to use the neuroHarmonize package! This is an interesting use case that I did not originally consider, and I have a few comments:

Please keep me updated with your ideas and progress. I'm very interested in your experience with the software and if necessary, I would consider modifying the package for a new use case in the future.

-Ray

parekhpravesh commented 3 years ago

Hello,

Thank you for your reply!

To give you some more context, we have imaging data from three different sites/scanners. The goal is to perform a VBM analysis between two groups of patients (chronic and recent patients). There is, however, a strong correlation between age and duration of illness (which is the variable of interest). Therefore, our plan is to estimate the effect of age from healthy subjects and then control for the effect of age in the patient sample by applying this model. Since site/scanner, TIV, and sex will also account for some of the variance in the GM, we thought of using neuroHarmonize to learn how all these variables affect GM in a single model and then apply it to the patient data. The "residual" images after applying the model would then be free of the confounding effect of these variables and could be used for further analyses.
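The plan described here (learn confound effects in healthy subjects, then remove them from patient data) can be sketched as follows. This is a simplified illustration in plain NumPy with an ordinary least-squares model, not the neuroHarmonize implementation; the cohort sizes, effect sizes, and helper names (`simulate_covariates`, `make_design`, `simulate_gm`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SITES = 3

# Assumed "true" effects: three site intercepts, then age, sex, TIV.
TRUE_BETA = np.array([0.60, 0.62, 0.58, -0.002, 0.02, 0.1])

def simulate_covariates(n):
    """Draw hypothetical site, age, sex, and TIV (litres) for n subjects."""
    site = rng.integers(0, N_SITES, size=n)
    age = rng.uniform(20, 60, size=n)
    sex = rng.integers(0, 2, size=n)
    tiv = rng.normal(1.5, 0.1, size=n)
    return site, age, sex, tiv

def make_design(site, age, sex, tiv):
    """One indicator column per site (acts as intercept), then covariates."""
    return np.column_stack([np.eye(N_SITES)[site], age, sex, tiv])

def simulate_gm(design, extra=0.0, n_voxels=3):
    """Simulated GM values: confound effects + optional extra effect + noise."""
    noise = rng.normal(0.0, 0.01, size=(design.shape[0], n_voxels))
    return (design @ TRUE_BETA + extra)[:, None] + noise

# 1) Learn the confound model from healthy controls only.
design_hc = make_design(*simulate_covariates(300))
gm_hc = simulate_gm(design_hc)
beta_hat, *_ = np.linalg.lstsq(design_hc, gm_hc, rcond=None)

# 2) Patients: duration of illness is correlated with age by construction.
site_pt, age_pt, sex_pt, tiv_pt = simulate_covariates(200)
duration = np.clip(0.3 * (age_pt - 20) + rng.normal(0, 2, size=200), 0, None)
design_pt = make_design(site_pt, age_pt, sex_pt, tiv_pt)
gm_pt = simulate_gm(design_pt, extra=-0.005 * duration)

# 3) Apply the control-derived model to patients to obtain residuals.
residual_pt = gm_pt - design_pt @ beta_hat

# The duration effect (simulated slope: -0.005) should survive in the residuals.
slope = np.polyfit(duration, residual_pt[:, 0], 1)[0]
print(f"duration slope in patient residuals: {slope:.4f}")
```

Because the age effect is estimated only in controls, the part of the patient signal that follows illness duration is not absorbed by the age term, which is the motivation given above for fitting on healthy subjects.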

I will try a few of these permutations and reply with the results. Appreciate your time and help!

Regards, Pravesh

parekhpravesh commented 3 years ago

Hello, here are some observations from various experiments:

I look forward to your thoughts. Thank you!

Regards, Pravesh

rpomponio commented 3 years ago

Hi Pravesh,

Glad to see you are progressing with the package! I believe this may help to clarify some of your points:

parekhpravesh commented 3 years ago

Hello,

Yes, indeed. I am now using the third option as the main pipeline and everything seems to be working fine.

With regard to masking, I have been using a liberal threshold of zero and applying a threshold later, for the second-level analyses. The reason is that the mask I want to use differs from the mask created by neuroHarmonize. Hopefully the differences are subtle enough not to matter much.

Regarding the last point, the rationale is to keep strict independence between the training and test sets during cross-validation. Instead of using the entire data for harmonization, it would be better to estimate the harmonization model parameters from the training set and then apply them to the test set (otherwise there is information leakage between the training and test sets). One could argue that this leakage is not directly relevant to the target variable, but I think it is generally best to maintain strict separation between the two sets. The downside is that it is computationally much more expensive, since a new harmonization model has to be learnt and then applied to the test set for every cross-validation fold.
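The fold-wise procedure described above can be sketched roughly as follows. This is a toy example in plain NumPy: the "harmonization model" here is just a per-site mean offset, standing in for the real model that neuroHarmonize would learn, and the helper names `learn_site_offsets` and `apply_site_offsets` are hypothetical. The point is only the structure: parameters are learned on the training fold and applied, unchanged, to the held-out fold.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subjects, n_features, n_sites, n_folds = 120, 4, 3, 5

# Simulated features with an additive shift per site (illustrative values).
site = rng.integers(0, n_sites, size=n_subjects)
site_shift = np.array([0.0, 2.0, -1.5])
features = rng.normal(size=(n_subjects, n_features)) + site_shift[site][:, None]

def learn_site_offsets(x, s):
    """Per-site deviation from the grand mean, estimated on training data.

    Assumes every site is present in the training fold.
    """
    grand_mean = x.mean(axis=0)
    return np.stack([x[s == k].mean(axis=0) - grand_mean for k in range(n_sites)])

def apply_site_offsets(x, s, offsets):
    """Subtract previously learned site offsets from new data."""
    return x - offsets[s]

def site_mean_range(x, s):
    """Largest gap between site means for the first feature."""
    means = [x[s == k, 0].mean() for k in range(n_sites)]
    return max(means) - min(means)

folds = np.array_split(rng.permutation(n_subjects), n_folds)
harmonized = np.empty_like(features)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n_subjects), test_idx)
    # Learn on the training fold only, so no test information leaks in.
    offsets = learn_site_offsets(features[train_idx], site[train_idx])
    harmonized[test_idx] = apply_site_offsets(
        features[test_idx], site[test_idx], offsets
    )

range_before = site_mean_range(features, site)
range_after = site_mean_range(harmonized, site)
print(f"site-mean range before: {range_before:.2f}, after: {range_after:.2f}")
```

The cost noted above is visible in the loop: the model is re-learned once per fold rather than once for the whole dataset.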

rpomponio commented 3 years ago

This was a great discussion and I welcome re-opening the issue if there are additional questions!