rpomponio / neuroHarmonize

Harmonization tools for multi-site neuroimaging analysis. Implemented as a Python package. Harmonization of MRI, sMRI, dMRI, and fMRI variables, with support for NIfTI images. Complements the work in NeuroImage by Pomponio et al. (2019).
https://pypi.org/project/neuroHarmonize/
MIT License

Why should the data matrix be N_samples x N_features? #1

Closed: anlijuncn closed this issue 4 years ago

anlijuncn commented 4 years ago

Hi Pomponio, I am Lijun, a PhD student at NUS. I am trying to use the neuroHarmonize package, and I am a little confused about why the data matrix should be N_samples x N_features. For example, if I am interested in ['hippocampus', 'ventricles'] and I have 1000 subjects from 2 sites, shouldn't I feed in a 1000x2 data matrix? Thanks for your help!

rpomponio commented 4 years ago

Hi Lijun!

You are correct that the data matrix should be 1000x2, if you have 1,000 subjects in total and 2 features to harmonize. In this case the two features are hippocampus and ventricles.

In a separate data frame, called "covars" in my example, you provide a column called "SITE" that indicates which site each subject came from. The covars data will also have 1,000 rows, one per subject, in the same order as the rows of the data matrix.
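A minimal sketch of the input layout described above, assuming NumPy and pandas. The random values, the site labels "site_A"/"site_B", and the `check_inputs` helper are hypothetical illustrations, not part of neuroHarmonize; the commented lines at the end show the `harmonizationLearn(data, covars)` call as given in the package README.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 1,000 subjects split evenly across 2 sites.
rng = np.random.default_rng(0)
n_subjects = 1000
features = ["hippocampus", "ventricles"]

# Data matrix: one row per subject (N_samples), one column per feature (N_features).
data = rng.normal(size=(n_subjects, len(features)))

# Covariates: one row per subject, in the same order as the rows of `data`.
# The "SITE" column records which site each subject came from.
covars = pd.DataFrame({"SITE": np.repeat(["site_A", "site_B"], n_subjects // 2)})


def check_inputs(data, covars):
    """Hypothetical helper: sanity-check shapes before harmonizing."""
    if data.ndim != 2:
        raise ValueError("data must be a 2-D array (N_samples x N_features)")
    if data.shape[0] != len(covars):
        raise ValueError(
            f"data has {data.shape[0]} rows but covars has {len(covars)}; "
            "both need one row per subject"
        )
    if "SITE" not in covars.columns:
        raise ValueError('covars needs a "SITE" column')


check_inputs(data, covars)
print(data.shape, covars.shape)  # (1000, 2) (1000, 1)

# With neuroHarmonize installed, the README's usage is then:
# from neuroHarmonize import harmonizationLearn
# model, data_adj = harmonizationLearn(data, covars)
```

If the features are stored as rows instead (a 2x1000 array), the check above fails because the row counts no longer match; transposing with `data.T` gives the expected 1000x2 orientation.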

Hope this helps and happy to discuss further.

anlijuncn commented 4 years ago

Hi Pomponio,

Thanks for your reply! It helps me a lot. I am a Python user, so the Python version of ComBat is much easier for me to use. Excellent work on your ComBat-GAM!