weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

Is it completely necessary to include principal components in the analysis? #331

Closed: corunesa closed this issue 3 years ago

corunesa commented 3 years ago

Hello! I would like to know whether it is strictly necessary to include principal components (PCs) in the regression. I understand that SAIGE uses a linear mixed model (LMM), and from what I have read about this kind of model, adding PCs is somewhat controversial, although the SAIGE article does include them. In my case, PC1 explains 1-4% of the variance and is not associated with my phenotype of interest in a multivariate regression with non-genetic data. Could someone help me? Thank you very much!


rkarlssonlinner commented 3 years ago

Hi,

It is recommended to include genetic PCs even when fitting an LMM, because the exogeneity assumption of the random-effects model can be violated, and because population stratification is better treated as a fixed effect than as a random effect. See, e.g., Price et al. (2010): https://www.nature.com/articles/nrg2813

"However, population structure is actually a fixed effect (that is, its effect as a function of genetic ancestry is the same for all samples), and spurious associations might result if it is modelled as a random effect based on overall covariance, particularly in the case of unusually differentiated markers. Modelling population structure as a fixed effect provides a higher level of certainty in correcting for stratification but requires running PCA (or a similar method) to infer the genetic ancestry of each sample."

corunesa commented 3 years ago

Thank you very much, I understand :)

Mahmedturk commented 2 years ago

Do we add the PCs as covariates in the phenotype file?

rkarlssonlinner commented 2 years ago

"Do we add the PCs as covariates in the phenotype file?" Yes, that is how I typically do it.
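A minimal Python sketch of that workflow follows. It assumes whitespace-delimited files: a phenotype file with a header row and an `IID` sample-ID column, and a PC file in a PLINK 2-style `.eigenvec` layout such as `#FID IID PC1 PC2 ...`. The function name `merge_pcs_into_pheno`, the column names, and the number of PCs are hypothetical choices for illustration, not part of SAIGE. The appended PC columns would then be listed as covariates when fitting the null model (in SAIGE step 1 this is typically done with `--covarColList`; check the documentation for your version).

```python
def read_table(text):
    """Parse a whitespace-delimited table with a header row into (header, rows).

    A leading '#' on header fields (as in PLINK-style '#FID') is stripped.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = [h.lstrip("#") for h in lines[0]]
    rows = [dict(zip(header, row)) for row in lines[1:]]
    return header, rows


def merge_pcs_into_pheno(pheno_text, eigenvec_text, n_pcs=4, id_col="IID"):
    """Append the first n_pcs PC columns to the phenotype table (hypothetical helper).

    Samples present in the phenotype file but absent from the PC file are dropped.
    Returns the merged table as tab-delimited text with a header row.
    """
    pheno_header, pheno_rows = read_table(pheno_text)
    pc_header, pc_rows = read_table(eigenvec_text)

    pc_names = [f"PC{i}" for i in range(1, n_pcs + 1)]
    missing = [p for p in pc_names if p not in pc_header]
    if missing:
        raise ValueError(f"PC file lacks columns: {missing}")

    # Index PC rows by sample ID for the join.
    pcs_by_id = {row[id_col]: row for row in pc_rows}

    out_lines = ["\t".join(pheno_header + pc_names)]
    for row in pheno_rows:
        pc_row = pcs_by_id.get(row[id_col])
        if pc_row is None:
            continue  # no PCs for this sample: drop it rather than emit missing values
        values = [row[h] for h in pheno_header] + [pc_row[p] for p in pc_names]
        out_lines.append("\t".join(values))
    return "\n".join(out_lines) + "\n"


if __name__ == "__main__":
    pheno = "IID y x1\nS1 0 1.2\nS2 1 0.7\nS3 1 0.1\n"
    eigenvec = "#FID IID PC1 PC2 PC3\nS1 S1 0.01 -0.02 0.03\nS2 S2 -0.05 0.04 0.00\n"
    print(merge_pcs_into_pheno(pheno, eigenvec, n_pcs=2))
    # Covariate list to pass on to the null-model fit, e.g. "x1,PC1,PC2"
    print(",".join(["x1"] + [f"PC{i}" for i in range(1, 3)]))
```

Dropping samples without PCs keeps the sketch simple; in practice it may be safer to report how many samples were dropped, since a silent mismatch in sample IDs between the two files can shrink the analysis set unexpectedly.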