rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Smart covariates selection procedure -- enhancement #246

Closed Shicheng-Guo closed 2 years ago

Shicheng-Guo commented 2 years ago

Hi Joelle,

In large-scale PheWAS analysis, covariate selection is a challenge. For example, gender should be adjusted for in the majority of traits, but for gender-specific traits such as breast cancer, it is not a good idea to include gender as a covariate. Is there a smart way to automate covariate selection for such cases in a large-scale PheWAS analysis?

Thanks

Shicheng

joellembatchou commented 2 years ago

The REGENIE method includes a pre-processing step which essentially identifies and removes covariates that are perfectly correlated with each other (or with the intercept). So for a trait like breast cancer where Sex is the same across the analyzed individuals, Sex would be removed, as it would be perfectly correlated with the intercept.
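
To illustrate the idea, here is a minimal Python sketch of that kind of check. It is not REGENIE's actual code: the function name `drop_redundant_covariates`, the rank-based test, and the toy data are all assumptions made for illustration. A covariate is kept only if it adds information beyond the intercept and the covariates already kept.

```python
import numpy as np

def drop_redundant_covariates(X, names):
    """Return the names of covariates that are not linearly redundant.

    Hypothetical helper for illustration only (not part of REGENIE).
    A column is dropped when adding it to the intercept plus the
    already-kept columns does not increase the matrix rank.
    """
    n_samples = X.shape[0]
    kept_cols = [np.ones(n_samples)]  # start with the intercept column
    kept_names = []
    for j, name in enumerate(names):
        candidate = np.column_stack(kept_cols + [X[:, j]])
        # Full column rank means the new covariate carries new information.
        if np.linalg.matrix_rank(candidate) == candidate.shape[1]:
            kept_cols.append(X[:, j])
            kept_names.append(name)
    return kept_names

# Toy data: a single-sex cohort, so "sex" is constant (collinear with the
# intercept), and "age_months" is an exact multiple of "age".
age = np.array([34.0, 51.0, 47.0, 62.0, 29.0])
names = ["sex", "age", "age_months"]
X_single_sex = np.column_stack([np.ones(5), age, 12.0 * age])
X_mixed_sex = np.column_stack([np.array([0.0, 1.0, 0.0, 1.0, 1.0]), age, 12.0 * age])

print(drop_redundant_covariates(X_single_sex, names))  # ['age']
print(drop_redundant_covariates(X_mixed_sex, names))   # ['sex', 'age']
```

In the single-sex example, `sex` is dropped because it is constant, while in the mixed-sex example it is retained; `age_months` is dropped in both because it duplicates `age`.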