rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Normalisation and scaling of the phenotypes #515

Open mariamaitoumelloul opened 2 months ago

mariamaitoumelloul commented 2 months ago

Dear all,

I log-transformed my phenotype (metabolite level), then compared the GWAS results between the log-transformed-only phenotype and the log-transformed and scaled phenotype (mean of 0, SD of 1). There was a big difference that I would not have expected. Is there any assumption underlying the regenie methodology that could explain this?

[Manhattan plots: manhattan_Idx_file_Idx_322_log_scaled regenie; manhattan_Idx_file_Idx_322_log regenie]
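The two phenotype versions being compared can be sketched as follows. This assumes a natural log and the sample standard deviation, since the post does not say which log base or SD convention was used; `log_transform`, `standardize` and the example levels are hypothetical.

```python
import math
import statistics

def log_transform(levels):
    """Natural-log transform; assumes every level is strictly positive."""
    return [math.log(v) for v in levels]

def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical metabolite levels, for illustration only.
levels = [1.2, 3.5, 0.8, 10.4, 2.2, 5.1]

y_log = log_transform(levels)        # "log-transformed only"
y_log_scaled = standardize(y_log)    # "log-transformed and scaled"

# y_log_scaled is only a shift and rescale of y_log, so the two
# phenotypes are perfectly correlated within the sample.
print(statistics.mean(y_log_scaled), statistics.stdev(y_log_scaled))
```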

Ojami commented 2 months ago

There is (almost) always a difference between summary-statistic distributions under different transformations. The best way to diagnose the issue would be to check the inflation (QQ plot and lambda) or the intercept from an LD score regression (LDSC) analysis. Other factors (e.g., sample size) may also explain this difference. Metabolite and protein levels are usually highly skewed (caveat: OLS makes no assumption about the distribution of the outcome itself, only about the errors), so you can fit a simple OLS on a few common variants (maybe MAF > 0.3?) and check the model diagnostics under the different transformations.
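The diagnostics suggested above can be sketched in plain Python. This is a minimal illustration, not regenie code: `simple_ols`, `skewness` and `genomic_control_lambda` are hypothetical helpers, the data are simulated (one variant with MAF 0.35 and a log-normal trait), and lambda is computed in the usual genomic-control way, as the median observed 1-df chi-square statistic divided by its null median (about 0.455).

```python
import math
import random
import statistics

def simple_ols(g, y):
    """OLS of y on an intercept plus one genotype vector g.
    Returns (beta, se, residuals)."""
    n = len(y)
    g_bar, y_bar = statistics.mean(g), statistics.mean(y)
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    beta = sxy / sxx
    alpha = y_bar - beta * g_bar
    resid = [yi - alpha - beta * gi for gi, yi in zip(g, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return beta, se, resid

def skewness(x):
    """Sample skewness (third standardized moment); about 0 for symmetric errors."""
    m, s = statistics.mean(x), statistics.pstdev(x)
    return sum(((xi - m) / s) ** 3 for xi in x) / len(x)

def genomic_control_lambda(p_values):
    """Median observed 1-df chi-square over its null median (~0.455).
    Expects two-sided p-values in (0, 1]."""
    nd = statistics.NormalDist()
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]  # p -> z -> z^2
    return statistics.median(chi2) / nd.inv_cdf(0.75) ** 2

# Hypothetical simulation: one common variant (MAF 0.35) and a right-skewed,
# log-normal "metabolite level". Not real data.
random.seed(1)
n, maf = 5000, 0.35
g = [sum(random.random() < maf for _ in range(2)) for _ in range(n)]
y_raw = [math.exp(0.1 * gi + random.gauss(0, 0.5)) for gi in g]
y_log = [math.log(v) for v in y_raw]

results = {}
for label, y in [("raw", y_raw), ("log", y_log)]:
    beta, se, resid = simple_ols(g, y)
    skew = skewness(resid)
    results[label] = (beta, se, skew)
    print(f"{label}: beta={beta:.4f} se={se:.4f} residual skewness={skew:.2f}")
```

In this simulation the raw-scale residuals are strongly right-skewed while the log-scale residuals are close to symmetric. `genomic_control_lambda` is not applied to the simulation because it is meant for the p-values of all tested variants in a GWAS run.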

joellembatchou commented 1 month ago

Hi,

Can you include the code showing which transformations you are comparing?

If the comparison is between a phenotype Y1 and a phenotype Y2 = (Y1 - mean(Y1))/sd(Y1), then the p-values should be about the same, while the beta and SE estimates would differ due to the different scaling. If the comparison is between a standardized phenotype Y1 and the log-transformed Y1, then it would not be surprising for the p-values to differ, as @Ojami pointed out.
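The first case can be checked numerically with a single-variant OLS fit. This is a simplification that ignores covariates and regenie's whole-genome regression step; the simulated phenotype and the helper `ols_beta_se` are hypothetical. Standardizing divides both beta and SE by sd(Y1), so their ratio (the test statistic) is unchanged.

```python
import math
import random
import statistics

def ols_beta_se(g, y):
    """OLS of y on an intercept plus one genotype vector g; returns (beta, se)."""
    n = len(y)
    g_bar, y_bar = statistics.mean(g), statistics.mean(y)
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    beta = sxy / sxx
    alpha = y_bar - beta * g_bar
    rss = sum((yi - alpha - beta * gi) ** 2 for gi, yi in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, se

# Hypothetical simulated data: one variant (MAF 0.3) and a Gaussian phenotype Y1.
random.seed(7)
n, maf = 2000, 0.3
g = [sum(random.random() < maf for _ in range(2)) for _ in range(n)]
y1 = [0.2 * gi + random.gauss(0, 1) for gi in g]

# Y2 = (Y1 - mean(Y1)) / sd(Y1)
mu, sd = statistics.mean(y1), statistics.stdev(y1)
y2 = [(v - mu) / sd for v in y1]

b1, se1 = ols_beta_se(g, y1)
b2, se2 = ols_beta_se(g, y2)
z1, z2 = b1 / se1, b2 / se2

# Two-sided normal-approximation p-values from z = beta / se.
p1 = math.erfc(abs(z1) / math.sqrt(2))
p2 = math.erfc(abs(z2) / math.sqrt(2))
print(f"Y1: beta={b1:.4f} se={se1:.4f} z={z1:.3f} p={p1:.3g}")
print(f"Y2: beta={b2:.4f} se={se2:.4f} z={z2:.3f} p={p2:.3g}")
```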