privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Scales of polygenic risk score distributions #321

Closed johannfaouzi closed 2 years ago

johannfaouzi commented 2 years ago

Hi Florian,

I have a question regarding the scales of the polygenic risk score distributions computed for a given population.

In short, I have computed PRS for several phenotypes and I'm interested in their potential association with measured phenotypes. To do so, I use linear mixed effects models (I have longitudinal data) with one model for each PRS, and I check the sign and the p-value of the coefficient for the PRS.

Now I'm wondering if the actual value of the coefficient is interpretable, because the p-value only indicates the significance but not the effect size. I have computed all the PRS using the same approach (LDpred2-auto) and I have noticed that the scales for different PRS may be very different.

So I'm wondering whether the scales of PRS distributions (for a given population) are comparable between different PRS, or whether they can differ for various reasons (summary statistics, chains used to compute the final PRS, etc.).

Best, Johann

privefl commented 2 years ago

This is not something I have checked, so I don't know, sorry.

johannfaouzi commented 2 years ago

No problem, thank you for your reply!

johannfaouzi commented 2 years ago

I just checked the estimated standard deviations of the phenotypes with

sd_phenotype <- min(with(info_snp, beta_se * sqrt(n_eff) * sqrt(0.5)))

and the values are quite different. I'm going to check if, after dividing the PRS by the estimated standard deviation of each phenotype, the scales are more similar.
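A minimal sketch of that check, under stated assumptions: `info_snp` below is a small simulated data frame and `prs` is a hypothetical vector of scores, both made up for illustration. Only the `sd_phenotype` line is taken from the comment above.

```r
# Simulated stand-ins (assumptions, not real data):
# a tiny summary-statistics table and a vector of PRS values.
set.seed(1)
info_snp <- data.frame(
  beta_se = runif(100, min = 0.01, max = 0.05),
  n_eff   = rep(50000, 100)
)
prs <- rnorm(500, mean = 0, sd = 3)

# Estimated phenotype SD, computed as in the comment above.
sd_phenotype <- min(with(info_snp, beta_se * sqrt(n_eff) * sqrt(0.5)))

# Express the PRS in units of the estimated phenotype SD.
prs_scaled <- prs / sd_phenotype

# Compare the spread before and after rescaling.
c(sd_raw = sd(prs), sd_scaled = sd(prs_scaled))
```

Repeating this for each PRS gives one rescaled SD per score, which can then be compared across phenotypes.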

privefl commented 2 years ago

My guess would be that the SD of the PGS is also proportional to the R^2.

johannfaouzi commented 2 years ago

[Screenshot, 2022-03-07 at 12:04:31]

On a log-log scale the plot is quite linear, but the slope (and thus the exponent of the power law) is somewhat below 1 (around 0.9).
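For reference, here is a sketch of how such an exponent can be read off a log-log fit. The thread does not state which quantities are on the axes, so the two vectors below (`sd_phenotype` and `sd_prs`, one value per PRS) are simulated, hypothetical stand-ins with a built-in exponent of 0.9.

```r
# Simulated stand-ins (assumptions): one value per PRS.
set.seed(42)
sd_phenotype <- exp(rnorm(50))                      # hypothetical estimated phenotype SDs
noise <- exp(rnorm(50, sd = 0.05))                  # multiplicative noise
sd_prs <- 0.2 * sd_phenotype^0.9 * noise            # power law with exponent 0.9

# A power law y = a * x^b becomes log(y) = log(a) + b * log(x),
# so the slope of the log-log regression estimates the exponent b.
fit <- lm(log(sd_prs) ~ log(sd_phenotype))
exponent <- unname(coef(fit)[2])
exponent
```

A slope of exactly 1 would mean the two quantities are proportional; a slope below 1 means the ratio between them shrinks as the x-axis quantity grows.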

[Screenshot, 2022-03-07 at 12:08:30]

On the linear scale there are quite a few outliers, which can be seen in the five-number summary (minimum, first quartile, median, third quartile, maximum) of the ratio distribution: 0.001040, 0.114378, 0.158402, 0.244434, 0.936943.
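The five values above match what base R's `quantile()` returns by default. A small sketch, assuming a hypothetical vector `ratio` (simulated here); the 1.5 x IQR rule used to flag outliers is my own choice for illustration and is not mentioned in the thread.

```r
# Simulated stand-in (assumption): one ratio per PRS.
set.seed(7)
ratio <- rlnorm(200, meanlog = log(0.16), sdlog = 0.6)

# Default probabilities are 0%, 25%, 50%, 75% and 100%.
q <- quantile(ratio)
q

# Flag values outside the quartiles by more than 1.5 times the interquartile range.
iqr <- IQR(ratio)
lower <- q[["25%"]] - 1.5 * iqr
upper <- q[["75%"]] + 1.5 * iqr
outliers <- ratio[ratio < lower | ratio > upper]
length(outliers)
```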

I may give up on this for the moment.

privefl commented 2 years ago

Can you color by R^2 maybe?

johannfaouzi commented 2 years ago

By R^2 you mean the SNP heritability reported in the GWAS (or estimated by the ldsc function)?

privefl commented 2 years ago

Not h^2, but R^2, the variance explained by the PGS. But you can also compare with h^2.
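To make the distinction concrete, here is a sketch of how R^2 could be computed when the phenotype is measured. The vectors `pgs` and `y` are simulated, hypothetical stand-ins, and covariates are ignored for simplicity.

```r
# Simulated stand-ins (assumptions): a PGS and a measured phenotype
# built so that the true variance explained is 0.3^2 = 0.09.
set.seed(123)
n <- 2000
pgs <- rnorm(n)
y <- 0.3 * pgs + rnorm(n, sd = sqrt(1 - 0.3^2))

# Variance explained by the PGS: squared correlation with the phenotype.
r2 <- cor(pgs, y)^2

# Equivalent for a single predictor: R^2 of a simple linear regression.
r2_lm <- summary(lm(y ~ pgs))$r.squared

c(r2 = r2, r2_lm = r2_lm)
```

Unlike h^2, which can be estimated from summary statistics alone, this quantity needs individual-level phenotype values in the target data.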

johannfaouzi commented 2 years ago

I'm mostly computing PGS for phenotypes that are not measured (it's an exploratory analysis), so I don't think I can compute the variance explained by the PGS.

johannfaouzi commented 2 years ago

[Screenshot, 2022-03-07 at 18:31:45]

Here is the scatter plot with the h^2 values used as colors.