Closed alek0991 closed 2 years ago
This can happen due to LD; see e.g. fig S5 of https://doi.org/10.1016/j.ajhg.2019.11.001.
Quick simu showing this:
N <- 1000
set.seed(1)
X <- MASS::mvrnorm(N, rep(0, 2), matrix(c(1, 0.5, 0.5, 1), 2))
y <- X %*% c(-1, 3) + rnorm(N)
lm(y ~ X[, 1]) # marginal effect is positive due to correlation with 2nd column
lm(y ~ X) # joint effect is negative
But there is a quite strong pattern for large effects, so there may be something odd there.
Maybe there is a problem of fitting with the multi-ancestry sumstats, try smaller values for shrink_corr
(e.g. 0.5) to add more regularization.
I only use the summary stats for the EUR population of the pan-ukbb.
I have been playing with shrink_corr. The pattern doesn't change for shrink_corr=0.5. However, it start changing when shrink_corr<=0.1. This (shrink_corr<=0.1) is too much regularization, right?
Please note for the following plot (not the previous ones) I ran ldpred2-auto only for chr6. RED: MHC variants. BLUE: non-MHC variants.
shrink_corr = 0 means that you assume that there is absolutely no LD between variants, which is bad.
You should check frequencies with another reference I guess.
As I have mentioned above, ldpred2 reverses the sign of beta of some key variants in the MHC region that leads to a downstream result which is the opposite of P+T. In the following figure you can see that variants with reversed beta have higher sd_ldref/sd_ss.
P.S.
allow_jump_sign
, shrink_corr
, per chromosome, per chromosome chunk, and etc. However, ldpred2 stays robust for this specific phenotype (and some other correlated phenotype). P+T just uses the effects from the GWAS, so these are just the marginal effects.
To be sure, I would just redo the GWAS in UKBB.
I have noticed the z score of the heritability estimate (z_h2 = h2/h2_se) is not significant (large variance). What is the minimum acceptable h2 and h2_z to be able to use the ldpred2?
LDSc regression tends to be bad for traits with small polygenicity, so do not focus too much on that.
You can always use LDpred2, the worst that can happen is that you PGS has no predictive power.
Hi, I have been able to estimate the beta-auto using summary statistics from pan-ukbb with the LD-reference provided in the ldpred2 paper. For the Celiac disease which is a binary phenotype, sign of the estimated beta-auto for some variant in MHC region is the opposite of GWAS estimate (bottom-right corner of Figure 1). This is important for my work as I am comparing MHC and non-MHC region specifically. Does this make sense? If not, is there anyway I can fix this?
You can download relevant files from below in case needed. Download sumstats.csv Download multi_auto.rds
Also, I have attached similar plots for height and skin color for the sake of comparison.
Thank you! Ali
Figure 3: Celiac disease
Figure 2: Standing Height
Figure 3: Skin color