Closed alek0991 closed 2 years ago
$\gamma$ are the effects on the allele scale (e.g. from the marginals from GWAS or the joints you want to use for polygenic scores), while $\beta$ are the effects of the scaled genotypes, which is the scale I use internally in LDpred2.
I'll test your equation and get back to you.
If you forget about this approximation $$RS\gamma{joint} \approx S\gamma{marg}$$ you get $$cov(x,y)=\lambda{joint}^TSRS\gamma{joint}=\delta{joint}^TR\beta{joint}$$ where $\delta$ and $\beta$ are the scaled versions.
This is the formula I use to get $h^2_x=cov(x,x)$. For two different iterations, I get $r^2_x = \delta_1^T R \delta2$ (still need to justify why, but it works very well in practice). So, I think $\delta^T R \beta$ should give the co-predictability, and not the co-heritability, which will underestimate $r{xy}$.
Sometime ago, I tried instead using $$r_{xy} = \frac{\delta^T R \beta}{\sqrt{r_x^2 r_y^2}}$$ IIRC, it worked well in practice in case of complete sample overlap between GWAS of x and y, but not when there was no overlap.
Hi @alek0991
I do believe one should be able to account for sample overlap, e.g. by inferring it in the Gibbs sampler.
Best, Bjarni
Dear Florian and Bjarni, Thanks for your useful comments. I have been testing all these formulas and they are usually very similar when the sd_ldref and sd_ss are matching well.
However, I am working with pan-ukbb summary data which is mixed model using SAIGE and probably not the best summary data especially for binary traits with large-effect loci like red hair, bilirubin, and celiac disease.
As you have already mentioned in Fig. S15 of your recent paper, regions with high LD can correlate with PCs and create outliers in sd_ldref vs. sd_ss plot. It seems when a large effect loci overlaps with a high LD region the estimated joint effects become unreliable. I have already mentioned the celiac disease case here https://github.com/privefl/bigsnpr/issues/354#issue-1325122440 and I have seen similar pattern for red hair, bilirubin and a few other binary traits with large-effect loci. I should mention that for quantitative traits everything looks normal as far as I have noticed.
Yes, sumstats from LMM or GLMM are not recommended to use with LDpred2. My guess is that this is a problem for most sumstats-based methods.
Dear Florian, I have made a lot of progress thanks to your helpful responses to my questions in the past few weeks and I highly appreciate that. I am trying to calculate the genetic correlation between phenotypes and I was wondering if I can use the joint effect predicted by ldpred2 to calculate the genetic correlation between phenotype $x$ and $y$ like below: $$C_nx=CnG\lambda{joint}$$ $$C_ny=CnG\gamma{joint}$$ $$
~~~x^TCny=\lambda{joint}^TG^TCnG\gamma{joint}~~~(1)$$ From the ldpred2 paper we have: $$G^TCnG=(n-1)SRS$$ $$RS\gamma{joint}=S\gamma_{marg}$$ If we plug them into Eq. 1 we have: $$\frac{1}{n-1}x^TCny=\lambda{joint}^TS^2\gamma{marg} $$ Therefore, $$cov(x,y)=\lambda{joint}^TS^2\gamma{marg}$$ Probably due to approximations, estimated cov(x,y) and cov(y,x) are not necessarily identical. Therefore, the genetic correlation is estimated like below: $$r{xy} = \frac{cov(x,y)+cov(y,x)}{2\sqrt{cov(x,x)cov(y,y)}}$$