jumentib closed this issue 1 year ago
@jumentib, the difference between the BOLT-LMM sumstats and the causal effects arises because we excluded all effects that are zero.
The short answer is that you can use any set of summary statistics you want for any method. As a rule of thumb, including more SNPs will lead to better accuracy. The main gist of PolyFun is that you need to include all SNPs with MAF>0.1% to estimate true causal effects. For other methods (e.g. SBayesR) people usually use a smaller set of SNPs (around 1 million). I believe that all of this information is provided in the Methods section of either the PolyFun paper or the PolyPred paper (or both).
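To make the two filters mentioned above concrete, here is a minimal sketch in plain Python. It is only an illustration, not the PolyFun or BOLT-LMM code: the column names `SNP`, `MAF` and `BETA`, the helper names, and the toy table are all assumptions made up for this example, so adapt them to the headers of your own files.

```python
import csv
import io


def nonzero_effects(rows, beta_col="BETA"):
    """Keep only rows whose effect size is not exactly zero.

    The column name is hypothetical; use the one in your own file.
    """
    return [r for r in rows if float(r[beta_col]) != 0.0]


def common_enough(rows, maf_col="MAF", min_maf=0.001):
    """Keep only rows with minor allele frequency above min_maf (0.1%)."""
    return [r for r in rows if float(r[maf_col]) > min_maf]


# Tiny made-up table standing in for a real tab-separated sumstats file.
toy = (
    "SNP\tMAF\tBETA\n"
    "rs1\t0.20\t0.013\n"
    "rs2\t0.0005\t0.002\n"
    "rs3\t0.05\t0.0\n"
    "rs4\t0.30\t-0.004\n"
)
rows = list(csv.DictReader(io.StringIO(toy), delimiter="\t"))

kept_maf = common_enough(rows)        # drops SNPs with MAF <= 0.1%
kept_nonzero = nonzero_effects(rows)  # drops SNPs with an effect of zero

print(len(rows), len(kept_maf), len(kept_nonzero))
```

Dropping zero effects is what makes an effects file shorter than the sumstats file it was derived from, so comparing the row counts before and after the filter is a quick sanity check.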
@jumentib can I close the issue?
Hello,
Yes, thank you very much for your answer!
Dear Dr Weissbrod,
I want to use PolyPred to compute trans-ethnic PRS. Since I don't have access to individual-level training data, I need to use PolyPred-S. As you explain in your paper, I have to use summary statistics to estimate the "causal effects" with PolyFun and the "tagging effects" with SBayesR. My question is: can we use the same summary statistics for PolyFun and SBayesR?
I see in your tutorial (for PolyPred) that the summary statistics used for PolyFun (bolt.sumstats.txt.gz, 80K SNPs) and the BOLT-LMM result (bolt.betas.gz, 8K SNPs) have different dimensions, hence my question.
Sorry for this question, I'm new to the field of genetics and still a bit lost.
Thanks in advance,
Basile Jumentier