omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

preparing input files for cS2G #143

Closed humanpaingeneticslab closed 1 year ago

humanpaingeneticslab commented 1 year ago

Hi Everyone,

I'm hoping to find some pointers here on how to prepare input files for cS2G (PMID 35668300). This is really a question for the cS2G authors, but one of them is also the author of PolyFun (Dr. Weissbrod :-).

The Methods section of the paper says: "PolyFun + SuSiE estimates of posterior mean squared causal effect sizes for ~19,000,000 imputed SNPs with MAF >= 0.1% (b^2) estimated on N=337 K unrelated British UK".

An input file for cS2G looks like this:

    zcat bp_SYSTOLICadjMEDz.UKBB.txt.gz | head -5
    1 671810 1:671810_T_C 1.61839576529311e-08
    1 741850 1:741850_C_T 1.70657544510731e-08
    1 829333 1:829333_G_A 1.28516374085729e-08
    1 858686 1:858686_G_A 4.28917761540782e-08
    1 863641 1:863641_G_A 9.91911857818661e-08

So the 4th column contains very small values. These do not look like the squared effect sizes from the GWAS output.

Q1. What's the "PolyFun + SuSiE estimates of posterior mean squared causal effect sizes"?

Solved; cS2G needs E[b^2], so it's:

    E[b^2] = E[b]^2 + var[b]
    E[b^2] = BETA_MEAN^2 + BETA_SD^2
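For anyone landing here later, the identity above can be applied to a fine-mapping output table roughly as follows. This is only a minimal sketch: it assumes the output is tab-separated and has CHR, BP, A1 and A2 columns next to BETA_MEAN and BETA_SD, and that the cS2G SNP ID is built as chr:bp_A1_A2 (the allele order is a guess based on the head output above). The function names and the example values are made up for illustration.

```python
import csv
import io


def posterior_mean_sq_effect(beta_mean: float, beta_sd: float) -> float:
    """E[b^2] = E[b]^2 + var[b], where var[b] = BETA_SD^2."""
    return beta_mean ** 2 + beta_sd ** 2


def to_cs2g_rows(finemap_tsv):
    """Yield (chr, bp, snp_id, E[b^2]) tuples from a tab-separated table.

    Assumed columns (not confirmed by the thread): CHR, BP, A1, A2,
    BETA_MEAN, BETA_SD. The ID layout chr:bp_A1_A2 is also an assumption.
    """
    reader = csv.DictReader(finemap_tsv, delimiter="\t")
    for row in reader:
        eb2 = posterior_mean_sq_effect(float(row["BETA_MEAN"]), float(row["BETA_SD"]))
        snp_id = f'{row["CHR"]}:{row["BP"]}_{row["A1"]}_{row["A2"]}'
        yield row["CHR"], row["BP"], snp_id, eb2


# Invented example row, only to show the arithmetic.
example = (
    "CHR\tBP\tA1\tA2\tBETA_MEAN\tBETA_SD\n"
    "1\t671810\tT\tC\t0.0001\t0.00008\n"
)
rows = list(to_cs2g_rows(io.StringIO(example)))
for chrom, bp, snp_id, eb2 in rows:
    print(chrom, bp, snp_id, eb2)
```

Because both terms are squares of small per-SNP posterior quantities, values on the order of 1e-08 like those in the 4th column are what one would expect.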

Q2. To prepare the input file for cS2G, do I need to run PolyFun on the whole GWAS? Do I run it window by window, using the windows from UKB's pre-computed LD matrices (https://alkesgroup.broadinstitute.org/UKBB_LD)? That's a whopping 3,000 windows! And with what parameters should I run PolyFun? Something like this:

    finemapper.py \
        --ld chr1_46000001_49000001.npz \
        --sumstats my_gwas.sumstats.txt.gz \
        --chr 1 \
        --start 46000001 \
        --end 49000001 \
        --method susie \
        --max-num-causal 5 \
        --...
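To make the "window by window" idea concrete, here is a rough sketch of how the per-window commands could be generated. It only reuses the flags shown in the command above and does not fill in the elided ones. The 3 Mb window size is read off the example (46000001 to 49000001); the 1 Mb step between window starts, the chromosome length argument and the helper names are assumptions made for illustration, so check them against the actual LD file names before relying on this.

```python
def windows(chrom_length, size=3_000_000, step=1_000_000):
    """Yield (start, end) pairs such as (46000001, 49000001).

    size matches the example window; step is an assumption.
    """
    start = 1
    while start < chrom_length:
        yield start, start + size
        start += step


def build_command(chrom, start, end, sumstats):
    """Build one command using only the flags from the question.

    The LD file name mirrors the naming used in the question; any
    further required flags are left out on purpose.
    """
    return [
        "finemapper.py",
        "--ld", f"chr{chrom}_{start}_{end}.npz",
        "--sumstats", sumstats,
        "--chr", str(chrom),
        "--start", str(start),
        "--end", str(end),
        "--method", "susie",
        "--max-num-causal", "5",
    ]


# Hypothetical 4 Mb chromosome, only to keep the printout short.
commands = [
    build_command(1, start, end, "my_gwas.sumstats.txt.gz")
    for start, end in windows(4_000_000)
]
for cmd in commands:
    print(" ".join(cmd))
```

Each printed line can then be submitted as a separate job, which keeps the roughly 3,000 windows manageable on a cluster.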

Any help appreciated, Best to All.