omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)

missing SNPs when performing polyfun #157

Closed JonnyBaseball closed 1 year ago

JonnyBaseball commented 1 year ago

Hi,

Thank you for this awesome package! I'm new to fine-mapping and got your tutorials to work. I was trying out fine-mapping on a GWAS that I generated with the 1000G EUR LD reference panel. When I run PolyFun with the same panel, I get a message like this for each locus (from the log):

[WARNING] 9016 variants with sumstats were not found in the LD file and will be omitted (please note that this may lead to false positives if the omitted SNPs are causal!)

It seems to be performing fine-mapping with ~1000 SNPs per region, but I'm unsure why any SNPs are missing, given that the LD reference and the GWAS use the same panel.

My meta-analyzed data includes UKB participants as well as other EUR cohorts. Is there a recommended LD reference for this case?

Thank you, Dan

Here's the full log below:

[INFO] LD_LIBRARY_PATH:
[DEBUG] cffi mode is InterfaceType.API
[INFO] Default options to initialize R: rpy2, --quiet, --no-save
[INFO] R is already initialized. No need to initialize.
[WARNING] The available cached LD file was ignored because it does not contain data for all the SNPs in the locus
[INFO] Computing LD from plink fileset ./LDREF/1000G.EUR.1 chromosome 1 region 108127957-112004663
[INFO] Found 1879 SNPs in target region. Computing LD in 1 chunks...
[INFO] Done in 0.27 seconds
[INFO] Saving LD file LD_cache/1000G.EUR.1.1.108127957.112004663.npz
[INFO] Done in 0.48 seconds
[WARNING] 9016 variants with sumstats were not found in the LD file and will be omitted (please note that this may lead to false positives if the omitted SNPs are causal!)
[INFO] Flipping the effect-sign of 497 SNPs that are flipped compared to the LD panel
[INFO] Starting functionally-informed SuSiE fine-mapping for chromosome 1 BP 108127957-112004663 (1805 SNPs)
[INFO] Done in 1.37 seconds
[INFO] Writing fine-mapping results to ./ireur.finemap.output/ireur.finemap.1.108127957.112004663.gz

omerwe commented 1 year ago

@JonnyBaseball I think you're asking two different questions:

  1. I'm not sure why some SNPs in your sumstats file are reported as missing from your LD file. You could check which SNPs these are (by inspecting your fine-mapping results); this may help you figure out what's going on. If you can create a small reproducible example, I could try taking a quick look.

  2. Our general recommendation is to be extremely conservative with fine-mapping, and especially with fine-mapping of meta-analyzed studies. If your LD reference matrix is even slightly out of sync with the cohort from which you generated your sumstats, you could get extremely biased results (see Table 3 in the PolyFun paper for examples and details). You're of course welcome to carry out this analysis, but the results should at least be taken with a grain of salt. Sorry I can't be more positive about this...
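
For the check in point 1, here is a rough sketch of one way to list the sumstats variants in a locus that have no counterpart in the plink panel. It is not PolyFun's own matching code. It assumes a tab-separated sumstats file with `CHR`, `SNP`, `BP`, `A1` and `A2` columns, and it assumes variants are matched by chromosome, position and the unordered allele pair. The helper names (`variant_key`, `read_bim_keys`, `find_missing`) and the toy data are made up for illustration. The `.bim` column order (chromosome, ID, cM, position, allele 1, allele 2) is the standard plink layout.

```python
import csv


def variant_key(chrom, bp, a1, a2):
    """Build a key that ignores allele order, so flipped alleles still match."""
    alleles = sorted((a1.upper(), a2.upper()))
    return (str(chrom), int(bp), alleles[0], alleles[1])


def read_bim_keys(bim_lines):
    """Collect variant keys from plink .bim lines (CHR, ID, cM, BP, A1, A2)."""
    keys = set()
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, _snp, _cm, bp, a1, a2 = fields[:6]
        keys.add(variant_key(chrom, bp, a1, a2))
    return keys


def find_missing(sumstats_lines, bim_keys, chrom, start, end):
    """Return sumstats rows inside [start, end] on chrom with no panel match."""
    reader = csv.DictReader(sumstats_lines, delimiter="\t")
    missing = []
    for row in reader:
        if str(row["CHR"]) != str(chrom):
            continue
        bp = int(row["BP"])
        if not (start <= bp <= end):
            continue
        if variant_key(row["CHR"], bp, row["A1"], row["A2"]) not in bim_keys:
            missing.append(row)
    return missing


# Toy data (hypothetical): rs1 has swapped alleles, rs3 is absent from the panel.
bim = [
    "1 rs1 0 108200000 A G",
    "1 rs2 0 108300000 C T",
]
sumstats = [
    "CHR\tSNP\tBP\tA1\tA2",
    "1\trs1\t108200000\tG\tA",
    "1\trs2\t108300000\tC\tT",
    "1\trs3\t108400000\tA\tC",
]

missing = find_missing(sumstats, read_bim_keys(bim), 1, 108127957, 112004663)
print([row["SNP"] for row in missing])  # ['rs3']
```

With real data you would pass open file handles instead of the lists (for example, `open(...)` on the `.bim` file and on an uncompressed sumstats file). Looking at which variants come back may point to a cause worth checking, such as positions on a different genome build or alleles coded differently between the two files.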

omerwe commented 1 year ago

Hi, I'm closing this for now, please reopen if you have further questions. Thanks!