omerwe / polyfun

PolyFun (POLYgenic FUNctionally-informed fine-mapping)
MIT License

Credible set missing and variant duplicates #141

Closed deniseomahony01 closed 1 year ago

deniseomahony01 commented 1 year ago

Hi Omer, I have used PolyFun to compute prior causal probabilities for functionally-informed fine-mapping on summary statistics data using UK-Biobank LD matrices and custom functional annotations. After running the finemapper.py script using FINEMAP and SuSiE, credible sets were constructed.

In a couple of regions, however, some credible-set indices are missing; for example, credible set 2 is absent from the output below. Note that this only happens with FINEMAP. When I fine-mapped the same regions using SuSiE, no such issue occurred.

SNP CHR BP A1 A2 PIP BETA_MEAN BETA_SD DISTANCE_FROM_CENTER CREDIBLE_SET
rs2981578 10 1.23E+08 C T 1.00E+00 2.02E-02 1.84E-02 118734 1
rs2936871 10 1.23E+08 A T 1.00E+00 2.02E-02 1.84E-02 134598 3
rs45631563 10 1.23E+08 A T 1.00E+00 2.02E-02 1.84E-02 127747 4
rs7919434 10 1.23E+08 G A 5.42E-01 1.12E-02 1.71E-02 135361 5
rs7919566 10 1.23E+08 G A 3.12E-01 6.18E-03 1.38E-02 135412 5
rs4647913 10 1.23E+08 C G 1.45E-01 2.78E-03 9.66E-03 135837 5
rs4752569 10 1.23E+08 A T 1.00E+00 2.02E-02 1.84E-02 110113 6
rs4751843 10 1.23E+08 T A 1.00E+00 2.02E-02 1.84E-02 109891 7
rs1047111 10 1.23E+08 C T 1.00E+00 2.02E-02 1.84E-02 135984 8
rs1047111 10 1.23E+08 C G 1.00E+00 2.02E-02 1.84E-02 135984 8
rs4751842 10 1.23E+08 A T 1.00E+00 2.02E-02 1.84E-02 109890 9
rs4752568 10 1.23E+08 C T 2.66E-01 1.15E-02 1.93E-02 108964 10
rs11199913 10 1.23E+08 A T 2.39E-01 4.34E-03 7.81E-03 129191 10
rs4752537 10 1.23E+08 A G 1.69E-01 3.05E-03 6.82E-03 128656 10
rs9420318 10 1.23E+08 G A 1.55E-01 2.79E-03 6.55E-03 128395 10
rs10788169 10 1.23E+08 C G 1.27E-01 -2.29E-03 6.05E-03 129189 10
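A gap like this can be checked for programmatically before reporting it. The following is a minimal sketch, not part of PolyFun: the helper name `missing_credible_sets` is hypothetical, and it assumes that credible sets are numbered from 1 and that a value of 0 (if present) marks variants outside any credible set.

```python
def missing_credible_sets(labels):
    """Return credible-set indices that are skipped in a CREDIBLE_SET column.

    Assumption: sets are numbered 1..K, and 0 means "not in any credible set".
    """
    present = {int(label) for label in labels if int(label) > 0}
    if not present:
        return []
    return [i for i in range(1, max(present) + 1) if i not in present]


# CREDIBLE_SET column copied from the FINEMAP output above.
credible_set_column = [1, 3, 4, 5, 5, 5, 6, 7, 8, 8, 9, 10, 10, 10, 10, 10]
gaps = missing_credible_sets(credible_set_column)
print(gaps)  # [2]
```

Running the same check on the SuSiE output for the region would confirm whether the gap is specific to FINEMAP.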

A second issue I have noticed is that in some regions, the credible sets created contained variant duplicates (see example below). For example in the output below the variant rs142013821 appears both in Credible Set 2 and Credible Set 3. Note that this was true for both FINEMAP and SuSiE.

CHR SNP BP A1 A2 SNPVAR MAF Z N P PIP BETA_MEAN BETA_SD DISTANCE_FROM_CENTER CREDIBLE_SET
2 rs6721996 2.18E+08 A G 8.12E-09 5.00E-01 -2.19E+01 245619 1.03E-106 1.00E+00 5.33E-02 1.97E-03 192138 1
2 rs142013821 2.18E+08 C CTTAT 2.10E-08 3.22E-02 -7.05E+00 245619 1.85E-12 1.00E+00 -7.77E-01 3.42E-03 190412 2
2 rs142013821 2.18E+08 CTTAT C 1.32E-08 4.06E-01 -9.91E+00 245619 3.85E-23 1.00E+00 -3.10E-01 2.79E-03 190412 3
2 rs7600412 2.18E+08 C T 1.01E-07 3.60E-01 -1.32E+01 245619 1.08E-39 1.00E+00 4.59E-01 7.11E-03 183030 4
2 rs2372945 2.18E+08 C A 1.32E-08 3.59E-01 -1.28E+01 245619 8.83E-38 1.00E+00 2.70E-01 6.83E-03 187489 5
2 rs113510705 2.18E+08 C T 4.06E-08 2.93E-02 -3.25E+00 245619 1.15E-03 1.00E+00 -5.96E-02 1.97E-03 295071 6

Could you please share some thoughts/explanations on the above issues? Thank you.

omerwe commented 1 year ago

Hi @deniseomahony01,

Re the first problem: This sounds like a FINEMAP issue, so I'm afraid I can't help with that. You may want to reach out to the author of FINEMAP...

Re the second problem: These are in fact not duplicates. They are two distinct variants that (unfortunately) share the same rsid. The first represents an insertion event, and the second represents a deletion event (this can be seen in the fields A1 and A2).
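One way to spot such cases is to identify variants by position and alleles rather than by rsid alone. The sketch below is only an illustration: the helper names `variant_key` and `shared_rsids` are hypothetical and not part of PolyFun, and the BP values are kept as the rounded strings shown in the table above.

```python
from collections import defaultdict


def variant_key(row):
    """Build an allele-aware identifier: (CHR, BP, A1, A2)."""
    return (row["CHR"], row["BP"], row["A1"], row["A2"])


def shared_rsids(rows):
    """Map each rsid carried by more than one distinct variant to its keys."""
    keys_by_rsid = defaultdict(set)
    for row in rows:
        keys_by_rsid[row["SNP"]].add(variant_key(row))
    return {
        rsid: sorted(keys)
        for rsid, keys in keys_by_rsid.items()
        if len(keys) > 1
    }


# First three rows of the second table, restricted to the identifying columns.
rows = [
    {"SNP": "rs6721996", "CHR": "2", "BP": "2.18E+08", "A1": "A", "A2": "G"},
    {"SNP": "rs142013821", "CHR": "2", "BP": "2.18E+08", "A1": "C", "A2": "CTTAT"},
    {"SNP": "rs142013821", "CHR": "2", "BP": "2.18E+08", "A1": "CTTAT", "A2": "C"},
]

shared = shared_rsids(rows)
for rsid, keys in shared.items():
    print(rsid, keys)
```

Here rs142013821 is reported with two different allele pairs, which indicates two separate variants rather than a duplicated row.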

Sorry I can't be of more help, and please let me know if there's something else I can help with.

deniseomahony01 commented 1 year ago

Thank you for your response. I shall contact the relevant person. As for the second issue, apologies, I didn't realise this was an indel. Thank you again.