xinhe-lab / mapgen

R package to perform gene mapping using functionally-informed genetic fine mapping
https://xinhe-lab.github.io/mapgen/

High Posterior Inclusion Probability in Gene Mapping (PIP > 1) #6

Closed 1667857557 closed 5 months ago

1667857557 commented 9 months ago

Dear Sir or Madam,

Thank you for your excellent work. We encountered an issue while running the gene mapping module using the GWAS tool. Even with the default settings in MapGen, we noticed that some of the identified genes have a posterior inclusion probability greater than 1 (PIP > 1). Is this a common result? Below is the head of our output file. We appreciate your response in advance.

  top_gene top_locus_gene_pip top_gene_pip
1 ARHGEF16              0.677        0.689
2 C1orf174              1.426        1.428
3    AJAP1              3.015        3.574
4   CAMTA1              1.321        1.366
5  SLC45A1              1.159        1.159
6 SLC25A33              0.097        0.106

Yu-Feng Huang

kevinlkx commented 9 months ago

Thanks for your question. Yes, a gene PIP can exceed 1. The gene PIP is a weighted sum of the PIPs of all SNPs linked to the gene. In some cases a gene spans two nearby LD blocks (or is linked to multiple causal signals); its gene PIP can then be larger than 1, and it can be interpreted as the expected number of causal variants targeting the gene.
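A minimal sketch of that weighted sum, using made-up numbers: the SNP names, PIPs, and SNP-to-gene weights below are hypothetical and are not Mapgen's actual weighting scheme. It only illustrates why a gene linked to SNPs from two independent signals can end up with a gene PIP above 1.

```r
# Hypothetical fine-mapping output: four SNPs from two independent
# signals (credible sets), all linked to the same gene.
snp.pips <- data.frame(
  snp    = c("rs1", "rs2", "rs3", "rs4"),
  signal = c(1, 1, 2, 2),
  pip    = c(0.60, 0.35, 0.70, 0.25),
  weight = c(1, 1, 1, 0.5),  # hypothetical SNP-to-gene weights in [0, 1]
  stringsAsFactors = FALSE
)

# Within each signal the SNP PIPs sum to at most 1 ...
pip.per.signal <- tapply(snp.pips$pip, snp.pips$signal, sum)

# ... but the weighted sum over both signals can exceed 1.
gene.pip <- sum(snp.pips$pip * snp.pips$weight)

print(pip.per.signal)
print(gene.pip)  # about 1.775
```

With a single signal (as with L = 1) and weights no larger than 1, the same sum could not exceed 1 within a locus.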

In your case, what is your maximum L (number of causal signals) in fine-mapping? Do you have L > 1?

1667857557 commented 9 months ago

Hi Dr. Luo,

Thanks for your reply. Yes, here is our code:

susie.res <- run_finemapping(sumstats = gwas.sumstats, 
                             bigSNP = bigSNP, 
                             priortype = 'uniform', 
                             n = 24009,
                             L = 5)

We set L = 5 as the maximum number of causal signals in SuSiE.

kevinlkx commented 9 months ago

Thanks. As you have L = 5, it is possible to get genes with gene PIP > 1, which could be interpreted as the expected number of causal variants targeting the gene.

1667857557 commented 9 months ago

Hi Dr. Luo,

Thanks for your reply. Is it reasonable to use such a large value of L to prioritize genes? Should I consider using a smaller value of L, such as 1, for this dataset? Additionally, we identified more than 600 significant genes (PIP > 0.8) in gene mapping. Would it be better to select only the genes with PIP > 0.8 in loci that contain a previously identified significant SNP (P < 5×10^-8), rather than including all genes with PIP > 0.8?

Yu-Feng Huang

kevinlkx commented 9 months ago

One caution is that inconsistencies between the GWAS z-scores and the LD reference can inflate PIPs from SuSiE fine-mapping, especially when you use a large value of L. Inflated PIPs from fine-mapping can in turn produce many genes with high gene PIPs.

So, to be conservative, you could use L = 1 if you have an "out-of-sample" LD reference, and focus on the loci with significant GWAS signals (P < 5×10^-8).
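The second part of that suggestion can be sketched as a simple filter. The two data frames and their column names (`locus`, `pval`, `gene`, `gene_pip`) are hypothetical stand-ins, not Mapgen's actual output format, so adapt them to your own tables.

```r
# Hypothetical GWAS summary statistics, one row per SNP, with a locus ID.
gwas <- data.frame(
  locus = c(1, 1, 2, 2, 3),
  pval  = c(3e-9, 0.02, 1e-4, 0.3, 4e-12)
)

# Hypothetical gene mapping result, one row per gene.
genes <- data.frame(
  gene     = c("geneA", "geneB", "geneC", "geneD"),
  locus    = c(1, 2, 3, 3),
  gene_pip = c(0.92, 0.85, 0.95, 0.40),
  stringsAsFactors = FALSE
)

# Loci that contain at least one genome-wide significant SNP.
sig.loci <- unique(gwas$locus[gwas$pval < 5e-8])

# Keep high-PIP genes only if they sit in one of those loci.
high.conf.genes <- genes[genes$gene_pip > 0.8 & genes$locus %in% sig.loci, ]
print(high.conf.genes)
```

Here geneB passes the PIP threshold but is dropped because its locus has no genome-wide significant SNP.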

kevinlkx commented 9 months ago

It would be helpful to check for potential LD mismatch issues. You could try methods like DENTIST to preprocess the GWAS summary statistics and filter out problematic variants, or run the diagnostics in susie_rss: https://stephenslab.github.io/susieR/articles/susierss_diagnostic.html.

We are working on adding some functions to perform the LD mismatch diagnostic based on susie_rss, which hopefully will help address this issue.
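In the meantime, a rough sketch of the susie_rss diagnostic on simulated data might look like the following. It assumes a recent susieR version in which `estimate_s_rss()` and `kriging_rss()` accept the sample size `n`; the simulated genotypes, effect size, and variable names are illustrative only, and the susieR calls are skipped if the package is not installed.

```r
set.seed(1)
n <- 500  # simulated sample size
p <- 20   # simulated number of variants

# Simulated genotype-like matrix with LD between variants 1 and 2.
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.5)
y <- 0.3 * X[, 1] + rnorm(n)

# Marginal z-scores (t statistics) from univariate regressions.
z <- apply(X, 2, function(x) summary(lm(y ~ x))$coefficients[2, 3])

# In-sample LD matrix; with real data this would be the LD reference.
R <- cor(X)

if (requireNamespace("susieR", quietly = TRUE)) {
  # Consistency estimate between z-scores and LD; larger values
  # suggest a stronger mismatch.
  s <- susieR::estimate_s_rss(z, R, n = n)
  print(s)

  # Observed versus expected z-scores, conditional on the other variants.
  kr <- susieR::kriging_rss(z, R, n = n)
  print(head(kr$conditional_dist))
}
```

Because the LD matrix here is computed from the same simulated samples as the z-scores, the two should be consistent; with an out-of-sample LD reference, variants whose observed z-scores are far from their expected values are candidates for removal.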

1667857557 commented 9 months ago

Thanks for your reply, we are looking forward to your good news!
