stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
169 stars 42 forks source link

Issue with output cs #192

Open Huifang-Xu opened 1 year ago

Huifang-Xu commented 1 year ago

Hi there,

I've been using susie_rss() for fine-mapping summary statistics using LD references from 1000GP. For one locus (Nsnps = 2953), I can't get a credible set from summary(susie_rss_z)$cs, but I have been able to get a credible set from susie_get_cs(susie_rss_z). The PIP plot does not show cs, but the highest PIP is around 0.8.

Here is my code,

library(susieR)
library(data.table)
library(dplyr)
library("Rfast")
set.seed(1)

ld_matrix <- fread("chr1_phase3_EUR.locus1_chr1.LDr.ld",sep=' ',header=F)
sumstat <- read.table("ndd.a1effect.munge.noMHC.locus1_chr1.LDmatch.csv",sep = '\t', header=T)

susie_rss_z <- susie_rss(sumstat$Z, R=ld_matrix, n=120713, L = 10)
summary(susie_rss_z)$cs

image

cs <- susie_get_cs(susie_rss_z)
str(cs)

image

pip <- susie_get_pip(susie_rss_z)
pip[c(1426,1431,1433,1438,2657)]

image

susie_plot(susie_rss_z,y="PIP")

image

However, if I narrow the region to 800 SNPS, I can get a credible set from summary(susie_rss_z)$cs and susie_get_cs(susie_rss_z). The highest PIP of the same SNP is similar.

sumstat_sub <- sumstat[1033:1833,]
ld_matrix_sub <- ld_matrix[1033:1833,1033:1833]
susie_rss_z_sub <- susie_rss(sumstat_sub$Z, R=ld_matrix_sub, n=sampleSize, L = 10)
summary(susie_rss_z_sub)$cs

image

cs_sub <- susie_get_cs(susie_rss_z_sub)
str(cs_sub)

image

pip_sub <- susie_get_pip(susie_rss_z_sub)
pip_sub[c(399,401,406)]

image

susie_plot(susie_rss_z_sub,y="PIP")

image

I want to know why there are different outputs. Should I always use susie_get_cs()? Because I usually use summary(susie_rss_z)$cs to extract and plot cs.

Here are the summary statistics and LD matrix. I appreciate your help. https://github.com/Huifang-Xu/test_SuSiE/blob/main/locus1.zip

Best, Huifang

pcarbo commented 1 year ago

@Huifang-Xu What is the purity of the CSs?

Huifang-Xu commented 1 year ago

This is the purity of the small region (Nsnps = 800). image

There is no output for the entire locus (Nsnps = 2953). image

pcarbo commented 1 year ago

@Huifang-Xu Try setting min_abs_corr to a smaller number, e.g., 0.

Huifang-Xu commented 1 year ago

I reran the model for the entire locus, the min.abs.corr of L1 is 0.295.

susie_rss_z_Purity <- susie_rss(sumstat$Z, R=ld_matrix, n=sampleSize, L = 10,min_abs_corr=0)
print(susie_rss_z_Purity$sets)

image

For the small region, image

Huifang-Xu commented 1 year ago

@pcarbo I wonder why the credible set of the small region has higher purity while the credible set of the large region has lower purity. Should we choose a small region for fine-mapping? e.g., 1000 SNPs or a 250Kb window.

Huifang-Xu commented 1 year ago

@pcarbo I realized a distant SNP was included in the cs for the entire locus but not in the small one. I think that's why the large region has a low purity and does not produce cs. But again, should we choose a small region for fine mapping, e.g., 1000 SNPs or a 250Kb window?

Here are the SNPs included in the cs for the large region (Nsnp=2953): image

Here are the SNPs included in the cs for the small region (Nsnp=800): image

Thanks, Huifang

pcarbo commented 1 year ago

@Huifang-Xu In general, it is safer to choose a larger fine-mapping region. The results for this particular locus do not look particularly strong; this may be a "borderline" result.

Huifang-Xu commented 1 year ago

@pcarbo I see. Thank you very much for your prompt reply and help! I really appreciate it.

Huifang

zouyuxin commented 1 year ago

Should I always use susie_get_cs()? Because I usually use summary(susie_rss_z)$cs to extract and plot cs.

For susie_get_cs, it requires X or Xcorr to output the same CSs as summary(susie_rss_z). It uses pairwise correlations to filter CSs. In the case X and Xcorr are unspecified, it outputs L CSs (L = 10 by default).

Huifang-Xu commented 1 year ago

@zouyuxin I see. Thanks for the clarification.