stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
172 stars 43 forks source link

susieR PIP #179

Open catheriz opened 1 year ago

catheriz commented 1 year ago

Hi, I was trying to output the pip value from susie_rss result. If a variant exists in more than the credible sets, susie_get_pip(res) or res$pip cannot output the pip value specific for a variant in different sets. Example: $cs $cs$L1 [1] 49

$cs$L2 [1] 46

$cs$L4 [1] 49 58

susie_get_pip(c)[49] [1] 0.9999967

susie_get_pip(c)[46] [1] 0.999999

susie_get_pip(c)[58] [1] 0.3112985

In this simple case, I calculated the pip for variant 49 using the coverage of L4 - susie_get_pip(c)[58]. However, for sets with multiple overlap variants, how can I get the pip of those?

Thank you

gaow commented 1 year ago

@catheriz first of all, ideally samples with overlapping variants should not happen ... It indicates potential convergence issue. To try help with that, could you rerun with refine=T parameter in susie_rss?

catheriz commented 1 year ago

Hi @gaow Finally ran the code suggested. I set different seed and got the similar results.

[1] "objective:-21688.1550688498" [1] "objective:-21589.9837758254" [1] "objective:-21469.7462476239" [1] "objective:-21332.9670183358" [1] "objective:-21195.7399032673" ...... [1] "objective:2615.67636145115" [1] "objective:2615.67736326038" [1] "objective:2615.67835942731"

c$sets $cs $cs$L1 [1] 49

$cs$L2 [1] 46

$cs$L4 [1] 49 58

$purity min.abs.corr mean.abs.corr median.abs.corr L1 1 1 1 L2 1 1 1 L4 1 1 1

$cs_index [1] 1 2 4

$coverage [1] 0.9978610 0.9990047 0.9999534

$requested_coverage [1] 0.95

str(c) List of 18 $ alpha : num [1:5, 1:337] 0 0 0 0 0 0 0 0 0 0 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ mu : num [1:5, 1:337] -21.9 21.9 -19 -2.8 21.9 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ mu2 : num [1:5, 1:337] 481.71 478.58 362.49 7.84 480.65 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ KL : num [1:5] 13.8 13.8 13.6 11.1 13.8 $ lbf : num [1:5] 4155915 4128615 3127786 68099 4146420 $ lbf_variable : num [1:5, 1:337] 3696909 3672873 2781895 60155 3688704 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ sigma2 : num 1 $ V : num [1:5] 541.52 537.97 407.56 8.87 540.29 $ pi : num [1:337] 0.00297 0.00297 0.00297 0.00297 0.00297 ... $ null_index : num 0 $ XtXr : num [1:337, 1] -663 -657 1157 -692 -533 ... $ converged : logi TRUE $ elbo : num [1:36] 2616 2616 2616 2616 2616 ... $ niter : int 36 $ X_column_scale_factors: num [1:337] 1 1 1 1 1 1 1 1 1 1 ... $ intercept : num NA $ sets :List of 5 ..$ cs :List of 3 .. ..$ L1: int 49 .. ..$ L2: int 46 .. ..$ L4: int [1:2] 49 58 ..$ purity :'data.frame': 3 obs. of 3 variables: .. ..$ min.abs.corr : num [1:3] 1 1 1 .. ..$ mean.abs.corr : num [1:3] 1 1 1 .. ..$ median.abs.corr: num [1:3] 1 1 1 ..$ cs_index : int [1:3] 1 2 4 ..$ coverage : num [1:3] 0.998 0.999 1 ..$ requested_coverage: num 0.95 $ pip : Named num [1:337] 0 0 0 0 0 0 0 0 0 0 ... ..- attr(*, "names")= chr [1:337] "1" "2" "3" "4" ...

pcarbo commented 1 year ago

It looks like you have two SNPs that are perfectly correlated (SNPs 49 and 58 in L4). Are the z-scores the same for those two SNPs?

catheriz commented 1 year ago

hi @pcarbo They indeed have very similar z-scores. Are they in the same set because they are correlated with each other? If so, how should I interpret the PIP? Thank you.

z[49] [1] -4.203046 z[58] [1] -4.200911

c$pip[49] 49 0.9999968 c$pip[58] 58 0.3176335

pcarbo commented 1 year ago

What is R[49,58]?

catheriz commented 1 year ago

R[49,58] = 1

pcarbo commented 1 year ago

So I think the root issue here is that the z-scores are different which shouldn't happen if the variables/SNPs are perfectly correlated.

stephens999 commented 1 year ago

We would really like the method to be more robust to this kind of issue, but we have not yet found a way we are fully satisfied with. It might help to add a small diagonal element to R? (ie replace R with (1-lambda)*R + lambda diag(p) for some small lambda). Certainly this example could be useful to keep in mind for us as we continue to try to improve robustness to this issue.

stephens999 commented 1 year ago

... but independently of the robustness issue, I think the "CS-specific PIP" the OP is asking for is given by the value of alpha in the susie fit?