Open catheriz opened 1 year ago
@catheriz first of all, ideally samples with overlapping variants should not happen ... It indicates potential convergence issue. To try help with that, could you rerun with refine=T
parameter in susie_rss
?
Hi @gaow Finally ran the code suggested. I set different seed and got the similar results.
[1] "objective:-21688.1550688498" [1] "objective:-21589.9837758254" [1] "objective:-21469.7462476239" [1] "objective:-21332.9670183358" [1] "objective:-21195.7399032673" ...... [1] "objective:2615.67636145115" [1] "objective:2615.67736326038" [1] "objective:2615.67835942731"
c$sets $cs $cs$L1 [1] 49
$cs$L2 [1] 46
$cs$L4 [1] 49 58
$purity min.abs.corr mean.abs.corr median.abs.corr L1 1 1 1 L2 1 1 1 L4 1 1 1
$cs_index [1] 1 2 4
$coverage [1] 0.9978610 0.9990047 0.9999534
$requested_coverage [1] 0.95
str(c) List of 18 $ alpha : num [1:5, 1:337] 0 0 0 0 0 0 0 0 0 0 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ mu : num [1:5, 1:337] -21.9 21.9 -19 -2.8 21.9 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ mu2 : num [1:5, 1:337] 481.71 478.58 362.49 7.84 480.65 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ KL : num [1:5] 13.8 13.8 13.6 11.1 13.8 $ lbf : num [1:5] 4155915 4128615 3127786 68099 4146420 $ lbf_variable : num [1:5, 1:337] 3696909 3672873 2781895 60155 3688704 ... ..- attr(, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:337] "1" "2" "3" "4" ... $ sigma2 : num 1 $ V : num [1:5] 541.52 537.97 407.56 8.87 540.29 $ pi : num [1:337] 0.00297 0.00297 0.00297 0.00297 0.00297 ... $ null_index : num 0 $ XtXr : num [1:337, 1] -663 -657 1157 -692 -533 ... $ converged : logi TRUE $ elbo : num [1:36] 2616 2616 2616 2616 2616 ... $ niter : int 36 $ X_column_scale_factors: num [1:337] 1 1 1 1 1 1 1 1 1 1 ... $ intercept : num NA $ sets :List of 5 ..$ cs :List of 3 .. ..$ L1: int 49 .. ..$ L2: int 46 .. ..$ L4: int [1:2] 49 58 ..$ purity :'data.frame': 3 obs. of 3 variables: .. ..$ min.abs.corr : num [1:3] 1 1 1 .. ..$ mean.abs.corr : num [1:3] 1 1 1 .. ..$ median.abs.corr: num [1:3] 1 1 1 ..$ cs_index : int [1:3] 1 2 4 ..$ coverage : num [1:3] 0.998 0.999 1 ..$ requested_coverage: num 0.95 $ pip : Named num [1:337] 0 0 0 0 0 0 0 0 0 0 ... ..- attr(*, "names")= chr [1:337] "1" "2" "3" "4" ...
- attr(*, "class")= chr "susie" Rplots.pdf
It looks like you have two SNPs that are perfectly correlated (SNPs 49 and 58 in L4). Are the z-scores the same for those two SNPs?
hi @pcarbo They indeed have very similar z-scores. Are they in the same set because they are correlated with each other? If so, how should I interpret the PIP? Thank you.
z[49] [1] -4.203046 z[58] [1] -4.200911
c$pip[49] 49 0.9999968 c$pip[58] 58 0.3176335
What is R[49,58]
?
R[49,58] = 1
So I think the root issue here is that the z-scores are different which shouldn't happen if the variables/SNPs are perfectly correlated.
We would really like the method to be more robust to this kind of issue, but we have not yet found a way we are fully satisfied with. It might help to add a small diagonal element to R? (ie replace R with (1-lambda)*R + lambda diag(p) for some small lambda). Certainly this example could be useful to keep in mind for us as we continue to try to improve robustness to this issue.
... but independently of the robustness issue, I think the "CS-specific PIP" the OP is asking for is given by the value of alpha in the susie fit?
Hi, I was trying to output the pip value from susie_rss result. If a variant exists in more than the credible sets, susie_get_pip(res) or res$pip cannot output the pip value specific for a variant in different sets. Example: $cs $cs$L1 [1] 49
$cs$L2 [1] 46
$cs$L4 [1] 49 58
susie_get_pip(c)[49] [1] 0.9999967
susie_get_pip(c)[46] [1] 0.999999
susie_get_pip(c)[58] [1] 0.3112985
In this simple case, I calculated the pip for variant 49 using the coverage of L4 - susie_get_pip(c)[58]. However, for sets with multiple overlap variants, how can I get the pip of those?
Thank you