Open karenfeng opened 1 year ago
@karenfeng It is possible there are some issues that could arise with fine-mapping multiallelic SNPs, particularly if they are split into multiple SNPs. However, susie shouldn't have any problem dealing with a non-positive-semidefinite LD matrix.
Thanks @pcarbo! Before dropping the multiallelic sites, I sometimes get the following error with estimate_residual_variance = TRUE
:
Error in susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) * :
Estimating residual variance failed: the estimated value is negative
When I set estimate_residual_variance = FALSE
, I get:
Warning message:
In susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) * :
IBSS algorithm did not converge in 100 iterations!
Please check consistency between summary statistics and LD matrix.
However, these errors can go away when I drop the split multiallelic sites from bhat, shat, and R. Do you have an instinct as to why this would be happening?
the kriging plot suggests that the LD and the z scores are inconsistent with one another, which suggests there is something inconsistent in your pipeline for dealing with the multi-allelic loci. Eg maybe you are not doing exactly the same thing for computing the bhat and shat as you are for computing R?
I ran into an interesting corner case where our![7cf9d275-5c82-40b4-9aba-74121f669f43](https://github.com/stephenslab/susieR/assets/4754931/9192b2fb-1530-4972-b323-3efdc1099072)
kriging_rss
plot looks invalid. Based on a spot check of the variants that looked like outliers, they are all examples of multiallelic variants split into biallelic variants. I believe that this splitting caused non-independent correlations at the same position, and therefore the LD matrix is no longer positive semi-definite (which is an issue for SuSiE) . Do you have recommendations for how to deal with this surprisingly common case?