stephenslab / susieR

R package for "sum of single effects" regression.
https://stephenslab.github.io/susieR
Other
169 stars 42 forks source link

Correlation matrix is not positive semi-definite #190

Open karenfeng opened 1 year ago

karenfeng commented 1 year ago

I ran into an interesting corner case where our kriging_rss plot looks invalid. Based on a spot check of the variants that looked like outliers, they are all examples of multiallelic variants split into biallelic variants. I believe that this splitting caused non-independent correlations at the same position, and therefore the LD matrix is no longer positive semi-definite (which is an issue for SuSiE) . Do you have recommendations for how to deal with this surprisingly common case? 7cf9d275-5c82-40b4-9aba-74121f669f43

pcarbo commented 1 year ago

@karenfeng It is possible there are some issues that could arise with fine-mapping multiallelic SNPs, particularly if they are split into multiple SNPs. However, susie shouldn't have any problem dealing with a non-positive-semidefinite LD matrix.

karenfeng commented 1 year ago

Thanks @pcarbo! Before dropping the multiallelic sites, I sometimes get the following error with estimate_residual_variance = TRUE:

Error in susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) *  : 
  Estimating residual variance failed: the estimated value is negative

When I set estimate_residual_variance = FALSE, I get:

Warning message:
In susie_suff_stat(XtX = XtX, Xty = Xty, n = n, yty = (n - 1) *  :
  IBSS algorithm did not converge in 100 iterations!
                  Please check consistency between summary statistics and LD matrix.

However, these errors can go away when I drop the split multiallelic sites from bhat, shat, and R. Do you have an instinct as to why this would be happening?

stephens999 commented 1 year ago

the kriging plot suggests that the LD and the z scores are inconsistent with one another, which suggests there is something inconsistent in your pipeline for dealing with the multi-allelic loci. Eg maybe you are not doing exactly the same thing for computing the bhat and shat as you are for computing R?