stephenslab / mashr

An R package for multivariate adaptive shrinkage.
https://stephenslab.github.io/mashr

error: inv_sympd(): matrix is singular or not positive definite #79

Closed Young-Sook closed 4 years ago

Young-Sook commented 4 years ago

I ran mashr on my data and I get the same error regardless of the type of covariance matrix (whether I use only canonical or only data-driven matrices). My data has 43,416 regions (rows) and four cells (columns).

Here is the error message from `m = mash(data, U.c)`:

```
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
....
error: chol(): decomposition failed
error: inv_sympd(): matrix is singular or not positive definite
Error in calc_post_rcpp(t(data$Bhat), t(data$Shat), t(data$Shat_alpha), :
  inv_sympd(): matrix is singular or not positive definite
In addition: Warning message:
In calc_lik_matrix(data, Ulist, log = TRUE, algorithm.version = algorithm.version) :
  Some mixture components result in non-finite likelihoods, either due to numerical underflow/overflow, or due to invalid covariance matrices 1, 3, 4, 6, 12, 13, 15, 21, 22, 24, 30, 31, 33, 39, 40, 42, 48, 49, 51, 57, 58, 60, 66, 67, 69, 75, 76, 78, 84, 85, 87, 93, 94, 96, 102, 103, 105, 111, 112, 114, 120, 121, 123, 129, 130, 132, 138, 139, 141, 147, 148, 150, 156, 157, 159, 165, 166, 168, 174, 175, 177, 183, 184, 186, 192, 193, 195, 201, 202, 204, 210, 211, 213, 219, 220, 222, 228, 229, 231, 237, 238, 240, 246, 247, 249, 255, 256, 258, 264, 265, 267, 273, 274, 276, 282, 283, 285, 291, 292, 294, 300, 301, 303, 309, 310, 312, 318, 319, 321, 327, 328, 330, 336, 337, 339, 345, 346, 348
```

Below is my code.

```r
data = mash_set_data(data_mean, data_std)
m.1by1 = mash_1by1(data)
strong = get_significant_results(m.1by1)
U.pca = cov_pca(data, cellNum, subset=strong)
U.ed = cov_ed(data, U.pca, subset=strong)
U.c = cov_canonical(data)
m = mash(data, c(U.c,U.ed))
```

gaow commented 4 years ago

@Young-Sook As a test, could you follow this page:

https://stephenslab.github.io/mashr/articles/eQTL_outline.html

using a smaller random subset for the mash fit, and see if the problem persists?

Also, unrelated but possibly important: it is recommended to account for correlation structure using an estimate of the null correlation, as detailed on the page above. It would be useful to see what happens if you strictly follow the instructions with a smaller data set.

Young-Sook commented 4 years ago

Thank you so much for your quick response. I didn't have any error when I followed the script in the page https://stephenslab.github.io/mashr/articles/eQTL_outline.html.

Even after randomly subsetting the data, I still have the same error message. Below is my code.

```r
m.1by1 = mash_1by1(mash_set_data(data_mean, data_std))
strong.subset = get_significant_results(m.1by1, 0.0000000000000001)
random.subset = sample(1:nrow(data_mean), 5000)

data.temp = mash_set_data(data_mean[random.subset,], data_std[random.subset,])
Vhat = estimate_null_correlation_simple(data.temp)
rm(data.temp)

data.random = mash_set_data(data_mean[random.subset,], data_std[random.subset,], V=Vhat)
data.strong = mash_set_data(data_mean[strong.subset,], data_std[strong.subset,], V=Vhat)

U.pca = cov_pca(data.strong, cellNum)
U.ed = cov_ed(data.strong, U.pca)
U.c = cov_canonical(data.random)
m = mash(data.random, Ulist = c(U.ed,U.c), outputlevel = 1)
m2 = mash(data.strong, g=get_fitted_g(m), fixg=TRUE)
```

I do not have any problem until I execute the last line. There are 9,195 rows in data.strong. When I execute the last line, I get the following error message.

```
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
...
error: chol(): decomposition failed
error: chol(): decomposition failed
error: inv_sympd(): matrix is singular or not positive definite
Error in calc_post_rcpp(t(data$Bhat), t(data$Shat), t(data$Shat_alpha), :
  inv_sympd(): matrix is singular or not positive definite
In addition: Warning message:
In calc_lik_matrix(data, Ulist, log = TRUE, algorithm.version = algorithm.version) :
  Some mixture components result in non-finite likelihoods, either due to numerical underflow/overflow, or due to invalid covariance matrices 1, 8, 9, 11, 22, 23, 25, 36, 37, 39, 50, 51, 53, 64, 65, 67, 78, 79, 81, 92, 93, 95, 106, 107, 109, 120, 121, 123, 134, 135, 137, 148, 149, 151, 162, 163, 165, 176, 177, 179, 190, 191, 193, 204, 205, 207, 218, 219, 221, 232, 233, 235, 246, 247, 249, 260, 261, 263, 274, 275, 277, 288, 289, 291, 302, 303, 305, 316, 317, 319, 330, 331, 333, 344, 345, 347, 358, 359, 361, 372, 373, 375, 386, 387, 389, 400, 401, 403, 414, 415, 417, 428, 429, 431, 442, 443, 445, 456, 457, 459, 470, 471, 473, 484, 485, 487, 498, 499, 501, 512, 513, 515

Timing stopped at: 0.216 0 0.217
```

zouyuxin commented 4 years ago

Could you check the values for data_std[strong.subset,]? What's the minimum value in the standard error matrix?
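For example, a quick way to run this check (a sketch assuming the data_std and strong.subset objects from the code above):

```r
# Inspect the standard-error matrix for zeros or near-zeros,
# which would make the covariance matrices singular:
min(data_std[strong.subset, ])        # smallest standard error
sum(data_std[strong.subset, ] == 0)   # count of exact zeros
```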

Young-Sook commented 4 years ago

It is 0. Would that be a problem? If so, what would you recommend to do?

gaow commented 4 years ago

@Young-Sook We should have a check in mash_set_data for zero elements in the standard errors (see this line). It seems you are on an older version of mashr. Could you update to the current master and try again to see if the error is caught?

> If so, what would you recommend to do?

It depends on how that happened in the first place. Could you check how your summary statistics were generated and what is going on with that particular variant? What is the corresponding effect size estimate? It is best to fix the problem in your summary statistics before feeding them into mash_set_data, although mash_set_data does have options called zero_Bhat_reset and zero_Shat_reset. For example, you can bypass the error by setting zero_Shat_reset = 1E3 so that zero standard errors are replaced with a large number (to indicate a non-significant effect size).
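For instance (a sketch using the zero_Shat_reset option mentioned above, with the variable names from your earlier code):

```r
# Replace exact-zero standard errors with a large value so the
# corresponding effects are effectively treated as non-significant:
data = mash_set_data(data_mean, data_std, zero_Shat_reset = 1e3)
```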

zouyuxin commented 4 years ago

Yes, this causes the error: it makes the covariance matrix singular. In mash_set_data we check for exact zeros. I think in your case it is a very small number but not exactly 0 (so == 0 gives FALSE).

gaow commented 4 years ago

Thanks @Young-Sook. What you have seems to be a version we released last March. Installing with

devtools::install_github("stephenslab/mashr")

should give you the master branch. But we (@pcarbo) are now working on getting the updated package onto CRAN; there should be an update very soon if you want to check back later. For now, you may want to manually fix your input data anyway, just to make sure you are fully aware of potential issues in it.

> I checked the == 0 command, and it gave me 'TRUE', which means the standard error value is exactly zero.

Are the corresponding data_mean values also zero? If so, you can set those data_std entries to a large number, say 1E3, to get the computation going. Otherwise you might look into issues with your summary statistics data. The zero check in our master code was motivated by data generated by software that produces exactly zero for both bhat and sbhat.
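A manual pre-processing sketch along these lines (applied before mash_set_data; the 1E3 value is just the placeholder suggested above):

```r
# Where both the effect estimate and its standard error are exactly
# zero, replace the standard error with a large number so the effect
# is read as non-significant:
zero_both = (data_mean == 0) & (data_std == 0)
data_std[zero_both] = 1e3
# Any entries with zero data_std but nonzero data_mean point to a
# problem in the summary statistics that should be investigated.
```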

@zouyuxin Still, based on our conversation offline, I think we should change the mash_set_data function to check for tiny standard errors, not just exact zeros -- it seems you have a data application that raises the same error due to tiny sbhat. What do you think the threshold for "tiny" should be in this case? Is it safe enough to treat sbhat smaller than .Machine$double.eps as zero? In terms of chol behavior:

```r
> chol(matrix(c(1,0,0,.Machine$double.eps^20),2,2))
     [,1]          [,2]
[1,]    1  0.000000e+00
[2,]    0 2.913414e-157
> chol(matrix(c(1,0,0,.Machine$double.eps^21),2,2))
Error in chol.default(matrix(c(1, 0, 0, .Machine$double.eps^21), 2, 2)) :
  the leading minor of order 2 is not positive definite
```

Would it be good enough to add a parameter like zero_tol to mash_set_data, defaulting to .Machine$double.eps -- would that solve the error you once ran into? Or what else would you suggest based on your experience? (@zouyuxin)
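A minimal sketch of what such a check might look like inside mash_set_data (zero_tol here is the proposed parameter, not an existing argument):

```r
# Proposed check: treat standard errors at or below zero_tol as zeros.
zero_tol = .Machine$double.eps
if (any(Shat <= zero_tol))
  stop("Some standard errors are zero or numerically indistinguishable ",
       "from zero; fix the input data or use zero_Shat_reset.")
```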

zouyuxin commented 4 years ago

@gaow Adding a zero_tol parameter would definitely help. .Machine$double.eps is enough for the usual mash analysis, but it's not enough for mashcommonbaseline. In mashcommonbaseline, the residual covariance is L S_j V S_j L^T, where L is an (R-1) x R matrix. For example,

```r
> S2 = diag(c(1,.Machine$double.eps^2, .Machine$double.eps^2))
> L = rbind(c(1,-1,0), c(1,0,-1))
> L %*% S2 %*% t(L)
     [,1] [,2]
[1,]    1    1
[2,]    1    1
```

which is singular: 1 + .Machine$double.eps^2 == 1 gives TRUE.
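The failure can be reproduced directly by attempting a Cholesky factorization of this matrix (a small check reusing S2 and L from above):

```r
S2 = diag(c(1, .Machine$double.eps^2, .Machine$double.eps^2))
L  = rbind(c(1, -1, 0), c(1, 0, -1))
chol(L %*% S2 %*% t(L))
# Error: the leading minor of order 2 is not positive definite
```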

pcarbo commented 4 years ago

I don't have the full picture here, but if you could somehow avoid directly forming S2 and L %*% S2 %*% t(L), that would help the situation somewhat --- I mean, you could then deal with matrices whose eigenvalues are closer to zero.

zouyuxin commented 4 years ago

@pcarbo https://github.com/stephenslab/mashr/blob/master/R/set_data.R#L264 — this is how I compute it right now. Are you suggesting computing it another way?

pcarbo commented 4 years ago

No, there isn't much you can do about how you compute Sigma. But perhaps you can rework the computation to avoid a Cholesky factorization of Sigma. Or, if that is unavoidable, then the best you can do is provide an informative error message. There is the more general question of how to handle poorly scaled data to avoid numerical issues — I don't think this is something we have looked at carefully.
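One possible direction (just a sketch, not what mashr currently does): invert the symmetric matrix through an eigendecomposition, flooring tiny eigenvalues rather than failing outright as chol() does.

```r
# Eigendecomposition-based inverse of a symmetric matrix that
# tolerates near-singularity by flooring small eigenvalues:
safe_inv_sympd <- function(A, tol = 1e-10) {
  e <- eigen(A, symmetric = TRUE)
  d <- pmax(e$values, tol * max(abs(e$values)))  # floor tiny/negative values
  e$vectors %*% diag(1/d, nrow = length(d)) %*% t(e$vectors)
}
```

This trades exactness for robustness; whether that is acceptable for mash posterior computations would need checking.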

gaow commented 4 years ago

For the time being I added a check that treats values as zero if they are smaller than or equal to .Machine$double.eps (about 2.2E-16); the threshold can be changed via zero_check_tol. I think the problem with the projection matrix is a separate issue that is best resolved in the mash common baseline implementation, as @pcarbo also implies above. Let's close this ticket for now and use other tickets to discuss when we have a motivating data set.