Closed: Young-Sook closed this issue 4 years ago
@Young-Sook As a test, what if you follow this page: https://stephenslab.github.io/mashr/articles/eQTL_outline.html to use a smaller random set for the mash fit, and see if the problem persists?
Also, unrelated but possibly important: it is recommended to account for correlation structure using some estimate of it, as also detailed on the page above. It would be useful to see what happens if you follow the instructions strictly with a smaller data-set.
Thank you so much for your quick response. I didn't have any error when I followed the script on the page https://stephenslab.github.io/mashr/articles/eQTL_outline.html.
Even after randomly subsetting the data, I still get the same error message. Below is my code.
==================================================================
m.1by1 = mash_1by1(mash_set_data(data_mean, data_std))
strong.subset = get_significant_results(m.1by1, 0.0000000000000001)
random.subset = sample(1:nrow(data_mean), 5000)
data.temp = mash_set_data(data_mean[random.subset,], data_std[random.subset,])
Vhat = estimate_null_correlation_simple(data.temp)
rm(data.temp)
data.random = mash_set_data(data_mean[random.subset,], data_std[random.subset,], V=Vhat)
data.strong = mash_set_data(data_mean[strong.subset,], data_std[strong.subset,], V=Vhat)
U.pca = cov_pca(data.strong, cellNum)
U.ed = cov_ed(data.strong, U.pca)
U.c = cov_canonical(data.random)
m = mash(data.random, Ulist = c(U.ed, U.c), outputlevel = 1)
m2 = mash(data.strong, g=get_fitted_g(m), fixg=TRUE)
==================================================================
I did not have any problem until I executed the last line. There are 9,195 rows in data.strong. When I executed the last line, I got the following error message.
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
...
error: chol(): decomposition failed
error: chol(): decomposition failed
error: inv_sympd(): matrix is singular or not positive definite
Error in calc_post_rcpp(t(data$Bhat), t(data$Shat), t(data$Shat_alpha),  :
  inv_sympd(): matrix is singular or not positive definite
In addition: Warning message:
In calc_lik_matrix(data, Ulist, log = TRUE, algorithm.version = algorithm.version) :
  Some mixture components result in non-finite likelihoods, either due to numerical underflow/overflow, or due to invalid covariance matrices 1, 8, 9, 11, 22, 23, 25, 36, 37, 39, 50, 51, 53, 64, 65, 67, 78, 79, 81, 92, 93, 95, 106, 107, 109, 120, 121, 123, 134, 135, 137, 148, 149, 151, 162, 163, 165, 176, 177, 179, 190, 191, 193, 204, 205, 207, 218, 219, 221, 232, 233, 235, 246, 247, 249, 260, 261, 263, 274, 275, 277, 288, 289, 291, 302, 303, 305, 316, 317, 319, 330, 331, 333, 344, 345, 347, 358, 359, 361, 372, 373, 375, 386, 387, 389, 400, 401, 403, 414, 415, 417, 428, 429, 431, 442, 443, 445, 456, 457, 459, 470, 471, 473, 484, 485, 487, 498, 499, 501, 512, 513, 515
Timing stopped at: 0.216 0 0.217
Could you check the values for data_std[strong.subset,]? What's the minimum value in the standard error matrix?
It is 0. Would that be a problem? If so, what would you recommend to do?
@Young-Sook We should have a check in mash_set_data for zero elements in standard errors (see this line). It seems you are on an older version of mashr. Could you update to the current master and try again, to see if the error is now caught?
> If so, what would you recommend to do?
It depends on how that happens in the first place. Could you check how your summary statistics are generated and what is going on with that particular variant? What is the corresponding effect size estimate? It is best to fix the problem in your summary statistics before feeding them into mash_set_data, although mash_set_data does have options called zero_Bhat_reset and zero_Shat_reset. For example, you can bypass the error by setting zero_Shat_reset = 1E3, so that zero standard errors are replaced with a large number (to indicate a non-significant effect size).
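As a rough illustration of what zero_Shat_reset does, here is a sketch in base R with made-up matrices (this is not mashr's actual implementation, just the idea):

```r
# Toy summary statistics (made-up data, 5 effects x 4 conditions)
set.seed(1)
Bhat <- matrix(rnorm(20), 5, 4)
Shat <- matrix(abs(rnorm(20)) + 0.1, 5, 4)
Shat[2, 3] <- 0  # a problematic exact-zero standard error

# zero_Shat_reset = 1E3 roughly amounts to: replace exact zeros with a
# large standard error, so the corresponding effect is treated as
# highly uncertain (i.e. non-significant)
Shat_fixed <- Shat
Shat_fixed[Shat == 0] <- 1e3

min(Shat_fixed) > 0  # TRUE: no zero standard errors remain
```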
Yes, this causes the error: a zero standard error makes the covariance matrix singular. In mash_set_data we check for exact zeros. I think in your case it is a very small number but not exactly 0 (so == 0 gives FALSE).
Thanks @Young-Sook, what you have seems to be a version we released last March. Installing with devtools::install_github("stephenslab/mashr") should give you the master branch. But we (@pcarbo) are now working on getting the updated package onto CRAN, so there should be an update very soon if you want to check back later. For now, you may want to fix your input data manually anyway, just to make sure you are fully aware of potential issues in it.
I checked with the == 0 comparison, and it gave me TRUE, which means the standard error value is exactly zero.
Are the corresponding data.mean values also zero? If so, you can set those data.st_err entries to a large number, say 1E3, to get the computation going. Otherwise you might look into issues with your summary statistics data. The check for zeros in our master code was motivated by data generated from some software that produces exactly zero for both bhat and sbhat.
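One way to run that check on your own matrices (a base-R sketch with made-up data; data_mean and data_std here stand in for your inputs):

```r
# Made-up matrices standing in for data_mean / data_std
data_mean <- matrix(rnorm(12), 4, 3)
data_std  <- matrix(abs(rnorm(12)) + 0.1, 4, 3)
data_mean[3, 2] <- 0  # simulate a variant where the software emitted 0/0
data_std[3, 2]  <- 0

# Locate exact-zero standard errors and inspect the matching effect sizes
idx <- which(data_std == 0, arr.ind = TRUE)
idx              # row/column of each zero standard error
data_mean[idx]   # if these are also zero, the software likely produced
                 # exact zeros for both bhat and sbhat, as described above
```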
@zouyuxin Still, based on our conversation offline, I think we should change the mash_set_data function to check for tiny standard errors, not just exact zeros -- it seems you have a data application that raises the same error due to tiny sbhat. What do you think a "tiny" threshold should be in this case? Is it safe enough to treat any sbhat smaller than .Machine$double.eps as zero? In terms of chol behavior,
> chol(matrix(c(1,0,0,.Machine$double.eps^20),2,2))
[,1] [,2]
[1,] 1 0.000000e+00
[2,] 0 2.913414e-157
> chol(matrix(c(1,0,0,.Machine$double.eps^21),2,2))
Error in chol.default(matrix(c(1, 0, 0, .Machine$double.eps^21), 2, 2)) :
the leading minor of order 2 is not positive definite
Is it good enough to add a parameter like zero_tol to mash_set_data, with the default set to .Machine$double.eps -- would that solve the error you once ran into? Or what else would you suggest based on your experience? (@zouyuxin)
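A minimal sketch of what such a check might look like (zero_tol here is just the parameter name proposed above, not an existing mashr argument):

```r
# Proposed check: flag standard errors at or below machine epsilon as zeros
zero_tol <- .Machine$double.eps
Shat <- matrix(c(1, 1e-20, 0.5, 2), 2, 2)  # one "tiny but nonzero" entry

bad <- Shat <= zero_tol
any(bad)   # TRUE: 1e-20 passes an exact == 0 check but is flagged here
```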
@gaow Adding a zero_tol parameter would definitely help. .Machine$double.eps is enough for the usual mash analysis, but it is not enough for mashcommonbaseline. In mashcommonbaseline, the residual covariance is L S_j V S_j L^T, where L is an (R-1) by R matrix. For example,
> S2 = diag(c(1,.Machine$double.eps^2, .Machine$double.eps^2))
> L = rbind(c(1,-1,0), c(1,0,-1))
> L %*% S2 %*% t(L)
[,1] [,2]
[1,] 1 1
[2,] 1 1
which is singular; note that 1 + .Machine$double.eps^2 == 1 gives TRUE.
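The singularity can be confirmed directly; reproducing the matrix above and checking its rank and Cholesky factorization:

```r
S2 <- diag(c(1, .Machine$double.eps^2, .Machine$double.eps^2))
L  <- rbind(c(1, -1, 0), c(1, 0, -1))
M  <- L %*% S2 %*% t(L)

# 1 + .Machine$double.eps^2 rounds back to 1, so M is exactly
# matrix(1, 2, 2) in floating point, and therefore rank deficient
qr(M)$rank                                          # 1
inherits(try(chol(M), silent = TRUE), "try-error")  # TRUE: chol() fails
```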
I don't have the full picture here, but if you could somehow avoid directly forming S2 and L %*% S2 %*% t(L), that would help the situation somewhat -- you would then avoid working directly with matrices whose eigenvalues are very close to zero.
@pcarbo https://github.com/stephenslab/mashr/blob/master/R/set_data.R#L264 -- this is how I compute it right now. Are you suggesting computing it another way?
No, there isn't much you can do about how you compute Sigma. But perhaps you can rework the computation to avoid a Cholesky factorization of Sigma. Or, if that is unavoidable, the best you can do is provide an informative error message. There is the more general question of how to handle poorly scaled data to avoid numerical issues; I don't think this is something we have looked at carefully.
For the time being, I added a check that treats values as zero if they are smaller than or equal to .Machine$double.eps (2.2E-16); this can be changed via zero_check_tol. I think the problem with the projection matrix is a separate issue that is best resolved in the mash common baseline implementation, as @pcarbo also implies above. Let's close this ticket for now and use other tickets to discuss when we have a motivating data-set.
I ran mashr on my data and I am getting the same error regardless of the type of covariance matrices (whether I use only canonical or only data-driven ones). My data has 43,416 regions (rows) and four cells (columns).
Here is the error message from running m = mash(data, U.c):
==================================================================
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
error: chol(): decomposition failed
...
error: chol(): decomposition failed
error: inv_sympd(): matrix is singular or not positive definite
Error in calc_post_rcpp(t(data$Bhat), t(data$Shat), t(data$Shat_alpha),  :
  inv_sympd(): matrix is singular or not positive definite
In addition: Warning message:
In calc_lik_matrix(data, Ulist, log = TRUE, algorithm.version = algorithm.version) :
  Some mixture components result in non-finite likelihoods, either due to numerical underflow/overflow, or due to invalid covariance matrices 1, 3, 4, 6, 12, 13, 15, 21, 22, 24, 30, 31, 33, 39, 40, 42, 48, 49, 51, 57, 58, 60, 66, 67, 69, 75, 76, 78, 84, 85, 87, 93, 94, 96, 102, 103, 105, 111, 112, 114, 120, 121, 123, 129, 130, 132, 138, 139, 141, 147, 148, 150, 156, 157, 159, 165, 166, 168, 174, 175, 177, 183, 184, 186, 192, 193, 195, 201, 202, 204, 210, 211, 213, 219, 220, 222, 228, 229, 231, 237, 238, 240, 246, 247, 249, 255, 256, 258, 264, 265, 267, 273, 274, 276, 282, 283, 285, 291, 292, 294, 300, 301, 303, 309, 310, 312, 318, 319, 321, 327, 328, 330, 336, 337, 339, 345, 346, 348
==================================================================
Below is my code.
data = mash_set_data(data_mean, data_std)
m.1by1 = mash_1by1(data)
strong = get_significant_results(m.1by1)
U.pca = cov_pca(data, cellNum, subset=strong)
U.ed = cov_ed(data, U.pca, subset=strong)
U.c = cov_canonical(data)
m = mash(data, c(U.c,U.ed))