Closed igordot closed 5 years ago
Could you give me a traceback here? I have no idea which part of the code throws that.
Here is the more complete output.
> dm = DiffusionMap(t(exp_mat), n_eigs = 2, rotate = TRUE, verbose = TRUE, suppress_dpt = TRUE)
finding knns......done. Time: 22466.43s
Calculating transition probabilities...Error in .local(x, y, ...) : negative length vectors are not allowed
Timing stopped at: 456.3 136.2 592.5
> traceback()
15: .Call(dgeMatrix_crossprod, x, TRUE)
14: .local(x, y, ...)
13: tcrossprod(Matrix(sigma))
12: tcrossprod(Matrix(sigma))
11: .class1(object)
10: as(mat, "dsCMatrix")
9: withCallingHandlers(expr, message = function(c) invokeRestart("muffleMessage"))
8: suppressMessages(as(mat, "dsCMatrix")[mask])
7: m(tcrossprod(Matrix(sigma)))
6: no_censoring(dists, sigma, cb)
5: force(expr)
4: system.time({
r <- force(expr)
})
3: verbose_timing(verbose, "Calculating transition probabilities",
{
if (censor)
censoring(imputed_data, sigma, dists, censor_val,
censor_range, missing_range, cb)
else no_censoring(dists, sigma, cb)
})
2: transition_probabilities(imputed_data, sigma, knn$dist_mat, censor,
censor_val, censor_range, missing_range, verbose)
1: DiffusionMap(t(exp_mat), n_eigs = 2, rotate = TRUE, verbose = TRUE,
suppress_dpt = TRUE)
The error comes from R not being able to allocate a giant vector. It's happening here:
And 2 lines above that I wrote "TODO: optimize local sigma no-censoring case". I think the reason is simply that this is the only line where a dense matrix is still being created, which blows up. It should be possible to write this more efficiently. The code is basically:
huge_mat <- sigma %*% t(sigma)
S1 <- huge_mat[d2 != 0 & upper.tri.sparse(d2)]
huge_mat2 <- outer(sigma^2, sigma^2, `+`)
S2 <- huge_mat2[d2 != 0 & upper.tri.sparse(d2)]
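For context, a back-of-the-envelope calculation (using the 50,000-cell case reported in this issue) shows why those dense intermediates cannot work at that scale:

```r
# Each dense n x n double matrix costs 8 * n^2 bytes.
n <- 50000
gib <- 8 * n^2 / 1024^3
gib                         # ~18.6 GiB per dense intermediate
# n^2 also exceeds .Machine$integer.max (2^31 - 1), which is likely
# what surfaces as "negative length vectors are not allowed"
n^2 > .Machine$integer.max  # TRUE
```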
A more efficient way to do it would probably be simply
# convert to a symmetric triplet matrix to have
# all non-zero upper-triangle coordinates
# TODO: check if this leaves the coordinates in the same order!
coords <- as(d2, 'dsTMatrix')
S1 <- sigma[coords@i+1] * sigma[coords@j+1]
S2 <- (sigma^2)[coords@i+1] + (sigma^2)[coords@j+1]
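To sanity-check the triplet formulation, here is a small self-contained sketch (assuming the Matrix package; the tiny `d2` below is a stand-in for the real sparse distance matrix) comparing it against the dense version:

```r
library(Matrix)

n <- 5
sigma <- c(0.5, 1.0, 1.5, 2.0, 2.5)

# small symmetric sparse matrix standing in for the distance matrix d2
d2 <- sparseMatrix(i = c(1, 1, 2, 3), j = c(2, 4, 3, 5),
                   x = c(1.2, 0.8, 2.1, 0.5),
                   dims = c(n, n), symmetric = TRUE)

# dense reference -- the version that blows up at ~50,000 cells
dense <- as.matrix(d2)
mask <- dense != 0 & upper.tri(dense)
S1_dense <- tcrossprod(sigma)[mask]
S2_dense <- outer(sigma^2, sigma^2, `+`)[mask]

# sparse triplet version: for a symmetric matrix this yields a dsTMatrix
# whose 0-based (i, j) slots are the non-zero upper-triangle coordinates
coords <- as(d2, 'TsparseMatrix')
S1 <- sigma[coords@i + 1] * sigma[coords@j + 1]
S2 <- (sigma^2)[coords@i + 1] + (sigma^2)[coords@j + 1]

# the values agree; whether the *order* also matches (the TODO above)
# can be checked the same way with `==` on the unsorted vectors
all(sort(S1) == sort(S1_dense)) && all(sort(S2) == sort(S2_dense))
```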
Should be fixed in 1008274908db42e08e608236bab1df382795bf5e, can you check?
Looks like the update solved the problem. The process completed and the results seem reasonable.
Thank you for the quick fix!
Great!
Actually, I have an update on this issue. I tested the new version on a few different data sets. Most of them come out essentially the same, but there is one data set that produced very different results.
This is the result with 2.12:
And this is with the GitHub version:
The first one is roughly what one would expect and the second one is not. I tried subsampling the number of cells. I am showing 5000 cells above. Reducing it to 4500 yielded a plot that looks normal with both versions, so it seems like the input data is mostly okay.
Do you know what may be causing this?
I am running DiffusionMap() on an expression matrix of the 2,000 most variable genes. It works great for a few thousand cells. However, at around 50,000 cells (so a 50,000 x 2,000 input matrix), I get this error:
I tried adding suppress_dpt = TRUE, which seemed like the only option that might help, but it did not. Is there any way to resolve this?