Closed kieranrcampbell closed 6 years ago
Hi Kieran,
Thanks for your message. It looks like it was a bug; I am surprised no one has caught it before! I think I fixed it with my latest commit: https://github.com/hemberg-lab/SC3/commit/973357d46a2578ccd1984c8ca8136bb9c6077ddb
Could you please reinstall from GitHub and check it again?
Cheers, Vlad
Thanks for the quick response. After installing gfortran, sc3 now runs but subsequently gets the error
Error in ED2(data) :
Not compatible with requested type: [type=S4; target=double].
Error in cor(data, method = "pearson") :
supply both 'x' and 'y' or a matrix-like 'x'
In addition: Warning messages:
1: package ‘foreach’ was built under R version 3.4.3
2: package ‘registry’ was built under R version 3.4.3
In addition: Warning messages:
1: package ‘foreach’ was built under R version 3.4.3
2: package ‘registry’ was built under R version 3.4.3
Error in cor(data, method = "spearman") :
supply both 'x' and 'y' or a matrix-like 'x'
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: Error in ED2(data) :
Not compatible with requested type: [type=S4; target=double].
with traceback
> traceback()
12: stop(count, " nodes produced errors; first error: ", firstmsg,
domain = NA)
11: checkForRemoteErrors(val)
10: dynamicClusterApply(cl, fun, length(x), argfun)
9: clusterApplyLB(cl, argsList, evalWrapper)
8: e$fun(obj, substitute(ex), parent.frame(), e$data)
7: list(args = distances(.doRNG.stream = list(c(407L, 460285142L,
86547807L, -823994348L, 146017285L, 684646658L, 270092443L),
c(407L, -510730265L, -1804156173L, 1706273257L, 546265011L,
-1997178580L, 1192571589L), c(407L, -1854877373L, 209468496L,
782277495L, -63406886L, -1842168843L, 584993947L))), argnames = c("i",
".doRNG.stream"), evalenv = <environment>, specified = character(0),
combineInfo = list(fun = function (a, ...)
c(a, list(...)), in.order = TRUE, has.init = TRUE, init = list(),
final = NULL, multi.combine = TRUE, max.combine = 100),
errorHandling = "stop", packages = "doRNG", export = NULL,
noexport = NULL, options = list(), verbose = FALSE) %dopar%
{
{
rngtools::RNGseed(.doRNG.stream)
}
{
try({
calculate_distance(dataset, i)
})
}
}
6: do.call("%dopar%", list(obj, ex), envir = parent.frame())
5: foreach::foreach(i = distances) %dorng% {
try({
calculate_distance(dataset, i)
})
}
4: sc3_calc_dists(object)
3: sc3_calc_dists(object)
2: sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2,
gene_filter = FALSE)
1: sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2,
gene_filter = FALSE)
Can you share your data with me on vk6@sanger.ac.uk?
But if I email it to you then people might start to question if we actually work on "big data" ;)
On its way - many thanks!
Hi Kieran, looks like the problem is that you have sparse matrices of class dgCMatrix in all your slots. SC3 does not know how to deal with them, because I did not know that you can store sparse matrices in the slots of SingleCellExperiment. I did this:
counts(sce) <- as.matrix(counts(sce))
normcounts(sce) <- as.matrix(normcounts(sce))
logcounts(sce) <- as.matrix(logcounts(sce))
and everything worked fine. However, I would like to put this inside the SC3 functions, so that SC3 can handle sparse input without errors and users can keep their file sizes small. Is dgCMatrix the only sparse format that can be used in the slots of SingleCellExperiment? And is as.matrix the right way to convert it to a full matrix?
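For reference, the conversion described above could be wrapped in a small guard that only densifies sparse input. This is a minimal sketch, not SC3 code: the helper name `to_dense_if_sparse` is hypothetical, and it assumes only the Matrix package, where all sparse classes (including dgCMatrix) inherit from the virtual class `sparseMatrix`.

```r
library(Matrix)

# Hypothetical helper (not part of SC3): return a dense base matrix if the
# input is any sparse Matrix class, otherwise return the input unchanged.
to_dense_if_sparse <- function(x) {
  if (methods::is(x, "sparseMatrix")) {
    as.matrix(x)
  } else {
    x
  }
}

# Toy 3 x 3 matrix stored as a dgCMatrix (compressed, column-oriented).
sp <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 3), x = c(5, 2, 7),
                   dims = c(3, 3))

dense <- to_dense_if_sparse(sp)

# After conversion, base functions that expect an ordinary matrix can be used.
pearson <- cor(dense, method = "pearson")
```

In the thread's setting the same guard would presumably be applied to each assay, e.g. `counts(sce) <- to_dense_if_sparse(counts(sce))`. Note that densifying a large 10x matrix can use a lot of memory.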
Ah, interesting catch!
I think the reason it's stored as a sparse matrix is that this is 10x data; the doc for read10xResults says
counts data stored as a sparse matrix
as.matrix seems to work fine, but as for whether dgCMatrix is the only class of sparse matrix used, I'm not sure. Best to ask Aaron?
Thanks, Kieran! @LTLA, could you please comment on what the best (most efficient/economical) format is for storing data in SingleCellExperiment slots? And what are all the possible options? Many thanks in advance.
I think that any matrix-like object can be stored in the assay slot of a SummarizedExperiment object, i.e., any object that supports row/column subsetting, nrow/ncol queries, r/cbind, etc. You can have a normal matrix, a sparse matrix of various types (e.g., dgCMatrix, dgTMatrix, or the mythical dgRMatrix), file-backed arrays like big.matrix and HDF5Matrix, and so on. These are all subject to an access speed/memory usage trade-off; see the beachmat paper for a discussion of this.
In the case of read10xResults, only a dgCMatrix will ever be returned. This is the most common format for sparse matrices and is the recommended format for use within Matrix, because it provides fast column access and tolerable row access.
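To make the sparse formats mentioned above concrete, here is a small sketch using only the Matrix package. The toy values are made up for illustration. It builds one matrix as a dgCMatrix, converts it to the triplet (dgTMatrix) and row-compressed (dgRMatrix) representations, and checks that all of them answer the same nrow/ncol and subsetting queries.

```r
library(Matrix)

# Column-compressed sparse matrix: sparseMatrix() returns a dgCMatrix
# for numeric x by default.
csc <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 4), x = c(5, 2, 7),
                    dims = c(3, 4))

# The same data in triplet and row-compressed form.
tri <- as(csc, "TsparseMatrix")   # dgTMatrix
csr <- as(csc, "RsparseMatrix")   # dgRMatrix

# Ordinary dense matrix for comparison.
dense <- as.matrix(csc)

# Every representation supports the basic matrix-like queries.
reps <- list(dense = dense, dgC = csc, dgT = tri, dgR = csr)
shapes <- vapply(reps, function(m) c(nrow(m), ncol(m)), numeric(2))
first_two_rows <- lapply(reps, function(m) m[1:2, , drop = FALSE])
```

Which representation is preferable depends on the access pattern, as noted above; this sketch only shows that they are interchangeable at the level of basic queries.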
Thanks a lot, @LTLA!
The rowSums (and similar functions) in the Matrix package work for sparse matrices. Is it possible to check the class of the count matrix and decide whether to use the normal rowSums or Matrix::rowSums based on that?
This is not necessary if you use Matrix::rowSums, which works fine with ordinary matrices:
a <- matrix(runif(20), 5, 4)
Matrix::rowSums(a)
This would ideally be the default behaviour without having to explicitly import Matrix in our packages; see https://www.mail-archive.com/bioc-devel@r-project.org/msg08423.html for a discussion.
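As a small illustration of the point above, the sketch below calls Matrix::rowSums on both an ordinary matrix and its sparse counterpart, with no class check. The toy values are made up; only the Matrix package is assumed.

```r
library(Matrix)

# Small dense matrix with many zeros (toy values, stored as doubles).
a <- matrix(c(0, 1, 0, 0, 3,
              0, 0, 2, 0, 0,
              4, 0, 0, 0, 1,
              0, 0, 0, 5, 0), nrow = 5, ncol = 4)

# The same data as a column-compressed sparse matrix.
sp <- as(a, "CsparseMatrix")

# Matrix::rowSums accepts both inputs, so no branching on class is needed.
dense_sums  <- Matrix::rowSums(a)
sparse_sums <- Matrix::rowSums(sp)
```

Both calls return a plain numeric vector with one entry per row, so downstream code can treat the two cases identically.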
Hi @wikiselev, I am still getting this error. I think what @LTLA suggested makes more sense than coercing the sparse matrix into a regular one. Is it possible to incorporate it? Thanks. zhangguy
Sorry, there is no active development of SC3 at the moment and there is no resource available for it in the near future. You are welcome to create pull requests; I can incorporate your changes into the package.
Hi Vlad,
Hope you're doing well.
I'm running SC3 on a dataset (first time using SingleCellExperiments with it) and get the following rather opaque error:
any idea what might be causing this? The traceback looks like
and my sessioninfo looks like
Many thanks,
Kieran