hemberg-lab / SC3

A tool for the unsupervised clustering of cells from single cell RNA-Seq experiments
http://bioconductor.org/packages/SC3
GNU General Public License v3.0

'x' must be an array of at least two dimensions #53

Closed kieranrcampbell closed 5 years ago

kieranrcampbell commented 6 years ago

Hi Vlad,

Hope you're doing well.

I'm running SC3 on a dataset (first time using SingleCellExperiments with it) and get the following rather opaque error:

sce_cnv_no_X_use <- sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2, gene_filter = FALSE)
> Setting SC3 parameters...
> Error in rowSums(dataset == 0) : 
 > 'x' must be an array of at least two dimensions

any idea what might be causing this? The traceback looks like

> traceback()
6: stop("'x' must be an array of at least two dimensions")
5: rowSums(dataset == 0)
4: sc3_prepare(object, gene_filter, pct_dropout_min, pct_dropout_max, 
       d_region_min, d_region_max, svm_num_cells, svm_train_inds, 
       svm_max, n_cores, kmeans_nstart, kmeans_iter_max, rand_seed)
3: sc3_prepare(object, gene_filter, pct_dropout_min, pct_dropout_max, 
       d_region_min, d_region_max, svm_num_cells, svm_train_inds, 
       svm_max, n_cores, kmeans_nstart, kmeans_iter_max, rand_seed)
2: sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2, 
       gene_filter = FALSE)
1: sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2, 
       gene_filter = FALSE)

and my sessioninfo looks like

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SC3_1.7.2                  BiocInstaller_1.28.0       bindrcpp_0.2              
 [4] clonealign_0.99.0          goseq_1.30.0               geneLenDataBase_1.14.0    
 [7] BiasedUrn_1.07             ggforce_0.1.1              ggbeeswarm_0.6.0          
[10] ggmcmc_1.1                 edgeR_3.20.1               limma_3.34.0              
[13] ggrepel_0.7.0              cowplot_0.9.2              glue_1.2.0                
[16] tidyr_0.7.2                dplyr_0.7.4                readr_1.1.1               
[19] scran_1.6.2                BiocParallel_1.12.0        scater_1.6.0              
[22] SingleCellExperiment_1.0.0 SummarizedExperiment_1.8.1 DelayedArray_0.4.1        
[25] matrixStats_0.52.2         GenomicRanges_1.30.1       GenomeInfoDb_1.14.0       
[28] IRanges_2.12.0             S4Vectors_0.16.0           ggplot2_2.2.1             
[31] Biobase_2.38.0             BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
  [1] plyr_1.8.4               igraph_1.1.2             lazyeval_0.2.1          
  [4] shinydashboard_0.6.1     splines_3.4.2            digest_0.6.13           
  [7] foreach_1.4.4            htmltools_0.3.6          viridis_0.4.0           
 [10] GO.db_3.5.0              gdata_2.18.0             magrittr_1.5            
 [13] memoise_1.1.0            cluster_2.0.6            doParallel_1.0.11       
 [16] ROCR_1.0-7               Biostrings_2.46.0        prettyunits_1.0.2       
 [19] colorspace_1.3-2         rrcov_1.4-3              blob_1.1.0              
 [22] WriteXLS_4.0.0           crayon_1.3.4             RCurl_1.95-4.8          
 [25] tximport_1.6.0           roxygen2_6.0.1           bindr_0.1               
 [28] zoo_1.8-0                iterators_1.0.9          registry_0.5            
 [31] gtable_0.2.0             zlibbioc_1.24.0          XVector_0.18.0          
 [34] DEoptimR_1.0-8           scales_0.5.0.9000        mvtnorm_1.0-6           
 [37] pheatmap_1.0.8           rngtools_1.2.4           DBI_0.7                 
 [40] GGally_1.3.2             Rcpp_0.12.14             viridisLite_0.2.0       
 [43] xtable_1.8-2             progress_1.1.2           units_0.4-6             
 [46] bit_1.1-12               DT_0.2                   htmlwidgets_0.9         
 [49] httr_1.3.1               FNN_1.1                  gplots_3.0.1            
 [52] RColorBrewer_1.1-2       pkgconfig_2.0.1          reshape_0.8.7           
 [55] XML_3.98-1.9             locfit_1.5-9.1           dynamicTreeCut_1.63-1   
 [58] tidyselect_0.2.3         labeling_0.3             rlang_0.1.4             
 [61] reshape2_1.4.2           AnnotationDbi_1.40.0     munsell_0.4.3           
 [64] tools_3.4.2              RSQLite_2.0              devtools_1.13.4         
 [67] stringr_1.2.0            yaml_2.1.16              knitr_1.17              
 [70] bit64_0.9-7              robustbase_0.92-8        caTools_1.17.1          
 [73] purrr_0.2.4              doRNG_1.6.6              nlme_3.1-131            
 [76] mime_0.5                 xml2_1.1.1               biomaRt_2.34.1          
 [79] compiler_3.4.2           rstudioapi_0.7           curl_3.0                
 [82] beeswarm_0.2.3           e1071_1.6-8              testthat_2.0.0          
 [85] tibble_1.3.4             statmod_1.4.30           tweenr_0.1.5            
 [88] pcaPP_1.9-72             stringi_1.1.5            GenomicFeatures_1.30.0  
 [91] lattice_0.20-35          Matrix_1.2-11            commonmark_1.4          
 [94] data.table_1.10.4-3      bitops_1.0-6             httpuv_1.3.5            
 [97] rtracklayer_1.38.0       R6_2.2.2                 RMySQL_0.10.13          
[100] KernSmooth_2.23-15       gridExtra_2.3            vipor_0.4.5             
[103] codetools_0.2-15         MASS_7.3-47              gtools_3.5.0            
[106] assertthat_0.2.0         rhdf5_2.22.0             pkgmaker_0.22           
[109] rjson_0.2.15             withr_2.1.1.9000         GenomicAlignments_1.14.0
[112] Rsamtools_1.30.0         GenomeInfoDbData_0.99.1  mgcv_1.8-20             
[115] hms_0.3                  udunits2_0.13            grid_3.4.2              
[118] class_7.3-14             git2r_0.19.0             shiny_1.0.5    

Many thanks,

Kieran

wikiselev commented 6 years ago

Hi Kieran,

Thanks for your message. It looks like this was a bug; I am surprised no one has caught it before! I think I fixed it with my latest commit: https://github.com/hemberg-lab/SC3/commit/973357d46a2578ccd1984c8ca8136bb9c6077ddb

Could you please reinstall from GitHub and check it again?

Cheers, Vlad

kieranrcampbell commented 6 years ago

Thanks for the quick response. After installing gfortran, sc3 now runs but subsequently gets the error

Error in ED2(data) : 
  Not compatible with requested type: [type=S4; target=double].
Error in cor(data, method = "pearson") : 
  supply both 'x' and 'y' or a matrix-like 'x'
In addition: Warning messages:
1: package ‘foreach’ was built under R version 3.4.3 
2: package ‘registry’ was built under R version 3.4.3 
In addition: Warning messages:
1: package ‘foreach’ was built under R version 3.4.3 
2: package ‘registry’ was built under R version 3.4.3 
Error in cor(data, method = "spearman") : 
  supply both 'x' and 'y' or a matrix-like 'x'
Error in checkForRemoteErrors(val) : 
  3 nodes produced errors; first error: Error in ED2(data) : 
  Not compatible with requested type: [type=S4; target=double].

with traceback

> traceback()
12: stop(count, " nodes produced errors; first error: ", firstmsg, 
        domain = NA)
11: checkForRemoteErrors(val)
10: dynamicClusterApply(cl, fun, length(x), argfun)
9: clusterApplyLB(cl, argsList, evalWrapper)
8: e$fun(obj, substitute(ex), parent.frame(), e$data)
7: list(args = distances(.doRNG.stream = list(c(407L, 460285142L, 
   86547807L, -823994348L, 146017285L, 684646658L, 270092443L), 
       c(407L, -510730265L, -1804156173L, 1706273257L, 546265011L, 
       -1997178580L, 1192571589L), c(407L, -1854877373L, 209468496L, 
       782277495L, -63406886L, -1842168843L, 584993947L))), argnames = c("i", 
   ".doRNG.stream"), evalenv = <environment>, specified = character(0), 
       combineInfo = list(fun = function (a, ...) 
       c(a, list(...)), in.order = TRUE, has.init = TRUE, init = list(), 
           final = NULL, multi.combine = TRUE, max.combine = 100), 
       errorHandling = "stop", packages = "doRNG", export = NULL, 
       noexport = NULL, options = list(), verbose = FALSE) %dopar% 
       {
           {
               rngtools::RNGseed(.doRNG.stream)
           }
           {
               try({
                   calculate_distance(dataset, i)
               })
           }
       }
6: do.call("%dopar%", list(obj, ex), envir = parent.frame())
5: foreach::foreach(i = distances) %dorng% {
       try({
           calculate_distance(dataset, i)
       })
   }
4: sc3_calc_dists(object)
3: sc3_calc_dists(object)
2: sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2, 
       gene_filter = FALSE)
1: sc3(sce_cnv_no_X_use, ks = 2:3, biology = TRUE, n_cores = 2, 
       gene_filter = FALSE)

wikiselev commented 6 years ago

Can you share your data with me at vk6@sanger.ac.uk?

kieranrcampbell commented 6 years ago

But if I email it to you then people might start to question if we actually work on "big data" ;)

On its way - many thanks!

wikiselev commented 6 years ago

Hi Kieran, looks like the problem is that you have sparse matrices of class dgCMatrix in all your slots. SC3 does not know how to deal with them, because I did not know that you can store sparse matrices in the slots of SingleCellExperiment. I did this:

counts(sce) <- as.matrix(counts(sce))
normcounts(sce) <- as.matrix(normcounts(sce))
logcounts(sce) <- as.matrix(logcounts(sce))

and everything worked fine. However, I would like to put this conversion inside the SC3 functions, so that SC3 handles sparse input without errors and users can keep storing their data in the smaller sparse format. Is dgCMatrix the only sparse format that can be used in the slots of SingleCellExperiment?

wikiselev commented 6 years ago

And is as.matrix the right way to convert it to a full matrix?
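
A minimal sketch of what the conversion above does, assuming only the Matrix package. The toy matrix is made up for illustration and stands in for a 10x count matrix; none of this is SC3 code. It shows that base::rowSums() rejects a dgCMatrix (the message matches the title of this issue) and that the same call succeeds after as.matrix().

```r
library(Matrix)

# Toy stand-in for a sparse count matrix: 4 genes x 3 cells, mostly zeros.
sparse_counts <- sparseMatrix(
  i = c(1, 3, 4),
  j = c(1, 2, 3),
  x = c(5, 2, 7),
  dims = c(4, 3)
)

# base::rowSums() only accepts arrays and data frames, so an S4 sparse
# matrix triggers "'x' must be an array of at least two dimensions".
err <- tryCatch(
  base::rowSums(sparse_counts),
  error = function(e) conditionMessage(e)
)
print(err)

# as.matrix() expands the sparse object into an ordinary dense matrix,
# which base functions accept.
dense_counts <- as.matrix(sparse_counts)
dense_zero_counts <- base::rowSums(dense_counts == 0)
print(dense_zero_counts)  # 2 3 2 2
```

The trade-off is memory: the dense copy stores every zero explicitly, which can matter for large datasets.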

kieranrcampbell commented 6 years ago

Ah, interesting catch!

I think the reason it's stored as a sparse matrix is that this is 10x data; the documentation for read10xResults says

counts data stored as a sparse matrix

as.matrix seems to work fine, but as for whether dgCMatrix is the only class of sparse matrix used, I'm not sure. Best to ask Aaron?

wikiselev commented 6 years ago

Thanks, Kieran! @LTLA, could you please comment on what the best (most efficient and economical) format is for storing data in SingleCellExperiment slots? And what are all the possible options? Many thanks in advance.

LTLA commented 6 years ago

I think that any matrix-like object can be stored in the assay slot of a SummarizedExperiment object, i.e., the object supports row/column subsetting, nrow/ncol queries, r/cbind, etc. You can have a normal matrix, a sparse matrix of various types (e.g., dgCMatrix, dgTMatrix, or the mythical dgRMatrix), file-backed arrays like big.matrix and HDF5Matrix, and so on. These are all subject to an access speed/memory usage trade-off, see the beachmat paper for a discussion of this.

In the case of read10xResults, only a dgCMatrix will ever be returned. This is the most common format for sparse matrices and is the recommended format for use within Matrix, due to the fact that it provides fast column access and tolerable row access.
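
As a rough illustration of why column access is cheap in this format, here is a small sketch of the compressed-column layout behind a dgCMatrix, assuming only the Matrix package. The matrix values and the helper col_values are hypothetical and exist only for this example.

```r
library(Matrix)

# 3 x 3 matrix with non-zero entries at (1,1), (3,1), (2,3) and (3,3).
m <- sparseMatrix(
  i = c(1, 3, 2, 3),
  j = c(1, 1, 3, 3),
  x = c(10, 20, 30, 40),
  dims = c(3, 3)
)

print(m@p)  # column pointers:         0 2 2 4
print(m@i)  # zero-based row indices:  0 2 1 2
print(m@x)  # non-zero values:         10 20 30 40

# Hypothetical helper: the non-zero values of column j sit in one
# contiguous run of @x, located directly from the column pointers.
col_values <- function(mat, j) {
  idx <- seq_len(mat@p[j + 1] - mat@p[j]) + mat@p[j]
  mat@x[idx]
}

print(col_values(m, 1))  # 10 20
print(col_values(m, 3))  # 30 40
```

Reading a row, by contrast, means scanning the row indices of every column, which is consistent with row access being slower but still workable.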

wikiselev commented 6 years ago

thanks a lot, @LTLA !

mt1022 commented 6 years ago

The rowSums function (and similar functions) in the Matrix package works on sparse matrices. Would it be possible to check the class of the count matrix and then choose between the normal rowSums and Matrix::rowSums accordingly?

LTLA commented 6 years ago

This is not necessary if you have Matrix::rowSums, which works fine with ordinary matrices:

a <- matrix(runif(20), 5, 4)
Matrix::rowSums(a)

This would ideally be the default behaviour without having to explicitly import Matrix in our packages, see https://www.mail-archive.com/bioc-devel@r-project.org/msg08423.html for a discussion.
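
A minimal sketch of this idea applied to the kind of zero count that appears in the first traceback (rowSums(dataset == 0)), assuming only the Matrix package. The helper zeros_per_row is hypothetical and is not part of SC3; it counts non-zero entries and subtracts from the number of columns so that the intermediate comparison stays sparse.

```r
library(Matrix)

# Hypothetical helper, not SC3 code: number of zero entries in each row.
# Matrix::rowSums() accepts both ordinary and sparse matrices, so no
# class check is needed.
zeros_per_row <- function(dataset) {
  ncol(dataset) - Matrix::rowSums(dataset != 0)
}

dense <- matrix(c(0, 1, 0, 0, 0, 3), nrow = 2)  # rows: (0, 0, 0) and (1, 0, 3)
sparse <- Matrix(dense, sparse = TRUE)          # same values as a dgCMatrix

print(zeros_per_row(dense))   # 3 1
print(zeros_per_row(sparse))  # 3 1
```

The same function body serves both inputs, which avoids coercing a large sparse matrix to a dense one.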

zhangguy commented 5 years ago

Hi @wikiselev, I am still getting this error. I think what @LTLA suggested makes more sense than coercing the sparse matrix into a regular one. Would it be possible to incorporate it? Thanks. zhangguy

wikiselev commented 5 years ago

Sorry, there is no active development of SC3 at the moment and no resources will be available for it in the near future. You are welcome to create pull requests, and I can incorporate your changes into the package.