hemberg-lab / SC3

A tool for the unsupervised clustering of cells from single cell RNA-Seq experiments
http://bioconductor.org/packages/SC3
GNU General Public License v3.0

first error: Error in transf[, 1:hash.table$n_dim[i]] : incorrect number of dimensions #61

Jayjay601 closed this issue 6 years ago.

Jayjay601 commented 6 years ago

I tried SC3 on another dataset and it worked without any error. However, on this dataset specifically, it throws the error below. What does it mean? The only obvious difference is that this dataset has very few cells (<20).

sce <- sc3(sce, ks = 2:4, biology = TRUE,gene_filter=F)

Setting SC3 parameters...
Calculating distances between the cells...
Performing transformations and calculating eigenvectors...
Performing k-means clustering...
Error in checkForRemoteErrors(val) :
  36 nodes produced errors; first error: Error in transf[, 1:hash.table$n_dim[i]] : incorrect number of dimensions

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] scran_1.6.7 BiocParallel_1.12.0 scater_1.6.2 SingleCellExperiment_1.0.0
[5] SummarizedExperiment_1.8.1 DelayedArray_0.4.1 matrixStats_0.53.0 GenomicRanges_1.30.1
[9] GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0 ggplot2_2.2.1
[13] tidyr_0.8.0 SC3_1.7.7 Biobase_2.38.0 BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] ggbeeswarm_0.6.0 colorspace_1.3-2 rjson_0.2.15 class_7.3-14 dynamicTreeCut_1.63-1
[6] XVector_0.18.0 DT_0.4 bit64_0.9-7 AnnotationDbi_1.40.0 mvtnorm_1.0-7
[11] codetools_0.2-15 tximport_1.6.0 doParallel_1.0.11 robustbase_0.92-8 cluster_2.0.6
[16] pheatmap_1.0.8 shinydashboard_0.6.1 shiny_1.0.5 rrcov_1.4-3 compiler_3.4.0
[21] httr_1.3.1 assertthat_0.2.0 Matrix_1.2-9 lazyeval_0.2.1 limma_3.34.6
[26] htmltools_0.3.6 prettyunits_1.0.2 tools_3.4.0 bindrcpp_0.2 igraph_1.1.2
[31] gtable_0.2.0 glue_1.2.0 GenomeInfoDbData_1.0.0 reshape2_1.4.3 dplyr_0.7.4
[36] doRNG_1.6.6 Rcpp_0.12.15 gdata_2.18.0 iterators_1.0.9 stringr_1.2.0
[41] mime_0.5 rngtools_1.2.4 gtools_3.5.0 WriteXLS_4.0.0 statmod_1.4.30
[46] XML_3.98-1.9 edgeR_3.20.7 DEoptimR_1.0-8 zlibbioc_1.24.0 zoo_1.8-1
[51] scales_0.5.0 rhdf5_2.22.0 RColorBrewer_1.1-2 memoise_1.1.0 gridExtra_2.3
[56] pkgmaker_0.22 biomaRt_2.34.2 stringi_1.1.6 RSQLite_2.0 pcaPP_1.9-73
[61] foreach_1.4.4 e1071_1.6-8 caTools_1.17.1 rlang_0.1.6 pkgconfig_2.0.1
[66] bitops_1.0-6 lattice_0.20-35 ROCR_1.0-7 purrr_0.2.4 bindr_0.1
[71] labeling_0.3 htmlwidgets_1.0 cowplot_0.9.2 bit_1.1-12 tidyselect_0.2.3
[76] plyr_1.8.4 magrittr_1.5 R6_2.2.2 gplots_3.0.1 DBI_0.7
[81] pillar_1.1.0 RCurl_1.95-4.10 tibble_1.4.2 KernSmooth_2.23-15 viridis_0.4.1
[86] progress_1.1.2 locfit_1.5-9.1 grid_3.4.0 data.table_1.10.4-3 blob_1.1.0
[91] FNN_1.1 digest_0.6.15 xtable_1.8-2 httpuv_1.3.5 munsell_0.4.3
[96] registry_0.5 beeswarm_0.2.3 viridisLite_0.3.0 vipor_0.4.5

Thanks in advance! Y

pati-ni commented 6 years ago

Hey Yvonne, can you please post what gets printed when you type sce at the prompt?

Jayjay601 commented 6 years ago

Hi, I added n_cores=1 to the call and got the same error:

> sce <- sc3(sce, ks = 2:4, biology = TRUE, gene_filter = F, n_cores = 1)
Setting SC3 parameters...
Calculating distances between the cells...
Performing transformations and calculating eigenvectors...
Performing k-means clustering...
Error in checkForRemoteErrors(val) :
  36 nodes produced errors; first error: Error in transf[, 1:hash.table$n_dim[i]] : incorrect number of dimensions

Here's the sce object:

> sce
class: SingleCellExperiment
dim: 15061 12
metadata(0):
assays(2): counts logcounts
rownames(15061): ENSG00000000003.10 ENSG00000000419.8 ... ENSGR0000185291.6 ENSGR0000197976.6
rowData names(1): feature_symbol
colnames(12): Cell_B_A8 Cell_B_B11 ... Cell_B_H3 Cell_B_H7
colData names(1): sample
reducedDimNames(0):
spikeNames(0):

Thanks for your help! Yvonne

pati-ni commented 6 years ago

If I had to guess, there is a problem with the d_region_min and d_region_max parameters because the dataset is too small. These parameters default to 0.04 and 0.07 respectively, but you can override them by setting custom values in the sc3 call.

By default, these parameters determine n_dim, the number of leading columns of the transformed matrix (transf) that are passed to k-means clustering.

So my guess is that floor(d_region_min * cell_number) evaluates to zero.
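The floor/ceiling arithmetic described above can be checked in plain R. This is a minimal sketch: n_dim_range is a hypothetical helper that reproduces only the rule stated in this comment (from floor(d_region_min * n_cells) up to ceiling(d_region_max * n_cells)); it is not SC3's own code.

```r
# Hypothetical helper reproducing the rule described in the comment:
# n_dim runs from floor(d_region_min * n_cells) to ceiling(d_region_max * n_cells).
n_dim_range <- function(n_cells, d_region_min = 0.04, d_region_max = 0.07) {
  floor(d_region_min * n_cells):ceiling(d_region_max * n_cells)
}

# 12 cells with the default parameters: floor(0.48) = 0 and ceiling(0.84) = 1,
# so the range starts at zero dimensions
n_dim_range(12)

# With 100 cells the lower bound is already nonzero
min(n_dim_range(100))
```

With the 12-cell object above, zero dimensions end up in the range, which matches the failure reported in this thread.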

I would suggest setting d_region_min to anything greater than 0.09 and d_region_max to anything greater than 0.1, although it would be a good idea to try bigger values (e.g. >0.35) to get a wider range of n_dim values.
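To see why values above 0.09 work for 12 cells: the lower bound is nonzero only when d_region_min * n_cells >= 1, so d_region_min must be at least about 1/n_cells (roughly 0.083 here). A small sketch follows; lower_bound_ok is a hypothetical helper, and the commented sc3 call assumes SC3 is installed and uses only the arguments mentioned in this thread.

```r
# Hypothetical helper: TRUE when floor(d_region_min * n_cells) is at least 1,
# i.e. the smallest n_dim value is not zero.
lower_bound_ok <- function(n_cells, d_region_min) {
  floor(d_region_min * n_cells) >= 1
}

n_cells <- 12
lower_bound_ok(n_cells, 0.04)  # FALSE with the default d_region_min
lower_bound_ok(n_cells, 0.1)   # TRUE

# With d_region_min = 0.1 and d_region_max = 0.4, n_dim spans 1 to 5
floor(0.1 * n_cells):ceiling(0.4 * n_cells)

# Passing the overrides to SC3 (requires the SC3 package; not run here):
# sce <- sc3(sce, ks = 2:4, biology = TRUE, gene_filter = FALSE,
#            d_region_min = 0.1, d_region_max = 0.4)
```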

Hope this helps

wikiselev commented 6 years ago

Yes, @pati-ni is right. Why do you have just 12 cells in your experiment? Is it a bulk sample? If so, we don't recommend running SC3 on bulk data.

cdsoria commented 6 years ago

Hello Vladimir, when the data is large (>5000 cells and >20,000 genes), what range should we use for n_dim? Should I go lower than 0.04? I have the same issue as #73. I use the matrix from Seurat, but I am not sure that is related, since I have done this before without errors. Thanks in advance

wikiselev commented 6 years ago

Is it in a sparse format from Seurat? This can be an issue. @pati-ni could you please check this?

cdsoria commented 6 years ago

Just to clarify: I make a Seurat object from a raw matrix (or more than one):

srat = CreateSeuratObject(srat)
srat_scater <- SingleCellExperiment(assays = list(counts = as.matrix(srat@raw.data)), colData = srat@meta.data)

So I do not normalise in Seurat or do anything else; I just like reading in the several matrices that I have and adding metadata. Is this a problem? Cheers

DrLucyMac commented 6 years ago

Hey @cds1, did you figure out which values to set for the d_region_min and d_region_max parameters for a dataset of more than 5000 cells? Cheers

cdsoria commented 6 years ago

@lmacdonald12 unfortunately not yet...

DrLucyMac commented 6 years ago

I just keep getting an error if my dataset has more than 5000 cells

cdsoria commented 6 years ago

Something that just worked for me, but only for 5000 cells, was adding some filtering steps. Not sure whether that will work for you.

1) filter_by_total_counts <- (srat_scater$total_counts < 30000)
   table(filter_by_total_counts)

2) filter_by_expr_features <- (srat_scater$total_features > 600)
   table(filter_by_expr_features)

3) srat_scater$use <- (filter_by_expr_features & filter_by_total_counts)

When I added this step, it worked, albeit only for n=5000 even though the total was n=5300.

4) filter_genes <- apply(counts(srat_scater[, colData(srat_scater)$use]), 1, function(x) length(x[x > 1]) >= 2)

5) rowData(srat_scater)$use <- filter_genes

6) srat_scater.qc <- srat_scater[rowData(srat_scater)$use, colData(srat_scater)$use]
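The same cell and gene filtering logic can be sketched on a plain base-R matrix, which makes it easy to check what each step keeps. This is a minimal sketch under stated assumptions: the toy counts and the scaled-down thresholds (max_total_counts, min_features) are hypothetical, and the per-cell metrics are computed directly with colSums rather than taken from scater's total_counts and total_features columns.

```r
# Hypothetical toy counts: rows are genes, columns are cells
counts <- matrix(
  c(5, 3, 0, 2,      # cell c1
    4, 2, 2, 0,      # cell c2
    0, 0, 1, 0,      # cell c3: very few detected genes
    6, 0, 3, 1,      # cell c4
    50, 40, 30, 20), # cell c5: unusually high total counts
  nrow = 4,
  dimnames = list(paste0("g", 1:4), paste0("c", 1:5))
)

# Per-cell QC metrics, analogous to total_counts and total_features in steps 1 and 2
total_counts <- colSums(counts)
total_features <- colSums(counts > 0)

# Hypothetical thresholds scaled down for the toy data (the steps above use 30000 and 600)
max_total_counts <- 100
min_features <- 1

# Step 3: keep cells that pass both filters
use_cells <- total_counts < max_total_counts & total_features > min_features

# Step 4: keep genes with a count above 1 in at least 2 of the retained cells
use_genes <- apply(counts[, use_cells, drop = FALSE], 1, function(x) sum(x > 1) >= 2)

# Step 6: subset genes and cells together
counts_qc <- counts[use_genes, use_cells, drop = FALSE]
dim(counts_qc)
```

Here cells c3 and c5 fail the cell filters, and gene g4 is dropped because only one retained cell has a count above 1.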

You can then compute the log-transformed counts and apply SC3. Let me know if that works for you too.

cdsoria commented 6 years ago

@lmacdonald12 also, I gave SC3 more than one k; not sure whether that made a difference:

srat_scater.qc <- sc3(srat_scater.qc, ks = 16:17, biology = TRUE)

Another thing is that it gave me some errors, but it appears to have run. So maybe try different ks ranges.

DrLucyMac commented 6 years ago

@cds1 I didn't do the filtering, but I just managed to get it to run with more than one k. Thank you so much!! It gave me 50 warnings, but at least it worked!

cdsoria commented 6 years ago

@lmacdonald12, great news! By the way, did you make a silhouette plot? What n do you get? @pati-ni, do you know whether the code I posted for getting a raw matrix from Seurat is correct for SC3? I'm not sure I understood the sparse matrix comment by @wikiselev. Thank you

pati-ni commented 6 years ago

Hello @cds1, I cannot tell from your code. You can check whether your assay is in a sparse representation with is.matrix(counts(srat_scater)); if you get FALSE, you can convert it with counts(srat_scater) <- as.matrix(counts(srat_scater)), but first make sure you will not run into memory issues. Thanks
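A small sketch of the check and conversion described above, using the Matrix package (shipped with R) to build a hypothetical sparse counts matrix. is.matrix returns FALSE for a sparse dgCMatrix and TRUE after as.matrix. The memory estimate is a rough assumption that each dense entry is an 8-byte double, applied to the dataset size mentioned earlier in this thread (about 20,000 genes by 5000 cells); dense_gib is a hypothetical helper.

```r
library(Matrix)

# Hypothetical 3 x 3 sparse counts matrix (class dgCMatrix)
sparse_counts <- sparseMatrix(i = c(1, 3, 2), j = c(1, 2, 3), x = c(4, 7, 1),
                              dims = c(3, 3))
is.matrix(sparse_counts)   # FALSE: a dgCMatrix is not a base R matrix

# Dense conversion, as in the comment above
dense_counts <- as.matrix(sparse_counts)
is.matrix(dense_counts)    # TRUE

# Rough memory cost of densifying, assuming 8 bytes per double entry
dense_gib <- function(n_genes, n_cells) n_genes * n_cells * 8 / 1024^3
dense_gib(20000, 5000)     # about 0.75 GiB for one dense assay
```

Each additional dense assay (for example logcounts) costs roughly the same amount again.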