kharchenkolab / conos

R package for the joint analysis of multiple single-cell RNA-seq datasets
GNU General Public License v3.0

Long runtime in embedGraph(method="UMAP") #22

Closed rrydbirk closed 5 years ago

rrydbirk commented 5 years ago

This takes 20+ min to run.

Code:

con$embedGraph(method="UMAP")

Object:

lapply(con$samples,function(x) str(x$counts))
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:14427418] 13 58 96 117 119 134 141 148 180 198 ...
  ..@ p       : int [1:18657] 0 317 1322 1363 1686 1717 1749 1804 2204 3250 ...
  ..@ Dim     : int [1:2] 6255 18656
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:6255] "S1_AAACCCAAGATCGGTG" "S1_AAACCCAAGGCAGGGA" "S1_AAACCCACAAATAGCA" "S1_AAACCCACATGGAATA" ...
  .. ..$ : chr [1:18656] "AL627309.1" "AL669831.5" "LINC00115" "NOC2L" ...
  ..@ x       : Named num [1:14427418] 0.0471 0.1322 0.0632 0.0369 0.0526 ...
  .. ..- attr(*, "names")= chr [1:14427418] "S1_AAACGAAGTGTGGTCC" "S1_AAAGTCCTCGGCCAAC" "S1_AACAGGGAGACATATG" "S1_AACCATGCAGAGAAAG" ...
  ..@ factors : list()
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:16225403] 5 28 63 75 99 118 130 142 175 220 ...
  ..@ p       : int [1:18205] 0 389 1510 1589 1653 2184 2259 2366 3172 4162 ...
  ..@ Dim     : int [1:2] 6470 18204
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:6470] "S2_AAACCCAAGATGGCAC" "S2_AAACCCAAGTGCGCTC" "S2_AAACCCACAACCAGAG" "ctrl_039_AAACCCACACTGCGTG" ...
  .. ..$ : chr [1:18204] "AL627309.1" "AL669831.5" "LINC00115" "SAMD11" ...
  ..@ x       : Named num [1:16225403] 0.2572 0.0987 0.0631 0.1405 0.0882 ...
  .. ..- attr(*, "names")= chr [1:16225403] "S2_AAACCCAGTGAAGCTG" "S2_AAAGAACGTTTGATCG" "S2_AAAGTCCTCGCCATAA" "S2_AAATGGAGTATACGGG" ...
  ..@ factors : list()
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:9283290] 16 18 23 26 50 62 65 75 100 109 ...
  ..@ p       : int [1:17626] 0 271 313 1093 1169 1234 1549 1603 1698 1797 ...
  ..@ Dim     : int [1:2] 3500 17625
  ..@ Dimnames:List of 2
  .. ..$ : chr [1:3500] "S3_AAACCCACACTACACA" "S3_AAACCCAGTACTAAGA" "S3_AAACCCATCGTTTACT" "S3_AAACCCATCTTGGAAC" ...
  .. ..$ : chr [1:17625] "AL627309.1" "AC114498.1" "AL669831.5" "LINC00115" ...
  ..@ x       : Named num [1:9283290] 0.1539 0.039 0.0366 0.2743 0.0311 ...
  .. ..- attr(*, "names")= chr [1:9283290] "S3_AAAGGATCACGCAAAG" "S3_AAAGGATGTAGCTCGC" "S3_AAAGGGCTCGGTTGTA" "S3_AAAGGTACACGCACCA" ...
  ..@ factors : list()
$S1
NULL

$S2
NULL

$S3
NULL
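For a report like this, a per-sample table of dimensions is usually easier to read than a full str() dump. The sketch below is only an illustration: it builds a hypothetical stand-in for con$samples from random sparse matrices with the dimensions shown above (cells in rows, genes in columns) and then summarizes them. The names make_sample and samples are assumptions, not part of conos.

```r
library(Matrix)

# Hypothetical stand-in for con$samples: each element holds a sparse
# cells-by-genes matrix in $counts, with the dimensions reported above.
set.seed(1)
make_sample <- function(n_cells, n_genes) {
  list(counts = rsparsematrix(n_cells, n_genes, density = 0.001))
}
samples <- list(
  S1 = make_sample(6255, 18656),
  S2 = make_sample(6470, 18204),
  S3 = make_sample(3500, 17625)
)

# One row per sample instead of a full str() dump.
dims <- t(sapply(samples, function(x) dim(x$counts)))
colnames(dims) <- c("cells", "genes")
print(dims)

# Total number of cells across samples: 6255 + 6470 + 3500 = 16225.
total_cells <- sum(dims[, "cells"])
print(total_cells)
```

On a real conos object, the same sapply call applied to con$samples should give the equivalent table, assuming each sample exposes a $counts matrix as in the output above.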

sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /cm/shared/apps/intel/parallel_studio_xe/2018_update2/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_gf_lp64.so

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] conos_0.0.0.9002 igraph_1.2.4 Matrix_1.2-15 RLinuxModules_0.2

loaded via a namespace (and not attached):
 [1] mclust_5.4.3        Rcpp_1.0.1           mvtnorm_1.0-10
 [4] lattice_0.20-38     GO.db_3.7.0          class_7.3-14
 [7] assertthat_0.2.1    digest_0.6.18        mime_0.6
[10] R6_2.4.0            plyr_1.8.4           stats4_3.5.0
[13] RSQLite_2.1.1       ggplot2_3.1.1        pillar_1.3.1
[16] rlang_0.3.4         lazyeval_0.2.2       diptest_0.75-7
[19] irlba_2.3.3         whisker_0.3-2        blob_1.1.1
[22] kernlab_0.9-27      S4Vectors_0.20.1     urltools_1.7.2
[25] triebeard_0.3.0     bit_1.1-14           munsell_0.5.0
[28] shiny_1.3.0         compiler_3.5.0       httpuv_1.5.1
[31] pkgconfig_2.0.2     BiocGenerics_0.28.0  base64enc_0.1-3
[34] pcaMethods_1.74.0   htmltools_0.3.6      nnet_7.3-12
[37] tidyselect_0.2.5    tibble_2.1.1         gridExtra_2.3
[40] pagoda2_0.0.0.9002  IRanges_2.16.0       dendextend_1.10.0
[43] viridisLite_0.3.0   crayon_1.3.4         dplyr_0.8.0.1
[46] later_0.8.0         MASS_7.3-51.1        grid_3.5.0
[49] xtable_1.8-3        gtable_0.3.0         DBI_1.0.0
[52] magrittr_1.5        scales_1.0.0         dendsort_0.3.3
[55] viridis_0.5.1       promises_1.0.1       flexmix_2.3-15
[58] robustbase_0.93-4   brew_1.0-6           rjson_0.2.20
[61] tools_3.5.0         fpc_2.1-11.1         bit64_0.9-7
[64] Biobase_2.42.0      glue_1.3.1           trimcluster_0.1-2.1
[67] DEoptimR_1.0-8      purrr_0.3.2          Rook_1.1-1
[70] parallel_3.5.0      AnnotationDbi_1.44.0 colorspace_1.4-1
[73] cluster_2.0.7-1     prabclus_2.2-7       memoise_1.1.0
[76] modeltools_0.2-22

pkharchenko commented 5 years ago

At what stage does it get stuck? Could you make this con$graph object available for us to test on? Thanks, -peter.

On Apr 16, 2019, at 05:20, rrydbirk notifications@github.com wrote:

This takes 20+ min to run.

Code:

con$embedGraph(method="UMAP")


rrydbirk commented 5 years ago

con$embedGraph(method="UMAP")
Convert graph to adjacency list...
Done
Estimate nearest neighbors and commute times...
Estimating hitting distances: 14:30:47.
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|

(This step easily takes 20+ min, regardless of n.cores.)

Please let me know how I should forward my conos object to you.


pkharchenko commented 5 years ago

Probably the easiest way to share it is to run saveRDS(con$graph, file='graph.rds') and put the file on Dropbox/Google Drive somewhere. It shouldn't be too large. Thanks, -peter.
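The save-and-share step can be sketched as a round trip through an .rds file. This is a minimal illustration that assumes con$graph is an igraph object; a small random graph stands in for it here, and the temporary path is only for the demo.

```r
library(igraph)

# Toy stand-in for con$graph (assumed to be an igraph object).
set.seed(1)
g <- sample_gnp(n = 100, p = 0.05)
E(g)$weight <- runif(ecount(g))

# Sender side: serialize the graph to a single file.
path <- file.path(tempdir(), "graph.rds")
saveRDS(g, file = path)

# Recipient side: load the file and check its basic shape.
g2 <- readRDS(path)
cat("vertices:", vcount(g2), " edges:", ecount(g2), "\n")
cat("file size (KB):", round(file.size(path) / 1024, 1), "\n")
```

Checking the vertex and edge counts after readRDS is a quick way to confirm that the shared file matches the graph it was saved from.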

On Apr 16, 2019, at 8:35 AM, Rasmus Rydbirk notifications@github.com wrote:


Please let me know how I should forward my conos object to you.


VPetukhov commented 5 years ago

I'm surprised that it takes that long for 3 datasets of a few thousand cells each. Moreover, the number of cores should matter here. In general, though, for datasets of around a hundred thousand cells, twenty minutes is completely fine even with a large number of cores (~30).

rrydbirk commented 5 years ago

Sorry for not getting back to you before. The graph object can be downloaded here: https://www.dropbox.com/s/5djfr90twdmkzf6/graph.rds?dl=1

Indeed, processing time scales linearly with the number of cores. However, the tutorial used 12k cells and 4 cores and took ~1 min to run, so I'm surprised that my example with ~16k cells takes 26 min with 10 cores.
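The size of the discrepancy can be estimated with a back-of-the-envelope calculation. It assumes, purely for illustration, that runtime grows linearly with the number of cells and shrinks linearly with the number of cores; that is a simplification, not a documented property of conos.

```r
# Figures quoted in the thread.
tutorial <- list(cells = 12000, cores = 4,  minutes = 1)
observed <- list(cells = 16225, cores = 10, minutes = 26)  # 6255 + 6470 + 3500 cells

# Assumption (not a conos guarantee): time is proportional to cells / cores.
expected_minutes <- tutorial$minutes *
  (observed$cells / tutorial$cells) *
  (tutorial$cores / observed$cores)

slowdown <- observed$minutes / expected_minutes
cat(sprintf("expected ~%.2f min, observed %.0f min, ~%.0fx slower\n",
            expected_minutes, observed$minutes, slowdown))
```

Under that assumption the run would be expected to take about half a minute, so 26 min is roughly 48 times slower than the tutorial figures suggest. A gap that large points to a setup problem rather than to dataset size.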

rrydbirk commented 5 years ago

The problem was caused by a faulty installation of the parallel package.
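A broken parallel setup is hard to spot from the embedding output alone. One way to check it independently is sketched below, assuming a Unix-like system where mclapply can fork. slow_task is a hypothetical CPU-bound workload and has nothing to do with conos internals.

```r
library(parallel)

cat("detected cores:", detectCores(), "\n")

# Hypothetical CPU-bound toy task used only to compare timings.
slow_task <- function(i) {
  s <- 0
  for (k in 1:2e6) s <- s + sqrt(k)
  s
}

# Use at most 4 workers; mclapply cannot fork on Windows, so fall back to 1.
n_cores <- max(1L, min(4L, detectCores(), na.rm = TRUE))
if (.Platform$OS.type == "windows") n_cores <- 1L

t_serial   <- system.time(r1 <- lapply(1:8, slow_task))["elapsed"]
t_parallel <- system.time(r2 <- mclapply(1:8, slow_task, mc.cores = n_cores))["elapsed"]

# Both runs must return the same values; only the elapsed time should differ.
stopifnot(isTRUE(all.equal(r1, r2)))
cat(sprintf("serial: %.2f s, parallel (%d cores): %.2f s\n",
            t_serial, n_cores, t_parallel))
```

If the parallel run is no faster than the serial one on a multi-core machine, or if mclapply raises errors, that suggests the problem is in the R installation rather than in the analysis code.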