Open gorgitko opened 3 years ago
Hi Jiri, many thanks for your request! It all looks awesome and I will try to merge it ASAP, depending on my availability. Thanks again!
Hi Jiri,
I just fetched your changes to my laptop and tried to run the vignette in a Terminal. When I run the first parallelisation it feels super slow, gets stuck at 67% and does not progress further:
> library(BiocParallel)
> BPPARAM <- SnowParam(workers = 4, type = "SOCK", RNGseed = 1, progressbar = TRUE)
> bpstart(BPPARAM)
> BiocParallel::register(BPPARAM)
> sce <- sc3(sce, ks = 2:4, biology = TRUE)
Setting SC3 parameters...
Calculating distances between the cells...
|=============================================== | 67%
It may be a problem with my laptop, but here is my sessionInfo()
:
> sessionInfo()
R Under development (unstable) (2020-09-24 r79253)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] BiocParallel_1.26.1 scater_1.20.1
[3] ggplot2_3.3.5 scuttle_1.2.0
[5] SC3_1.15.1 SingleCellExperiment_1.14.1
[7] SummarizedExperiment_1.22.0 Biobase_2.52.0
[9] GenomicRanges_1.44.0 GenomeInfoDb_1.28.1
[11] IRanges_2.26.0 S4Vectors_0.30.0
[13] BiocGenerics_0.38.0 MatrixGenerics_1.4.0
[15] matrixStats_0.59.0
loaded via a namespace (and not attached):
[1] viridis_0.6.1 BiocSingular_1.8.1
[3] viridisLite_0.4.0 DelayedMatrixStats_1.14.0
[5] shiny_1.6.0 GenomeInfoDbData_1.2.6
[7] vipor_0.4.5 robustbase_0.93-8
[9] pillar_1.6.1 lattice_0.20-41
[11] glue_1.4.2 beachmat_2.8.0
[13] digest_0.6.27 RColorBrewer_1.1-2
[15] promises_1.2.0.1 XVector_0.32.0
[17] colorspace_2.0-2 cowplot_1.1.1
[19] htmltools_0.5.1.1 httpuv_1.6.1
[21] Matrix_1.3-4 pcaPP_1.9-74
[23] WriteXLS_6.3.0 pkgconfig_2.0.3
[25] pheatmap_1.0.12 zlibbioc_1.38.0
[27] purrr_0.3.4 xtable_1.8-4
[29] mvtnorm_1.1-2 snow_0.4-3
[31] scales_1.1.1 ScaledMatrix_1.0.0
[33] later_1.2.0 tibble_3.1.2
[35] proxy_0.4-26 farver_2.1.0
[37] generics_0.1.0 ellipsis_0.3.2
[39] withr_2.4.2 ROCR_1.0-11
[41] magrittr_2.0.1 crayon_1.4.1
[43] mime_0.11 fansi_0.5.0
[45] class_7.3-17 beeswarm_0.4.0
[47] tools_4.1.0 lifecycle_1.0.0
[49] munsell_0.5.0 cluster_2.1.0
[51] DelayedArray_0.18.0 irlba_2.3.3
[53] compiler_4.1.0 e1071_1.7-7
[55] rsvd_1.0.5 rlang_0.4.11
[57] grid_4.1.0 RCurl_1.98-1.3
[59] BiocNeighbors_1.10.0 labeling_0.4.2
[61] bitops_1.0-7 gtable_0.3.0
[63] rrcov_1.5-5 R6_2.5.0
[65] gridExtra_2.3 dplyr_1.0.7
[67] fastmap_1.1.0 utf8_1.2.1
[69] ggbeeswarm_0.6.0 Rcpp_1.0.7
[71] vctrs_0.3.8 DEoptimR_1.0-9
[73] tidyselect_1.1.1 sparseMatrixStats_1.4.0
Do you know what the problem can be?
Hello Vladimir,
I was trying my changes on an HPC server (~80 cores, 1TB RAM) without any problems, so could the performance of your laptop be the problem? Even on that HPC using 32 workers, the whole calculation takes a considerable amount of time.
The calculation of distances between cells takes a lot of memory and during parallel processing, the amount of used memory is even higher (basically three times more). Could you try first sequential processing with BPPARAM <- SerialParam()
? Also, what is the size (dimensionality) of your sce
object? I might suspect your laptop to start using swap space when there isn't enough RAM, which obviously leads to a significant calculation speed drop.
Just for curiosity, if you would try the original implementation of parallelism using the same number of workers, it would finish flawlessly?
Hi Jiri, many thanks for your reply! I am on vacation until Friday, will try your suggestions then. Thanks!
Hi, any updates on this? Soon I will be publishing a package to the Bioconductor, which doesn't allow remote sources (so I can't put my fork to DESCRIPTION
).
Very sorry! I've changed jobs and have absolutely no time to work on this right now. Probably the best is not to rely on me updating it soon... I hope to get to this at some point... Very sorry again.
Hello,
I have been looking at https://github.com/hemberg-lab/SC3/issues/83 and decided to rewrite the parallelism to the BiocParallel package. The reasons could be reviewed in the first bullet below. I have successfully tested those modifications on a relatively small dataset (1k PBMC) and also run devtools::check() resulting in zero errors and warnings.
sc3()
function, which now reuses the sameBPPARAM
for each of the pipeline's subfunctions. Also,BiocParallel
is itself a robust wrapper around theparallel
andsnow
packages, providing a unified interface to backend parallel methods, and comes bundled with powerful logging capabilities (error tracing) and some handy features, such as a native progress bar.DESCRIPTION
, andNAMESPACE
have been updated accordingly.devtools
package, which tries to unify the code and documentation.