Closed JTpath closed 3 years ago
Hi, do you get the same error without the MulticoreParam argument?
No, it even works if I set MulticoreParam to(1)
library(Seurat)
#> Registered S3 method overwritten by 'spatstat.geom':
#> method from
#> print.boxx cli
#> Attaching SeuratObject
library(scDblFinder)
library(BiocParallel)
l <- c(pbmc_small, pbmc_small)
l[[1]][["batch"]] = "A"
l[[2]][["batch"]] = "B"
seu <- merge(x=l[[1]], y=l[[2]])
#> Warning in CheckDuplicateCellNames(object.list = objects): Some cell names are
#> duplicated across objects provided. Renaming to enforce unique cell names.
sce <- as.SingleCellExperiment(seu)
out <- scDblFinder(sce, samples = "batch", BPPARAM=MulticoreParam(1))
#> Warning in scDblFinder(sce[sel_features, x], clusters = clusters, dims = dims, :
#> scDblFinder might not work well with very low numbers of cells.
#> Warning in scDblFinder(sce[sel_features, x], clusters = clusters, dims = dims, :
#> scDblFinder might not work well with very low numbers of cells.
Created on 2021-05-02 by the reprex package (v2.0.0)
Wow, thanks. Considering that one of your samples apparently has very few cells (which can cause problems), one possibility is that the job fails once in a while depending on the random seed, but if you tried several times and one always fails while the other always work, I think we can exclude that.
Could you try with:
MulticoreParam(2, log=TRUE, threshold="TRACE")
?
Hi, the examples above use Seurat's built in pbmc_small dataset so they're reproducible, but yes they're tiny datasets.
I get the same error with my own data, and with a the larger publicly accessible 10x dataset (below) so don't think it's low cells. MulticoreParam(2, log=TRUE, threshold="TRACE")
doesn't help sadly.
Thanks
library(Seurat)
#> Attaching SeuratObject
library(scDblFinder)
library(BiocParallel)
url <- "https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_filtered_feature_bc_matrix.h5"
tmp_filename <- tempfile()
download.file(url, tmp_filename)
seu <- Read10X_h5(tmp_filename, use.names = TRUE)
#> Warning in sparseMatrix(i = indices[] + 1, p = indptr[], x = as.numeric(x =
#> counts[]), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
seu <- CreateSeuratObject(seu)
seu$batch <- sample(c("rep1", "rep2"), size = ncol(seu), replace = TRUE)
sce <- as.SingleCellExperiment(seu)
out <- scDblFinder(sce, samples = "batch", BPPARAM=MulticoreParam(2, log=TRUE, threshold="TRACE"))
#> Warning in parallel::mccollect(wait = FALSE, timeout = 1): 1 parallel job did
#> not deliver a result
#> Error in result[[njob]] <- value: attempt to select less than one element in OneIndex
Created on 2021-05-02 by the reprex package (v2.0.0)
hmm... I take it that BiocParallel works fine in other contexts?
Hmm, think so in that it isn't erroring out but doesn't seem entirely happy.
Ran bplapply(1:6, sqrt, BPPARAM = MulticoreParam(3))
several times, all worked fine, then tried scDblFinder(sce, samples = "batch", BPPARAM=MulticoreParam(3))
and got the same error as above.
Ran bplapply(1:6, sqrt, BPPARAM = MulticoreParam(3))
straight after and got a warning the first time, subsequent tries were fine.
#> Warning message: In parallel::mccollect(wait = TRUE) : 1 parallel job did not deliver a result
Chiming in here. The "1 parallel job did not deliver a result" warning (and subsequent indexing error) is usually observed when the child R process is killed by some kind of external system monitor. I often see this in Slurm jobs where the parallelization increases memory consumption to the point that the total usage exceeds the allocated memory, causing the out-of-memory killer to shut down the last offending process - usually the child. Because the R process is killed externally, R itself does not have a chance to give a reasonable error message back to the parent, leading to a rather cryptic set of warnings and errors.
It's rather unusual to see this on personal machines, as these tend to just go into swap and slow down rather than actively killing the process. Best bet is to try SnowParam()
instead; this is slower but should avoid the difficulties associated with forking and shared memory.
Thanks Aaron, that's very useful. If this is the cause then it would also explain that it's not occurring with toy examples...
Thanks, it does seem to be a local problem, ran the same code on our HPC and it's fine. Not sure what I've done to upset MulticoreParam though.
I ran microbenchmark on my Mac using the PBMC_small example above,
res <- list(
"SnowParam(2)" = microbenchmark(scDblFinder(sce, samples="batch", BPPARAM=SnowParam(2)), times = 10),
"lapply" = microbenchmark(scDblFinder(sce, samples="batch"), times = 10))
Results in seconds below, isn't any better with larger datasets. Have I done something wrong with snowparam?
SnowParam(2) lapply
min 13.3744 3.415405
lq 13.71814 3.505418
mean 13.84331 3.685679
median 13.80867 3.620882
uq 13.86585 3.91428
max 14.58285 4.106512
neval 10 10
Simiar results on our HPC, but with an improvement for Multicore
Multicore(2) SnowParam(2) lapply
min 2.792165 10.64098 4.113479
lq 2.887825 10.87092 4.173983
mean 2.89723 11.17549 4.546509
median 2.913186 11.12241 4.314554
uq 2.930263 11.31667 4.879823
max 2.974608 11.95432 5.372606
neval 10 10 10
Anyway, think i'm quite far off topic now... doesn't look like it has anything to do with scDblFinder (which is awesome btw)
Yes, Snow has high overhead - it's starting up a whole other R process from scratch! - and so is not worth using unless your dataset is decently big. Multicore is fast because the new process shares the same memory as the old one, but this can occasionally lead to weird things happening where both processes overestimate their amount of available memory.
Good to know, thank you
I think this is caused by using the Mac OS X accelerate framework BLAS rather than R BLAS. Switched back to the standard one and it's working fine now
great, thanks for reporting back on this!
@JTpath I'm having the same problem you are. Can you explain a bit more what you did to solve this? Also, do you have any idea what caused MulticoreParam to stop working in the first place?
Switched back to the standard R BLAS and it worked fine. See a discussion of BLAS here, https://csantill.github.io/RPerformanceWBLAS/ might be a bit out of date now.
Hi, not sure if this is an issue with scDblFinder, BiocParallel, or me, but this worked in the past but isn't working any more for some reason.
Created on 2021-05-02 by the reprex package (v2.0.0)
Session info
``` r sessioninfo::session_info() ─ Session info ───────────────────────────────────────────────────────────────────────────────────────── setting value version R version 4.0.5 (2021-03-31) os macOS Big Sur 10.16 system x86_64, darwin17.0 ui RStudio language (EN) collate en_GB.UTF-8 ctype en_GB.UTF-8 tz Europe/London date 2021-05-02 ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────── package * version date lib source abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.2) assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2) backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2) beachmat 2.6.4 2020-12-20 [1] Bioconductor beeswarm 0.3.1 2021-03-07 [1] CRAN (R 4.0.3) Biobase 2.50.0 2020-10-27 [1] Bioconductor BiocGenerics 0.36.1 2021-04-16 [1] Bioconductor BiocNeighbors 1.8.2 2020-12-07 [1] Bioconductor BiocParallel * 1.24.1 2020-11-06 [1] Bioconductor BiocSingular 1.6.0 2020-10-27 [1] Bioconductor bitops 1.0-7 2021-04-24 [1] CRAN (R 4.0.2) bluster 1.0.0 2020-10-27 [1] Bioconductor callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.2) cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2) clipr 0.7.1 2020-10-08 [1] CRAN (R 4.0.2) cluster 2.1.2 2021-04-17 [1] CRAN (R 4.0.2) codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.5) colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2) cowplot 1.1.1 2020-12-30 [1] CRAN (R 4.0.2) crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2) data.table 1.14.0 2021-02-21 [1] CRAN (R 4.0.2) DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.3) DelayedArray 0.16.3 2021-03-24 [1] Bioconductor DelayedMatrixStats 1.12.3 2021-02-03 [1] Bioconductor deldir 0.2-10 2021-02-16 [1] CRAN (R 4.0.2) digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2) dplyr 1.0.5 2021-03-05 [1] CRAN (R 4.0.2) dqrng 0.2.1 2019-05-17 [1] CRAN (R 4.0.2) edgeR 3.32.1 2021-01-14 [1] Bioconductor ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2) evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1) fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2) fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2) fitdistrplus 1.1-3 2020-12-05 [1] CRAN (R 4.0.2) fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2) future 1.21.0 2020-12-10 [1] CRAN (R 4.0.3) future.apply 1.7.0 2021-01-04 [1] CRAN (R 4.0.2) generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2) GenomeInfoDb 1.26.7 2021-04-08 [1] Bioconductor GenomeInfoDbData 1.2.4 2021-01-17 [1] Bioconductor GenomicRanges 1.42.0 2020-10-27 [1] Bioconductor ggbeeswarm 0.6.0 2017-08-07 [1] CRAN (R 4.0.2) ggplot2 3.3.3 2020-12-30 [1] CRAN (R 4.0.2) ggrepel 0.9.1 2021-01-15 [1] CRAN (R 4.0.2) ggridges 0.5.3 2021-01-08 [1] CRAN (R 4.0.2) globals 0.14.0 2020-11-22 [1] CRAN (R 4.0.2) glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2) goftest 1.2-2 2019-12-02 [1] CRAN (R 4.0.2) gridExtra 2.3 2017-09-09 [1] CRAN (R 4.0.2) gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2) highr 0.9 2021-04-16 [1] CRAN (R 4.0.5) htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2) htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.3) httpuv 1.6.0 2021-04-23 [1] CRAN (R 4.0.2) httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2) ica 1.0-2 2018-05-24 [1] CRAN (R 4.0.2) igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2) IRanges 2.24.1 2020-12-12 [1] Bioconductor irlba 2.3.3 2019-02-05 [1] CRAN (R 4.0.2) jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2) KernSmooth 2.23-18 2020-10-29 [1] CRAN (R 4.0.5) knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2) later 1.2.0 2021-04-23 [1] CRAN (R 4.0.2) lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.5) lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.0.2) leiden 0.3.7 2021-01-26 [1] CRAN (R 4.0.3) lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2) limma 3.46.0 2020-10-27 [1] Bioconductor listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.2) lmtest 0.9-38 2020-09-09 [1] CRAN (R 4.0.2) locfit 1.5-9.4 2020-03-25 [1] CRAN (R 4.0.2) magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2) MASS 7.3-53.1 2021-02-12 [1] CRAN (R 4.0.5) Matrix 1.3-2 2021-01-06 [1] CRAN (R 4.0.5) MatrixGenerics 1.2.1 2021-01-30 [1] Bioconductor matrixStats 0.58.0 2021-01-29 [1] CRAN (R 4.0.2) mgcv 1.8-35 2021-04-18 [1] CRAN (R 4.0.2) mime 0.10 2021-02-13 [1] CRAN (R 4.0.2) miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.0.2) munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2) nlme 3.1-152 2021-02-04 [1] CRAN (R 4.0.5) parallelly 1.24.0 2021-03-14 [1] CRAN (R 4.0.2) patchwork 1.1.1 2020-12-17 [1] CRAN (R 4.0.2) pbapply 1.4-3 2020-08-18 [1] CRAN (R 4.0.2) pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.5) pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2) plotly 4.9.3 2021-01-10 [1] CRAN (R 4.0.2) plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2) png 0.1-7 2013-12-03 [1] CRAN (R 4.0.2) polyclip 1.10-0 2019-03-14 [1] CRAN (R 4.0.2) processx 3.5.1 2021-04-04 [1] CRAN (R 4.0.2) promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.0.2) ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.3) purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2) R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2) RANN 2.6.1 2019-01-08 [1] CRAN (R 4.0.2) RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 4.0.2) Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2) RcppAnnoy 0.0.18 2020-12-15 [1] CRAN (R 4.0.2) RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 4.0.2) reprex * 2.0.0 2021-04-02 [1] CRAN (R 4.0.2) reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.0.2) reticulate 1.19 2021-04-21 [1] CRAN (R 4.0.2) rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2) rmarkdown 2.7 2021-02-19 [1] CRAN (R 4.0.2) ROCR 1.0-11 2020-05-02 [1] CRAN (R 4.0.2) rpart 4.1-15 2019-04-12 [1] CRAN (R 4.0.5) rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2) rsvd 1.0.5 2021-04-16 [1] CRAN (R 4.0.5) Rtsne 0.15 2018-11-10 [1] CRAN (R 4.0.2) S4Vectors 0.28.1 2020-12-09 [1] Bioconductor scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2) scater 1.18.6 2021-02-26 [1] Bioconductor scattermore 0.7 2020-11-24 [1] CRAN (R 4.0.2) scDblFinder * 1.5.16 2021-04-19 [1] Github (plger/scDblFinder@d11467b) scran 1.18.7 2021-04-16 [1] Bioconductor sctransform 0.3.2.9006 2021-04-01 [1] Github (ChristophH/sctransform@73e2e3e) scuttle 1.0.4 2020-12-17 [1] Bioconductor sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2) Seurat * 4.0.1 2021-04-13 [1] Github (satijalab/seurat@4e868fc) SeuratObject * 4.0.0 2021-01-15 [1] CRAN (R 4.0.2) shiny 1.6.0 2021-01-25 [1] CRAN (R 4.0.3) SingleCellExperiment 1.12.0 2020-10-27 [1] Bioconductor sparseMatrixStats 1.2.1 2021-02-02 [1] Bioconductor spatstat.core 2.1-2 2021-04-18 [1] CRAN (R 4.0.2) spatstat.data 2.1-0 2021-03-21 [1] CRAN (R 4.0.3) spatstat.geom 2.1-0 2021-04-15 [1] CRAN (R 4.0.2) spatstat.sparse 2.0-0 2021-03-16 [1] CRAN (R 4.0.2) spatstat.utils 2.1-0 2021-03-15 [1] CRAN (R 4.0.2) statmod 1.4.35 2020-10-19 [1] CRAN (R 4.0.2) stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2) stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2) styler 1.4.1 2021-03-30 [1] CRAN (R 4.0.2) SummarizedExperiment 1.20.0 2020-10-27 [1] Bioconductor survival 3.2-11 2021-04-26 [1] CRAN (R 4.0.2) tensor 1.5 2012-05-05 [1] CRAN (R 4.0.2) tibble 3.1.1 2021-04-18 [1] CRAN (R 4.0.2) tidyr 1.1.3 2021-03-03 [1] CRAN (R 4.0.3) tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2) utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2) uwot 0.1.10 2020-12-15 [1] CRAN (R 4.0.2) vctrs 0.3.7 2021-03-29 [1] CRAN (R 4.0.2) vipor 0.4.5 2017-03-22 [1] CRAN (R 4.0.2) viridis 0.6.0 2021-04-15 [1] CRAN (R 4.0.5) viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.0.5) withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.5) xfun 0.22 2021-03-11 [1] CRAN (R 4.0.2) xgboost 1.4.1.1 2021-04-22 [1] CRAN (R 4.0.2) xtable 1.8-4 2019-04-21 [1] CRAN (R 4.0.2) XVector 0.30.0 2020-10-28 [1] Bioconductor yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2) zlibbioc 1.36.0 2020-10-28 [1] Bioconductor zoo 1.8-9 2021-03-09 [1] CRAN (R 4.0.2) [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library ```