plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

did not converge #75

Closed z5ouyang closed 1 year ago

z5ouyang commented 1 year ago

Thanks for providing this good AI tool! I have a question, or rather a request for suggestions: with a large number of cells as input, I get the following warning:

did not converge in 20 iterations

I didn't find any parameters to increase the number of iterations.

Thanks

plger commented 1 year ago

Please follow the issue template, or at least some basic recommendations about how to file a GitHub issue.

This means minimally providing the code you used that triggered the problem (ideally providing the simplest case that reproduces it), the exact error message (and ideally traceback), and your sessionInfo().

z5ouyang commented 1 year ago

Thanks for the reply. As I mentioned, this might not be a bug. I am asking for suggestions/recommendations on how to increase the number of iterations for a large number of cells (~80k), when this warning is shown: did not converge in 20 iterations

The minimal code:

Xdbl <- scDblFinder(X, BPPARAM=MulticoreParam(max(1, parallelly::availableCores()-2)))

where X is a large sparse matrix with ~80k columns and ~30k rows.

The sessionInfo():

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /mnt/depts/dept04/compbio/edge_condaEnv/scRNAsequest/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocParallel_1.24.1 Matrix_1.4-1        ggplot2_3.3.5
[4] scDblFinder_1.4.0   SeuratObject_4.0.4  Seurat_4.1.0

loaded via a namespace (and not attached):
  [1] plyr_1.8.7                  igraph_1.3.1
  [3] lazyeval_0.2.2              splines_4.0.5
  [5] listenv_0.8.0               scattermore_0.8
  [7] GenomeInfoDb_1.26.4         scater_1.18.6
  [9] digest_0.6.29               htmltools_0.5.3
 [11] viridis_0.6.2               fansi_1.0.3
 [13] magrittr_2.0.3              tensor_1.5
 [15] cluster_2.1.3               ROCR_1.0-11
 [17] limma_3.46.0                globals_0.16.1
 [19] matrixStats_0.62.0          spatstat.sparse_3.0-1
 [21] colorspace_2.0-3            ggrepel_0.9.1
 [23] dplyr_1.0.10                RCurl_1.98-1.8
 [25] jsonlite_1.8.0              spatstat.data_3.0-1
 [27] survival_3.4-0              zoo_1.8-11
 [29] glue_1.6.2                  polyclip_1.10-0
 [31] gtable_0.3.1                zlibbioc_1.36.0
 [33] XVector_0.30.0              leiden_0.4.3
 [35] DelayedArray_0.16.3         BiocSingular_1.6.0
 [37] future.apply_1.9.1          SingleCellExperiment_1.12.0
 [39] BiocGenerics_0.36.0         abind_1.4-5
 [41] scales_1.2.1                DBI_1.1.3
 [43] edgeR_3.32.1                spatstat.random_3.1-4
 [45] miniUI_0.1.1.1              Rcpp_1.0.9
 [47] viridisLite_0.4.1           xtable_1.8-4
 [49] reticulate_1.28             spatstat.core_2.4-4
 [51] dqrng_0.3.0                 rsvd_1.0.5
 [53] stats4_4.0.5                htmlwidgets_1.5.4
 [55] httr_1.4.4                  RColorBrewer_1.1-3
 [57] ellipsis_0.3.2              ica_1.0-3
 [59] pkgconfig_2.0.3             scuttle_1.0.4
 [61] uwot_0.1.11                 deldir_1.0-6
 [63] locfit_1.5-9.4              utf8_1.2.2
 [65] tidyselect_1.1.2            rlang_1.0.6
 [67] reshape2_1.4.4              later_1.2.0
 [69] munsell_0.5.0               tools_4.0.5
 [71] xgboost_1.6.2.1             cli_3.4.1
 [73] generics_0.1.3              ggridges_0.5.4
 [75] stringr_1.4.1               fastmap_1.1.0
 [77] goftest_1.2-3               fitdistrplus_1.1-8
 [79] purrr_0.3.4                 RANN_2.6.1
 [81] pbapply_1.5-0               future_1.27.0
 [83] nlme_3.1-159                sparseMatrixStats_1.2.1
 [85] mime_0.12                   scran_1.18.5
 [87] compiler_4.0.5              beeswarm_0.4.0
 [89] plotly_4.10.0               png_0.1-7
 [91] spatstat.utils_3.0-2        statmod_1.4.37
 [93] tibble_3.1.8                stringi_1.7.8
 [95] lattice_0.20-45             bluster_1.0.0
 [97] vctrs_0.4.1                 pillar_1.8.1
 [99] lifecycle_1.0.2             spatstat.geom_3.1-0
[101] lmtest_0.9-40               RcppAnnoy_0.0.19
[103] BiocNeighbors_1.8.2         data.table_1.14.2
[105] cowplot_1.1.1               bitops_1.0-7
[107] irlba_2.3.5                 httpuv_1.6.6
[109] patchwork_1.1.1             GenomicRanges_1.42.0
[111] R6_2.5.1                    promises_1.2.0.1
[113] KernSmooth_2.23-20          gridExtra_2.3
[115] vipor_0.4.5                 IRanges_2.24.1
[117] parallelly_1.32.1           codetools_0.2-18
[119] MASS_7.3-58.1               assertthat_0.2.1
[121] SummarizedExperiment_1.20.0 withr_2.5.0
[123] sctransform_0.3.3           S4Vectors_0.28.1
[125] GenomeInfoDbData_1.2.4      mgcv_1.8-40
[127] parallel_4.0.5              grid_4.0.5
[129] rpart_4.1.16                beachmat_2.6.4
[131] tidyr_1.2.1                 DelayedMatrixStats_1.12.3
[133] MatrixGenerics_1.2.1        Rtsne_0.16
[135] Biobase_2.50.0              shiny_1.7.2
[137] ggbeeswarm_0.6.0

plger commented 1 year ago

You're using a very old version of scDblFinder, from when it was still under development. In fact your whole Bioconductor installation is about 3 years old, which in the single-cell field is a very long time (there have been major improvements in recent years). This will also prevent you from installing recent versions from GitHub. I strongly recommend that you update R and Bioconductor to the latest release version. (Then you also shouldn't get this warning.)

As a second note, I doubt that your 80k cells are from a single capture; if they're not, you should provide the info on the different captures using the samples argument.
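As an illustration of the samples argument mentioned above, here is a minimal sketch on simulated data. mockDoubletSCE() ships with recent scDblFinder releases; the sample_id column name and the capture labels are hypothetical, and the block only does anything if scDblFinder is installed.

```r
# Sketch only: simulated data, hypothetical capture labels.
if (requireNamespace("scDblFinder", quietly = TRUE)) {
  set.seed(42)
  # Small simulated SingleCellExperiment containing artificial doublets
  sce <- scDblFinder::mockDoubletSCE()
  # Assumption: cells alternate between two captures (made up for illustration)
  sce$sample_id <- rep(c("captureA", "captureB"), length.out = ncol(sce))
  # Passing the colData column name makes each capture be processed separately
  sce <- scDblFinder::scDblFinder(sce, samples = "sample_id")
  print(table(sce$sample_id, sce$scDblFinder.class))
}
```

With real data, sample_id would come from the experimental design rather than being assigned arbitrarily.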

z5ouyang commented 1 year ago

Thanks for the suggestions.

  1. This environment does not hold only scDblFinder; the production environment supports other pipelines and software, and it needs to keep results consistent. It is not easy to just update everything.
  2. It is a single capture of 80k cells.

It seems like you are not sure which package caused this warning, and there is currently no way to adjust the maximum number of iterations. I will just close the issue for now.

plger commented 1 year ago

I've changed the max iterations, but since you can't update, that won't help you. In your context, what you could do is run the clustering yourself (then you can change the max iterations parameter), and then pass those clusters to scDblFinder.

So something like this:

sce$clusters <- fastcluster(sce, iter.max=100)
sce <- scDblFinder(sce, clusters="clusters")
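
If the installed fastcluster() does not expose iter.max, the same idea can be sketched with base R. This assumes the warning comes from a k-means step: the message has the same form as the one stats::kmeans() emits when it reaches iter.max. The toy matrix pcs stands in for a reduced-dimension embedding (cells x components) and is made up for illustration.

```r
# Sketch only: toy data standing in for a cells x PCs matrix.
set.seed(1)
pcs <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
             matrix(rnorm(200, mean = 5), ncol = 2))

# Raise iter.max so k-means has more room to converge; nstart adds restarts.
km <- kmeans(pcs, centers = 2, iter.max = 100, nstart = 3)
clusters <- factor(km$cluster)
print(table(clusters))

# With a real SingleCellExperiment `sce` whose columns match the rows of `pcs`,
# the labels could then be handed over (not run here):
# sce$clusters <- clusters
# sce <- scDblFinder(sce, clusters = "clusters")
```

The number of centers here is arbitrary; on real data it would need to be chosen to suit the dataset.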