plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

Doublet numbers still not reproduced even though I used BPPARAM and bpstart #62

Closed Moonju411 closed 1 year ago

Moonju411 commented 1 year ago

Dear developers, thank you for the nice package.

I know that doublet reproducibility has already been discussed a lot in the issues, and I have read them. But when I adapt that code to my data, the results are still not reproducible: I get different results every time. I checked my data using the code that was posted in issue #53. This is the code I used, along with the results.

> sce <- as.SingleCellExperiment(my_seurat_object)
> bp <- MulticoreParam(2, RNGseed=123)
> bpstart(bp)
> m1 <- scDblFinder(sce, clusters=sce$cluster, BPPARAM=bp)$scDblFinder.score
Creating ~5000 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
iter=0, 83 cells excluded from training.
iter=1, 83 cells excluded from training.
iter=2, 80 cells excluded from training.
Threshold found:0.738
50 (4.7%) doublets called
> bpstop(bp)

> bpstart(bp)
> m2 <- scDblFinder(sce, clusters=sce$cluster, BPPARAM=bp)$scDblFinder.score
Creating ~5000 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
iter=0, 76 cells excluded from training.
iter=1, 89 cells excluded from training.
iter=2, 79 cells excluded from training.
Threshold found:0.784
44 (4.1%) doublets called
> bpstop(bp)
> identical(m1,m2)
[1] FALSE

Do you have any ideas about this? My BiocParallel version is already 1.28.3. I have tried many times, but the results never match. Please help! This is the sessionInfo of my R session.

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rsvd_1.0.5                  batchelor_1.10.0            remotes_2.4.2               Nebulosa_1.4.0              patchwork_1.1.1            
 [6] SeuratWrappers_0.3.0        harmony_0.1.0               Rcpp_1.0.8.3                cowplot_1.1.1               dplyr_1.0.9                
[11] Seurat_4.1.0                SeuratObject_4.0.4          scDblFinder_1.11.4          SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[16] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.4            MatrixGenerics_1.6.0       
[21] matrixStats_0.62.0          scaterlegacy_1.5.0          ggplot2_3.3.6               Biobase_2.54.0              BiocGenerics_0.40.0        
[26] BiocParallel_1.28.3        

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                shinydashboard_0.7.2      ks_1.13.5                 R.utils_2.11.0            reticulate_1.24          
  [6] tidyselect_1.1.2          RSQLite_2.2.12            AnnotationDbi_1.56.2      htmlwidgets_1.5.4         grid_4.1.2               
 [11] Rtsne_0.16                munsell_0.5.0             ScaledMatrix_1.2.0        codetools_0.2-18          ica_1.0-2                
 [16] xgboost_1.6.0.1           statmod_1.4.36            scran_1.22.1              future_1.24.0             miniUI_0.1.1.1           
 [21] withr_2.5.0               spatstat.random_2.2-0     colorspace_2.0-3          filelock_1.0.2            rstudioapi_0.13          
 [26] ROCR_1.0-11               tensor_1.5                listenv_0.8.0             labeling_0.4.2            tximport_1.22.0          
 [31] GenomeInfoDbData_1.2.7    polyclip_1.10-0           farver_2.1.0              bit64_4.0.5               rhdf5_2.38.1             
 [36] parallelly_1.31.0         vctrs_0.4.1               generics_0.1.2            BiocFileCache_2.2.1       R6_2.5.1                 
 [41] ggbeeswarm_0.6.0          locfit_1.5-9.5            bitops_1.0-7              rhdf5filters_1.6.0        spatstat.utils_2.3-0     
 [46] cachem_1.0.6              DelayedArray_0.20.0       assertthat_0.2.1          BiocIO_1.4.0              promises_1.2.0.1         
 [51] scales_1.2.0              beeswarm_0.4.0            gtable_0.3.0              beachmat_2.10.0           globals_0.14.0           
 [56] goftest_1.2-3             rlang_1.0.2               splines_4.1.2             rtracklayer_1.54.0        lazyeval_0.2.2           
 [61] spatstat.geom_2.4-0       BiocManager_1.30.16       yaml_2.3.5                reshape2_1.4.4            abind_1.4-5              
 [66] httpuv_1.6.5              tools_4.1.2               ellipsis_0.3.2            spatstat.core_2.4-2       RColorBrewer_1.1-3       
 [71] ggridges_0.5.3            plyr_1.8.7                sparseMatrixStats_1.6.0   progress_1.2.2            zlibbioc_1.40.0          
 [76] purrr_0.3.4               RCurl_1.98-1.6            prettyunits_1.1.1         rpart_4.1.16              deldir_1.0-6             
 [81] pbapply_1.5-0             viridis_0.6.2             zoo_1.8-10                ggrepel_0.9.1             cluster_2.1.3            
 [86] magrittr_2.0.3            data.table_1.14.2         scattermore_0.8           ResidualMatrix_1.4.0      lmtest_0.9-40            
 [91] RANN_2.6.1                mvtnorm_1.1-3             fitdistrplus_1.1-8        hms_1.1.1                 mime_0.12                
 [96] xtable_1.8-4              XML_3.99-0.9              mclust_5.4.9              gridExtra_2.3             scater_1.22.0            
[101] compiler_4.1.2            biomaRt_2.50.3            tibble_3.1.7              KernSmooth_2.23-20        crayon_1.5.1             
[106] R.oo_1.24.0               htmltools_0.5.2           mgcv_1.8-40               later_1.3.0               tidyr_1.2.0              
[111] DBI_1.1.2                 dbplyr_2.1.1              MASS_7.3-56               rappdirs_0.3.3            Matrix_1.4-1             
[116] cli_3.3.0                 R.methodsS3_1.8.1         metapod_1.2.0             parallel_4.1.2            igraph_1.3.1             
[121] pkgconfig_2.0.3           GenomicAlignments_1.30.0  scuttle_1.4.0             plotly_4.10.0             spatstat.sparse_2.1-1    
[126] xml2_1.3.3                vipor_0.4.5               dqrng_0.3.0               XVector_0.34.0            stringr_1.4.0            
[131] digest_0.6.29             pracma_2.3.8              sctransform_0.3.3         RcppAnnoy_0.0.19          spatstat.data_2.2-0      
[136] Biostrings_2.62.0         leiden_0.3.9              uwot_0.1.11               edgeR_3.36.0              DelayedMatrixStats_1.16.0
[141] restfulr_0.0.13           curl_4.3.2                shiny_1.7.1               Rsamtools_2.10.0          rjson_0.2.21             
[146] lifecycle_1.0.1           nlme_3.1-157              jsonlite_1.8.0            Rhdf5lib_1.16.0           BiocNeighbors_1.12.0     
[151] viridisLite_0.4.0         limma_3.50.3              fansi_1.0.3               pillar_1.7.0              lattice_0.20-45          
[156] ggrastr_1.0.1             KEGGREST_1.34.0           fastmap_1.1.0             httr_1.4.2                survival_3.3-1           
[161] glue_1.6.2                png_0.1-7                 bluster_1.4.0             bit_4.0.4                 stringi_1.7.6            
[166] blob_1.2.3                BiocSingular_1.10.0       memoise_2.0.1             irlba_2.3.5               future.apply_1.8.1    

plger commented 1 year ago

Hi, judging from your code, your dataset contains a single sample (i.e. a single capture). If it does not, you need to pass the samples argument. Multithreading is only used across multiple captures, so without the samples argument BPPARAM is ignored. In your case, you should therefore be able to ensure reproducibility simply by calling set.seed before each scDblFinder call.
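
The suggestion above can be sketched as follows. This is a minimal illustration rather than the maintainer's own code: it assumes a single-capture dataset and uses `mockDoubletSCE()` from scDblFinder as stand-in data, and the seed values are arbitrary. With real data you would use your own `sce` and keep `clusters=sce$cluster` as in the original report.

```r
library(scDblFinder)

# Stand-in data (assumption): mockDoubletSCE() simulates a small
# single-capture dataset. Replace with your own SingleCellExperiment.
set.seed(42)
sce <- mockDoubletSCE()

# Single capture: no `samples` argument, so no BPPARAM is passed.
# Set the seed immediately before the call.
set.seed(123)
m1 <- scDblFinder(sce)$scDblFinder.score

# Reset the same seed immediately before the second run.
set.seed(123)
m2 <- scDblFinder(sce)$scDblFinder.score

# Compare the two score vectors, as in the original report.
identical(m1, m2)
```

The key point is that `set.seed` is called again right before every run. Setting it once at the top of a session is not enough, because the first run advances the random number stream. For data with several captures, the reply above indicates that `samples` should be passed, and only then does `BPPARAM` come into play.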

plger commented 1 year ago

Hi, I will close this issue unless you have something to add.