plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

error in serialize(data) #73

Closed: wbvguo closed this issue 1 year ago

wbvguo commented 1 year ago

Hi,

Thanks for maintaining this tool. I ran into a problem when running scDblFinder with MulticoreParam.

code:

library(scDblFinder)
library(BiocParallel)

sce = as.SingleCellExperiment(seurat_filtered)
sce = scDblFinder(sce, samples="sample_label", BPPARAM=MulticoreParam(4))

error:

Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
Error in manager$availability[[as.character(result$node)]] <- TRUE : 
  wrong args for environment subassignment
In addition: Warning messages:
1: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con, xdr = FALSE) :
  'package:stats' may not be available when loading
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection

When I remove BPPARAM=MulticoreParam(4), the code runs without error (although slowly), so I guess the problem is related to multiprocessing. The object I am dealing with is 4.3 GB, while the server has more than 140 GB of memory, so I don't think it is a memory issue. Do you have any idea what causes this problem and how it could be solved?

Thanks,

plger commented 1 year ago

What platform are you on? (You should always report your environment and sessionInfo().) Do you have the same problem with BPPARAM=SnowParam(4)?

wbvguo commented 1 year ago

Hi, thank you for the quick reply!

Here is my sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.30.4 scDblFinder_1.10.0  magrittr_2.0.3      ggplot2_3.4.2       tidyr_1.3.0         tibble_3.2.1        dplyr_1.1.1        
[8] SeuratObject_4.1.3  Seurat_4.3.0       

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  spatstat.explore_3.1-0      reticulate_1.28             tidyselect_1.2.0            htmlwidgets_1.6.2          
  [6] grid_4.2.1                  Rtsne_0.16                  munsell_0.5.0               ScaledMatrix_1.4.1          codetools_0.2-18           
 [11] ica_1.0-3                   statmod_1.5.0               scran_1.24.1                xgboost_1.7.5.1             future_1.32.0              
 [16] miniUI_0.1.1.1              withr_2.5.0                 spatstat.random_3.1-4       colorspace_2.1-0            progressr_0.13.0           
 [21] Biobase_2.56.0              knitr_1.42                  rstudioapi_0.14             stats4_4.2.1                SingleCellExperiment_1.18.1
 [26] ROCR_1.0-11                 tensor_1.5                  listenv_0.9.0               MatrixGenerics_1.8.1        labeling_0.4.2             
 [31] GenomeInfoDbData_1.2.8      polyclip_1.10-4             farver_2.1.1                parallelly_1.35.0           vctrs_0.6.1                
 [36] generics_0.1.3              xfun_0.38                   R6_2.5.1                    GenomeInfoDb_1.32.4         ggbeeswarm_0.7.1           
 [41] rsvd_1.0.5                  locfit_1.5-9.7              bitops_1.0-7                spatstat.utils_3.0-2        DelayedArray_0.22.0        
 [46] promises_1.2.0.1            BiocIO_1.6.0                scales_1.2.1                beeswarm_0.4.0              gtable_0.3.3               
 [51] beachmat_2.12.0             globals_0.16.2              goftest_1.2-3               rlang_1.1.0                 splines_4.2.1              
 [56] rtracklayer_1.56.1          lazyeval_0.2.2              spatstat.geom_3.1-0         yaml_2.3.7                  reshape2_1.4.4             
 [61] abind_1.4-5                 httpuv_1.6.9                tools_4.2.1                 ellipsis_0.3.2              RColorBrewer_1.1-3         
 [66] BiocGenerics_0.42.0         ggridges_0.5.4              Rcpp_1.0.10                 plyr_1.8.8                  sparseMatrixStats_1.8.0    
 [71] zlibbioc_1.42.0             purrr_1.0.1                 RCurl_1.98-1.12             deldir_1.0-6                pbapply_1.7-0              
 [76] viridis_0.6.2               cowplot_1.1.1               S4Vectors_0.34.0            zoo_1.8-11                  SummarizedExperiment_1.26.1
 [81] ggrepel_0.9.3               cluster_2.1.3               data.table_1.14.8           scattermore_0.8             lmtest_0.9-40              
 [86] RANN_2.6.1                  fitdistrplus_1.1-8          matrixStats_0.63.0          patchwork_1.1.2             mime_0.12                  
 [91] evaluate_0.20               xtable_1.8-4                XML_3.99-0.14               IRanges_2.30.1              gridExtra_2.3              
 [96] compiler_4.2.1              scater_1.24.0               KernSmooth_2.23-20          crayon_1.5.2                htmltools_0.5.5            
[101] later_1.3.0                 snow_0.4-4                  DBI_1.1.3                   MASS_7.3-58                 Matrix_1.5-4               
[106] cli_3.6.1                   parallel_4.2.1              metapod_1.4.0               igraph_1.4.2                GenomicRanges_1.48.0       
[111] pkgconfig_2.0.3             GenomicAlignments_1.32.1    sp_1.6-0                    plotly_4.10.1               scuttle_1.6.3              
[116] spatstat.sparse_3.0-1       vipor_0.4.5                 dqrng_0.3.0                 XVector_0.36.0              stringr_1.5.0              
[121] digest_0.6.31               sctransform_0.3.5           RcppAnnoy_0.0.20            spatstat.data_3.0-1         Biostrings_2.64.1          
[126] rmarkdown_2.21              leiden_0.4.3                uwot_0.1.14                 edgeR_3.38.4                DelayedMatrixStats_1.18.2  
[131] restfulr_0.0.15             shiny_1.7.4                 Rsamtools_2.12.0            rjson_0.2.21                lifecycle_1.0.3            
[136] nlme_3.1-162                jsonlite_1.8.4              BiocNeighbors_1.14.0        viridisLite_0.4.1           limma_3.52.4               
[141] fansi_1.0.4                 pillar_1.9.0                lattice_0.20-45             ggrastr_1.0.1               fastmap_1.1.1              
[146] httr_1.4.5                  survival_3.5-5              glue_1.6.2                  png_0.1-8                   bluster_1.6.0              
[151] stringi_1.7.12              BiocSingular_1.12.0         irlba_2.3.5.1               future.apply_1.10.0        

I tested with BPPARAM=SnowParam(4). It did not report an error, but it gave the following warning messages:

Warning messages:
1: <anonymous>: ... may be used in an incorrect context: 
     scDblFinder(sce[sel_features, x], clusters = clusters, dims = dims, 
         dbr = dbr, dbr.sd = dbr.sd, clustCor = clustCor, unident.th = unident.th, 
         knownDoublets = knownDoublets, knownUse = knownUse, artificialDoublets = artificialDoublets, 
         k = k, processing = processing, nfeatures = nfeatures, propRandom = propRandom, 
         includePCs = includePCs, propMarkers = propMarkers, trainingFeatures = trainingFeatures, 
         returnType = returnType, threshold = isSplitMode, score = ifelse(isSplitMode, 
             score, "weighted"), removeUnidentifiable = removeUnidentifiable, 
         verbose = FALSE, aggregateFeatures = aggregateFeatures, ...)

2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
3: In serialize(data, node$con) :
  'package:stats' may not be available when loading
4: In serialize(data, node$con) :
  'package:stats' may not be available when loading

I have another question: is scDblFinder a deterministic tool? If we run it n times on the same input, will it always give the same result?

Thanks,

plger commented 1 year ago

No, it is not deterministic. See section 1.5.5 of the vignette for how to make it reproducible.

I'm afraid your first BiocParallel error isn't something I can help you with. Perhaps @LTLA has seen this before (I've never seen the manager$availability error before)?

wbvguo commented 1 year ago

Thank you for the reply. For the non-multithreaded case (i.e. when no BPPARAM argument is passed), will set.seed be sufficient to make the results reproducible?

I am closing this issue now, as there are alternative ways to work around it.

Thanks,

plger commented 1 year ago

Yes, without multithreading set.seed should be sufficient.

plger commented 1 year ago

Actually no: if you're using samples, you need to set the seed through the backend, i.e. BPPARAM=SerialParam(RNGseed = seed) (see #59).
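
For readers landing here later, a minimal sketch of that last suggestion. It is an illustration rather than code from the thread: it assumes a small simulated dataset from scDblFinder's mockDoubletSCE(), an invented two-level sample_label column, and an arbitrary seed value of 123.

```r
library(scDblFinder)
library(BiocParallel)

# Illustrative data only: a small simulated dataset with a made-up
# sample column (not the object from this issue).
set.seed(1)
sce <- mockDoubletSCE()
sce$sample_label <- sample(c("A", "B"), ncol(sce), replace = TRUE)

# With `samples`, pass the seed through the BiocParallel backend
# instead of relying on set.seed() alone. The value 123 is arbitrary.
seed <- 123
run1 <- scDblFinder(sce, samples = "sample_label",
                    BPPARAM = SerialParam(RNGseed = seed))
run2 <- scDblFinder(sce, samples = "sample_label",
                    BPPARAM = SerialParam(RNGseed = seed))

# Check whether the two runs produced the same doublet calls
identical(run1$scDblFinder.class, run2$scDblFinder.class)
```

Comparing the scDblFinder.class column of two runs, as in the last line, is one quick way to check reproducibility on your own data before relying on it.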