sqjin / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data
GNU General Public License v3.0

Future Error when running "netAnalysis_compute" #140

Closed rlorenzc closed 3 years ago

rlorenzc commented 3 years ago

Hi,

I'm currently running into issues when I try to run the "netAnalysis_compute" function and it may have something to do with parallel processing. Any help to fix this problem would be great...

This is the error that I get:

UNRELIABLE VALUE: Future (‘future_sapply-1’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-2’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-3’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-4’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-5’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-6’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-7’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".
UNRELIABLE VALUE: Future (‘future_sapply-8’) unexpectedly generated random numbers without specifying argument 'future.seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

rlorenzc commented 3 years ago

I reinstalled packages and this issue disappeared.

RuiyuRayWang commented 2 years ago

@rlorenzc Hi, I ran into the same issue. Would you mind sharing how you resolved it, i.e. which packages you reinstalled? I tried reinstalling CellChat and future, but the error persisted.

My session info:

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=zh_CN.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=zh_CN.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] future.apply_1.9.0    future_1.25.0         CellChat_1.4.0        bigmemory_4.6.1       Biobase_2.50.0        BiocGenerics_0.36.1   igraph_1.3.1         
 [8] forcats_0.5.1         stringr_1.4.0         dplyr_1.0.9           purrr_0.3.4           readr_2.1.2           tidyr_1.2.0           tibble_3.1.7         
[15] ggplot2_3.3.6         tidyverse_1.3.1       SeuratDisk_0.0.0.9019 sp_1.4-7              SeuratObject_4.1.0    Seurat_4.1.1         

loaded via a namespace (and not attached):
  [1] circlize_0.4.15       uuid_1.1-0            readxl_1.4.0          backports_1.4.1       systemfonts_1.0.4     NMF_0.24.0            plyr_1.8.7           
  [8] lazyeval_0.2.2        splines_4.0.5         listenv_0.8.0         scattermore_0.8       gridBase_0.4-7        digest_0.6.29         foreach_1.5.2        
 [15] htmltools_0.5.2       ggalluvial_0.12.3     fansi_1.0.3           magrittr_2.0.3        tensor_1.5            cluster_2.1.3         doParallel_1.0.17    
 [22] ROCR_1.0-11           tzdb_0.3.0            sna_2.6               ComplexHeatmap_2.6.2  globals_0.15.0        modelr_0.1.8          matrixStats_0.62.0   
 [29] svglite_2.1.0         spatstat.sparse_2.1-1 colorspace_2.0-3      rvest_1.0.2           ggrepel_0.9.1         haven_2.5.0           bigmemory.sri_0.1.3  
 [36] crayon_1.5.1          jsonlite_1.8.0        progressr_0.10.0      spatstat.data_2.2-0   survival_3.3-1        zoo_1.8-10            iterators_1.0.14     
 [43] glue_1.6.2            polyclip_1.10-0       registry_0.5-1        gtable_0.3.0          leiden_0.4.2          GetoptLong_1.0.5      shape_1.4.6          
 [50] abind_1.4-5           scales_1.2.0          DBI_1.1.2             rngtools_1.5.2        spatstat.random_2.2-0 miniUI_0.1.1.1        Rcpp_1.0.8.3         
 [57] viridisLite_0.4.0     xtable_1.8-4          clue_0.3-60           reticulate_1.25       spatstat.core_2.4-2   bit_4.0.4             stats4_4.0.5         
 [64] htmlwidgets_1.5.4     httr_1.4.3            FNN_1.1.3             RColorBrewer_1.1-3    ellipsis_0.3.2        ica_1.0-2             farver_2.1.0         
 [71] pkgconfig_2.0.3       uwot_0.1.11           dbplyr_2.1.1          deldir_1.0-6          utf8_1.2.2            labeling_0.4.2        tidyselect_1.1.2     
 [78] rlang_1.0.2           reshape2_1.4.4        later_1.3.0           munsell_0.5.0         cellranger_1.1.0      tools_4.0.5           cli_3.3.0            
 [85] generics_0.1.2        statnet.common_4.6.0  broom_0.8.0           ggridges_0.5.3        fastmap_1.1.0         goftest_1.2-3         bit64_4.0.5          
 [92] fs_1.5.2              fitdistrplus_1.1-8    RANN_2.6.1            pbapply_1.5-0         nlme_3.1-157          mime_0.12             xml2_1.3.3           
 [99] hdf5r_1.3.5           compiler_4.0.5        rstudioapi_0.13       plotly_4.10.0         png_0.1-7             spatstat.utils_2.3-1  reprex_2.0.1         
[106] stringi_1.7.6         RSpectra_0.16-1       rgeos_0.5-9           lattice_0.20-45       Matrix_1.4-1          vctrs_0.4.1           pillar_1.7.0         
[113] lifecycle_1.0.1       GlobalOptions_0.1.2   spatstat.geom_2.4-0   lmtest_0.9-40         RcppAnnoy_0.0.19      data.table_1.14.2     cowplot_1.1.1        
[120] irlba_2.3.5           httpuv_1.6.5          patchwork_1.1.1       R6_2.5.1              network_1.17.1        promises_1.2.0.1      KernSmooth_2.23-20   
[127] gridExtra_2.3         IRanges_2.24.1        parallelly_1.31.1     codetools_0.2-18      MASS_7.3-57           assertthat_0.2.1      rjson_0.2.21         
[134] pkgmaker_0.32.2       withr_2.5.0           sctransform_0.3.3     S4Vectors_0.28.1      mgcv_1.8-40           hms_1.1.1             grid_4.0.5           
[141] rpart_4.1-15          coda_0.19-4           Cairo_1.5-15          Rtsne_0.16            shiny_1.7.1           lubridate_1.8.0

sqjin commented 2 years ago

@rlorenzc Please try the solution here (https://github.com/HenrikBengtsson/parallelly/issues/83)

RuiyuRayWang commented 2 years ago

@sqjin Hi, thanks for the prompt response!

I tried the solution but the issue persisted.

> suppressPackageStartupMessages({
+   library(CellChat)
+   library(future)
+ })
> cellchat <- readRDS("../data/cellchat.rds")
> plan(multisession, workers = 8) # do parallel
> options(parallelly.makeNodePSOCK.validate = (packageVersion("parallelly") > "1.31.1"))  # https://github.com/HenrikBengtsson/parallelly/issues/83
> cellchat <- netAnalysis_computeCentrality(cellchat, slot.name = "netP")
Warning messages:
1: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-1’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
2: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-2’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
3: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-3’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
4: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-4’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
5: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-5’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
6: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-6’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
7: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-7’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 
8: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-8’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore". 

It seems that the issue is unrelated to the link you posted, because I never used zh_CN as my language setting.

Instead, I found a workaround: simply disable the future parallel feature before calling the netAnalysis_computeCentrality function:

> suppressPackageStartupMessages({
+   library(CellChat)
+   library(future)
+ })
> cellchat <- readRDS("../data/cellchat.rds")
> plan(sequential)
> cellchat <- netAnalysis_computeCentrality(cellchat, slot.name = "netP")
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s

Best, Ray
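
The workaround above switches the whole session to plan(sequential). A minimal sketch of how the switch could be limited to a single call and the previous plan restored afterwards is shown below. The helper name run_sequentially is hypothetical (it is not part of CellChat or future), and nbrOfWorkers() merely stands in for the CellChat call so that the snippet runs without any data.

```r
library(future)

# Parallel plan used for the rest of the analysis.
plan(multisession, workers = 2)

# Hypothetical helper: evaluate one expression under a sequential plan,
# then restore whatever plan was active before.
run_sequentially <- function(expr) {
  oplan <- plan(sequential)          # plan() returns the previous plan
  on.exit(plan(oplan), add = TRUE)   # restore it when the function exits
  expr                               # the lazy argument is evaluated here
}

# Stand-in for: run_sequentially(netAnalysis_computeCentrality(cellchat, slot.name = "netP"))
workers_inside <- run_sequentially(nbrOfWorkers())
workers_after  <- nbrOfWorkers()
```

Because R evaluates function arguments lazily, the expression only runs once the sequential plan is in place; the multisession plan is active again when the helper returns.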

HenrikBengtsson commented 2 years ago

This is because the future framework detects that one of your future_sapply() calls runs code that relies on the random number generator (RNG) in R, but argument future.seed = TRUE was not declared in that call to future_sapply(). Without it, the random numbers drawn in parallel might not be statistically sound, which might bias your results.

This is something that needs to be fixed in CellChat. To fix it, make sure to do:

y <- future_sapply(..., future.seed = TRUE)

for the future_sapply() that triggers this warning. To find which one it is, set options(warn = 2L) and then use traceback().

UPDATE: See also Section 'Random Number Generation in the Future Framework' in https://www.jottr.org/2020/09/22/push-for-statistical-sound-rng/.
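
As a small, self-contained illustration of the future.seed argument (a sketch only; the function draw is a made-up stand-in and has nothing to do with CellChat's internals), the following declares the RNG use and, with a fixed integer seed, checks that two runs give the same numbers:

```r
library(future)
library(future.apply)

plan(multisession, workers = 2)

# Hypothetical worker function that uses the RNG once per iteration.
draw <- function(i) rnorm(1)

# Declaring future.seed = TRUE gives parallel-safe L'Ecuyer-CMRG streams
# and silences the "UNRELIABLE VALUE" warning.
y <- future_sapply(1:4, draw, future.seed = TRUE)

# A fixed integer seed additionally makes the result reproducible.
a <- future_sapply(1:4, draw, future.seed = 42L)
b <- future_sapply(1:4, draw, future.seed = 42L)
same <- identical(a, b)

plan(sequential)
```

Dropping the future.seed argument from the first call should reproduce the warning discussed in this thread.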

sqjin commented 2 years ago

@HenrikBengtsson Do you have any idea how to add 'future.seed' to the CellChat code below?

my.sapply <- ifelse(
  test = future::nbrOfWorkers() == 1,
  yes = pbapply::pbsapply,
  no = future.apply::future_sapply
)
centr.all = my.sapply(
  X = 1:nrun,
  FUN = function(x) {
    net0 <- net[ , , x]
    return(computeCentralityLocal(net0))
  },
  simplify = FALSE
)

HenrikBengtsson commented 2 years ago

See https://github.com/sqjin/CellChat/issues/424, where I suggest dropping the yes/no thing and just using future.apply. In the long run, that is the easiest to maintain, and you get progress updates everywhere.

Otherwise,

my.sapply <- ifelse(
  test = future::nbrOfWorkers() == 1,
  yes = function(..., future.seed = FALSE) pbapply::pbsapply(...),
  no = future.apply::future_sapply
)

and pass as my.sapply(..., future.seed = TRUE).
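
For the first option (using future.apply unconditionally), a rough sketch of what the call could look like is given below. The array net and the function toy_centrality are made-up stand-ins for CellChat's data and its computeCentralityLocal(), so this only shows the shape of the call, not the actual package code.

```r
library(future.apply)

# Toy stand-in for the 3-D network array (2 x 2 cell groups, 3 pathways).
net <- array(seq_len(12), dim = c(2, 2, 3))
nrun <- dim(net)[3]

# Hypothetical stand-in for computeCentralityLocal().
toy_centrality <- function(m) rowSums(m)

# No yes/no switch: future_sapply() also works under plan(sequential).
centr.all <- future_sapply(
  X = seq_len(nrun),
  FUN = function(x) toy_centrality(net[, , x]),
  simplify = FALSE,
  future.seed = TRUE
)
```

With simplify = FALSE the result is a list with one element per slice, as in the original call.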

sqjin commented 2 years ago

@HenrikBengtsson Thanks. I will simply use the future.apply.

By the way, is the performance of future::plan("multiprocess", workers = 4) the same as future::plan("multisession", workers = 4)? I cannot remember clearly, but I think I have tried both last year and found that the running time is longer when using "multisession". Parallelization in CellChat depends heavily on the future package.

Another thing that confuses me is how performance varies with the number of 'workers'. The running time is shorter with workers = 4 than with workers = 6 or 8. I do not know why, but 4 workers give the shortest running time.

HenrikBengtsson commented 2 years ago

By the way, is the performance of future::plan("multiprocess", workers = 4) the same as future::plan("multisession", workers = 4)?

'multiprocess' is deprecated and soon to be removed, so please forget about that one. It was an alias for 'multicore' on Linux and macOS and for 'multisession' on MS Windows. I, and several others, concluded that this caused confusion and uncertainty, so we decided to remove it.

I cannot remember clearly, ...

Which confirms why it's a good idea to remove it.

but I think I have tried both last year and found that the running time is longer when using "multisession"

So, if you tested on Linux or macOS, then you most likely effectively compared 'multicore' vs 'multisession'. They are different parallel backends (of the parallel package); see the docs. 'multicore' is not supported on MS Windows and is unstable in the RStudio Console and other environments, so we recommend using 'multisession'. See https://parallelly.futureverse.org/reference/supportsMulticore.html and the links therein for details.

Another thing that confuses me is how performance varies with the number of 'workers'. The running time is shorter with workers = 4 than with workers = 6 or 8. I do not know why, but 4 workers give the shortest running time.

This is a FAQ in all parallel processing in all programming languages. It simply depends on a lot of things and what code you're parallelizing. There's always a point when the overhead of having more parallel workers becomes larger than the performance gain.
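
One way to see where that point lies for a given workload is simply to time it under a few worker counts. The sketch below uses a made-up task (slow_task) and only 1 and 2 workers; the numbers it prints depend entirely on the machine and the task, so treat it as a template rather than a benchmark.

```r
library(future)
library(future.apply)

# Hypothetical task: a short sleep stands in for real computation.
slow_task <- function(i) {
  Sys.sleep(0.05)
  i^2
}

worker_counts <- c(1, 2)

# Elapsed time (in seconds) for the same job under each worker count.
timings <- sapply(worker_counts, function(w) {
  plan(multisession, workers = w)
  system.time(future_sapply(1:8, slow_task))[["elapsed"]]
})
names(timings) <- worker_counts

plan(sequential)
print(timings)
```

Extending worker_counts to, say, c(1, 2, 4, 6, 8) on the real workload would show where the overhead starts to outweigh the gain.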

jessicook commented 2 years ago

Hello,

I'm having this same issue and can't figure out how to fix it. Should running

y <- future_sapply(..., future.seed = TRUE)

fix the issue? Or do I need to add this to the function? I have limited experience in R, so I'm a little confused about what the fix above is.

kpbradsh commented 1 year ago

Hello, I'm struggling with the same error when I compute centrality in the CellChat tutorial. Any help would be appreciated! I'm guessing that I need to figure out how to input future.seed, but I'm not sure how. Thank you.

Warning messages: "1: UNRELIABLE VALUE: One of the ‘future.apply’ iterations (‘future_sapply-1’) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to "ignore".

Below is my code:

library(CellChat)
library(patchwork)
options(stringsAsFactors = FALSE)
library(dplyr)
library(Matrix)
library(data.table)
library(ggplot2)
library(cowplot)
library(tidyr)
library(Seurat)

Raw_data <- Read10X(data.dir = 'matrix_files_all_mouse_gene')
metadata <- read.csv('metadata_allcells_mouse_gene_removed_pericytes_unknown.csv')
rownames(metadata) <- metadata$barcode
rownames(metadata)

data.input = Raw_data
meta = metadata
cell.use = rownames(meta)[meta$treatmentage == "young_sham"]

data.input = data.input[, cell.use]
meta = meta[cell.use, ]
unique(meta$labeledcelltype)

cellchat <- createCellChat(object = data.input, meta = meta, group.by = "labeledcelltype")
cellchat <- addMeta(cellchat, meta = meta)
cellchat <- setIdent(cellchat, ident.use = "labeledcelltype")
levels(cellchat@idents)
groupSize <- as.numeric(table(cellchat@idents)) # number of cells in each cell group

CellChatDB <- CellChatDB.mouse # use CellChatDB.mouse if running on mouse data
showDatabaseCategory(CellChatDB)

dplyr::glimpse(CellChatDB$interaction)

CellChatDB.use <- CellChatDB # use all
cellchat@DB <- CellChatDB.use
cellchat <- subsetData(cellchat)
future::plan("multisession", workers = 4) # do parallel

cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat)
cellchat <- computeCommunProb(cellchat, type = "truncatedMean", trim = 0.05) # added truncated mean and trim to 0.05 to allow more cells
cellchat <- filterCommunication(cellchat, min.cells = 10)
df.net <- subsetCommunication(cellchat)
cellchat <- computeCommunProbPathway(cellchat)
cellchat <- aggregateNet(cellchat)
cellchat <- netAnalysis_computeCentrality(cellchat)