satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Memory Error with Clustering with Leiden algorithm matrix - When to use matrix vs igraph method? #7979

Closed WilliamMWei closed 4 months ago

WilliamMWei commented 1 year ago

Hi,

Thanks for the tool.

I tried to cluster 45,000 cells with the Leiden algorithm using the default argument `method = "matrix"`, but hit a `MemoryError`. When I changed to `method = "igraph"`, it ran fine.

The help says to use the igraph method when we do not want to cast a large dataset to a dense matrix, so it seems to exist simply to handle large datasets. Would you mind letting me know whether there is any other key difference between the igraph and matrix methods in terms of the clustering results? And when should I choose one over the other?

Related post: https://github.com/scverse/scanpy/issues/1053

Thank you so much for your support!

 pbmc_cd4_cxcr5posneg.data_filtergene_filtercell_list_IndividualDatasetMERGED <- Seurat::FindClusters(pbmc_cd4_cxcr5posneg.data_filtergene_filtercell_list_IndividualDatasetMERGED, algorithm = 4, resolution = 1.2)
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  MemoryError
Run `reticulate::py_last_error()` for details.
In addition: There were 12 warnings (use warnings() to see them)

> reticulate::py_last_error()

── Python Exception Message ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
MemoryError

── R Traceback ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     ▆
  1. ├─Seurat::FindClusters(...)
  2. └─Seurat:::FindClusters.Seurat(...)
  3.   ├─Seurat::FindClusters(...)
  4.   └─Seurat:::FindClusters.default(...)
  5.     └─Seurat:::RunLeiden(...)
  6.       ├─leiden::leiden(...)
  7.       └─leiden:::leiden.matrix(...)
  8.         ├─leiden:::make_py_graph(object, weights = weights)
  9.         └─leiden:::make_py_graph.matrix(object, weights = weights)
 10.           ├─leiden:::make_py_object(object, weights = weights)
 11.           └─leiden:::make_py_object.matrix(object, weights = weights)
 12.             └─adj_mat_py$tolist()
 13.               └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)
> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] clustree_0.5.0     ggraph_2.1.0       ggplot2_3.4.4      reticulate_1.34.0  knitr_1.44         SeuratObject_4.1.4 Seurat_4.4.0      

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.15.0      jsonlite_1.8.7         magrittr_2.0.3         spatstat.utils_3.0-3   farver_2.1.1          
  [7] rmarkdown_2.25         fs_1.6.3               vctrs_0.6.4            ROCR_1.0-11            memoise_2.0.1          spatstat.explore_3.2-5
 [13] rstatix_0.7.2          htmltools_0.5.6.1      usethis_2.2.2          broom_1.0.5            sctransform_0.4.1      parallelly_1.36.0     
 [19] KernSmooth_2.23-21     htmlwidgets_1.6.2      ica_1.0-3              plyr_1.8.9             plotly_4.10.3          zoo_1.8-12            
 [25] cachem_1.0.8           igraph_1.5.1           mime_0.12              lifecycle_1.0.3        pkgconfig_2.0.3        Matrix_1.6-1.1        
 [31] R6_2.5.1               fastmap_1.1.1          fitdistrplus_1.1-11    future_1.33.0          shiny_1.7.5.1          digest_0.6.33         
 [37] colorspace_2.1-0       patchwork_1.1.3        ps_1.7.5               rprojroot_2.0.3        tensor_1.5             irlba_2.3.5.1         
 [43] pkgload_1.3.3          ggpubr_0.6.0           labeling_0.4.3         progressr_0.14.0       fansi_1.0.5            spatstat.sparse_3.0-2 
 [49] httr_1.4.7             polyclip_1.10-6        abind_1.4-5            compiler_4.3.1         here_1.0.1             remotes_2.4.2.1       
 [55] withr_2.5.1            backports_1.4.1        viridis_0.6.4          carData_3.0-5          pkgbuild_1.4.2         ggforce_0.4.1         
 [61] ggsignif_0.6.4         MASS_7.3-60            rappdirs_0.3.3         sessioninfo_1.2.2      tools_4.3.1            lmtest_0.9-40         
 [67] httpuv_1.6.12          future.apply_1.11.0    goftest_1.2-3          glue_1.6.2             callr_3.7.3            nlme_3.1-162          
 [73] promises_1.2.1         grid_4.3.1             checkmate_2.2.0        Rtsne_0.16             cluster_2.1.4          reshape2_1.4.4        
 [79] generics_0.1.3         gtable_0.3.4           spatstat.data_3.0-3    tidyr_1.3.0            data.table_1.14.8      tidygraph_1.2.3       
 [85] sp_2.1-1               car_3.1-2              utf8_1.2.4             spatstat.geom_3.2-7    RcppAnnoy_0.0.21       ggrepel_0.9.4         
 [91] RANN_2.6.1             pillar_1.9.0           stringr_1.5.0          later_1.3.1            splines_4.3.1          tweenr_2.0.2          
 [97] dplyr_1.1.3            lattice_0.21-8         survival_3.5-5         deldir_1.0-9           tidyselect_1.2.0       miniUI_0.1.1.1        
[103] pbapply_1.7-2          gridExtra_2.3          scattermore_1.2        xfun_0.40              graphlayouts_1.0.1     devtools_2.4.5        
[109] matrixStats_1.0.0      stringi_1.7.12         lazyeval_0.2.2         yaml_2.3.7             evaluate_0.22          codetools_0.2-19      
[115] tibble_3.2.1           BiocManager_1.30.22    cli_3.6.1              uwot_0.1.16            xtable_1.8-4           munsell_0.5.0         
[121] processx_3.8.2         Rcpp_1.0.11            globals_0.16.2         spatstat.random_3.2-1  png_0.1-8              parallel_4.3.1        
[127] ellipsis_0.3.2         prettyunits_1.2.0      profvis_0.3.8          urlchecker_1.0.1       listenv_0.9.0          viridisLite_0.4.2     
[133] scales_1.2.1           ggridges_0.5.4         leiden_0.4.3           purrr_1.0.2            crayon_1.5.2           rlang_1.1.1           
[139] cowplot_1.1.1         

@denvercal1234GitHub

igrabski commented 4 months ago

Hi, these arguments are fed into the Leiden clustering algorithm implemented here -- there is more explanation on the exact implementation differences in their documentation.
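
For a sense of scale: the traceback above ends in `adj_mat_py$tolist()` inside `leiden:::make_py_object.matrix`, which suggests the matrix path materialises the full cell-by-cell adjacency matrix before handing it to Python. The sketch below is only a back-of-the-envelope estimate, assuming 8-byte doubles and ignoring any extra overhead from the Python list conversion. The names `dense_matrix_gib`, `cluster_leiden_igraph`, and `obj` are hypothetical. Only `algorithm = 4`, `method = "igraph"`, and `resolution = 1.2` come from the original post.

```r
# Rough size, in GiB, of a dense n x n adjacency matrix of doubles.
# Assumption: 8 bytes per value, no Python-side overhead counted.
dense_matrix_gib <- function(n_cells, bytes_per_value = 8) {
  n_cells^2 * bytes_per_value / 1024^3
}

dense_matrix_gib(45000)  # roughly 15.1 GiB for 45,000 cells

# Hypothetical wrapper around the call that ran fine in the original post.
# `obj` is assumed to be a Seurat object with a neighbor graph already built
# (for example by FindNeighbors()). Seurat is only needed when this is called.
cluster_leiden_igraph <- function(obj, resolution = 1.2) {
  Seurat::FindClusters(
    obj,
    algorithm = 4,       # 4 selects the Leiden algorithm
    method = "igraph",   # avoid casting the graph to a dense matrix
    resolution = resolution
  )
}
```

Under these assumptions the dense matrix alone is larger than the RAM on many workstations, which is consistent with the `MemoryError` reported above. Whether the two methods give identical cluster assignments is not something this estimate addresses.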