satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

NormalizeData() in Seurat v5 is very slow #7820

Open Shiyc-Lab opened 1 year ago

Shiyc-Lab commented 1 year ago

When running NormalizeData() on the same data, v4 finishes quickly, but v5 keeps running and never finishes (I waited at least 10 hours).

The v4 session info is:
'''
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS/LAPACK: /data1/users/zhoux1/.conda/envs/r4/lib/libopenblasp-r0.3.24.so; LAPACK version 3.11.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.4.3 BPCells_0.1.0 stringr_1.5.0 dplyr_1.1.3
[5] reticulate_1.32.0 SeuratObject_4.1.3 Seurat_4.3.0.1

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 jsonlite_1.8.7 magrittr_2.0.3
[4] spatstat.utils_3.0-3 zlibbioc_1.46.0 vctrs_0.6.3
[7] ROCR_1.0-11 spatstat.explore_3.2-3 RCurl_1.98-1.12
[10] base64enc_0.1-3 htmltools_0.5.6 sctransform_0.3.5
[13] parallelly_1.36.0 KernSmooth_2.23-22 htmlwidgets_1.6.2
[16] ica_1.0-3 plyr_1.8.8 plotly_4.10.2
[19] zoo_1.8-12 uuid_1.1-1 igraph_1.5.1
[22] mime_0.12 lifecycle_1.0.3 pkgconfig_2.0.3
[25] Matrix_1.6-1 R6_2.5.1 fastmap_1.1.1
[28] GenomeInfoDbData_1.2.10 fitdistrplus_1.1-11 future_1.33.0
[31] shiny_1.7.5 digest_0.6.33 colorspace_2.1-0
[34] patchwork_1.1.3 S4Vectors_0.38.1 tensor_1.5
[37] irlba_2.3.5.1 GenomicRanges_1.52.0 progressr_0.14.0
[40] fansi_1.0.4 spatstat.sparse_3.0-2 httr_1.4.7
[43] polyclip_1.10-4 abind_1.4-5 compiler_4.3.1
[46] withr_2.5.0 MASS_7.3-60 tools_4.3.1
[49] lmtest_0.9-40 httpuv_1.6.11 future.apply_1.11.0
[52] goftest_1.2-3 glue_1.6.2 nlme_3.1-163
[55] promises_1.2.1 grid_4.3.1 pbdZMQ_0.3-10
[58] Rtsne_0.16 cluster_2.1.4 reshape2_1.4.4
[61] generics_0.1.3 gtable_0.3.4 spatstat.data_3.0-1
[64] tidyr_1.3.0 data.table_1.14.8 sp_2.0-0
[67] utf8_1.2.3 XVector_0.40.0 BiocGenerics_0.46.0
[70] spatstat.geom_3.2-5 RcppAnnoy_0.0.21 ggrepel_0.9.3
[73] RANN_2.6.1 pillar_1.9.0 IRdisplay_1.1
[76] later_1.3.1 splines_4.3.1 lattice_0.21-8
[79] survival_3.5-7 deldir_1.0-9 tidyselect_1.2.0
[82] miniUI_0.1.1.1 pbapply_1.7-2 gridExtra_2.3
[85] IRanges_2.34.1 scattermore_1.2 stats4_4.3.1
[88] matrixStats_1.0.0 stringi_1.7.12 lazyeval_0.2.2
[91] evaluate_0.21 codetools_0.2-19 tibble_3.2.1
[94] cli_3.6.1 uwot_0.1.16 IRkernel_1.3.2
[97] xtable_1.8-4 repr_1.1.6 munsell_0.5.0
[100] Rcpp_1.0.11 GenomeInfoDb_1.36.1 globals_0.16.2
[103] spatstat.random_3.1-6 png_0.1-8 parallel_4.3.1
[106] ellipsis_0.3.2 bitops_1.0-7 listenv_0.9.0
[109] viridisLite_0.4.2 scales_1.2.1 ggridges_0.5.4
[112] leiden_0.4.3 purrr_1.0.2 crayon_1.5.2
[115] rlang_1.1.1 cowplot_1.1.1
'''

The v5 session info is:
'''
R version 4.3.1 (2023-06-16)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS/LAPACK: /data1/users/zhoux1/.conda/envs/r-seurat5/lib/libopenblasp-r0.3.24.so; LAPACK version 3.11.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Asia/Shanghai
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.4.3 BPCells_0.1.0 stringr_1.5.0
[4] dplyr_1.1.3 reticulate_1.32.0 Seurat_4.9.9.9060
[7] SeuratObject_4.9.9.9091 sp_2.0-0

loaded via a namespace (and not attached):
[1] deldir_1.0-9 pbapply_1.7-2 gridExtra_2.3
[4] rlang_1.1.1 magrittr_2.0.3 RcppAnnoy_0.0.21
[7] spatstat.geom_3.2-5 matrixStats_1.0.0 ggridges_0.5.4
[10] compiler_4.3.1 png_0.1-8 vctrs_0.6.3
[13] reshape2_1.4.4 pkgconfig_2.0.3 crayon_1.5.2
[16] fastmap_1.1.1 ellipsis_0.3.2 utf8_1.2.3
[19] promises_1.2.1 purrr_1.0.2 jsonlite_1.8.7
[22] goftest_1.2-3 later_1.3.1 uuid_1.1-1
[25] spatstat.utils_3.0-3 irlba_2.3.5.1 parallel_4.3.1
[28] cluster_2.1.4 R6_2.5.1 ica_1.0-3
[31] spatstat.data_3.0-1 stringi_1.7.12 RColorBrewer_1.1-3
[34] parallelly_1.36.0 lmtest_0.9-40 scattermore_1.2
[37] Rcpp_1.0.11 IRkernel_1.3.2.9000 tensor_1.5
[40] future.apply_1.11.0 zoo_1.8-12 base64enc_0.1-3
[43] sctransform_0.4.0 httpuv_1.6.11 Matrix_1.6-1.1
[46] splines_4.3.1 igraph_1.5.1 tidyselect_1.2.0
[49] abind_1.4-5 spatstat.random_3.1-6 codetools_0.2-19
[52] miniUI_0.1.1.1 spatstat.explore_3.2-3 listenv_0.9.0
[55] lattice_0.21-8 tibble_3.2.1 plyr_1.8.8
[58] withr_2.5.0 shiny_1.7.5 ROCR_1.0-11
[61] evaluate_0.21 Rtsne_0.16 future_1.33.0
[64] fastDummies_1.7.3 survival_3.5-7 polyclip_1.10-4
[67] fitdistrplus_1.1-11 pillar_1.9.0 KernSmooth_2.23-22
[70] plotly_4.10.2 generics_0.1.3 RcppHNSW_0.5.0
[73] IRdisplay_1.1 munsell_0.5.0 scales_1.2.1
[76] globals_0.16.2 xtable_1.8-4 glue_1.6.2
[79] lazyeval_0.2.2 tools_4.3.1 data.table_1.14.8
[82] RSpectra_0.16-1 pbdZMQ_0.3-10 RANN_2.6.1
[85] leiden_0.4.3 dotCall64_1.0-2 cowplot_1.1.1
[88] grid_4.3.1 tidyr_1.3.0 colorspace_2.1-0
[91] nlme_3.1-163 patchwork_1.1.3 repr_1.1.6
[94] cli_3.6.1 spatstat.sparse_3.0-2 spam_2.9-1
[97] fansi_1.0.4 viridisLite_0.4.2 uwot_0.1.16
[100] gtable_0.3.4 digest_0.6.33 progressr_0.14.0
[103] ggrepel_0.9.3 htmlwidgets_1.6.2 htmltools_0.5.6
[106] lifecycle_1.0.3 httr_1.4.7 mime_0.12
[109] MASS_7.3-60
'''

Shiyc-Lab commented 1 year ago

The size of the data is 500000 x 20000.
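For a rough baseline of how long log-normalization should take on a sparse matrix, the default LogNormalize computation (divide each cell's counts by that cell's total, multiply by a scale factor of 10000, then take log1p) can be written by hand and timed on a small matrix or a subset. This is only a sketch using the Matrix package, not Seurat's implementation; `log_normalize` and the toy matrix are hypothetical names and data, and the matrix is assumed to be genes by cells.

```r
library(Matrix)

set.seed(1)
# Toy genes-by-cells counts matrix (illustrative dimensions, not the real data).
counts <- rsparsematrix(200, 50, density = 0.1,
                        rand.x = function(n) rpois(n, 3) + 1)

# Hypothetical helper: hand-rolled log-normalization on a dgCMatrix.
log_normalize <- function(m, scale_factor = 1e4) {
  totals <- colSums(m)
  totals[totals == 0] <- 1  # avoid division by zero for empty cells
  # Right-multiplying by a diagonal matrix scales each column and stays sparse.
  scaled <- m %*% Diagonal(x = scale_factor / totals)
  # log1p(0) == 0, so only the stored non-zero values need transforming.
  scaled@x <- log1p(scaled@x)
  scaled
}

timing <- system.time(norm <- log_normalize(counts))
print(timing)
```

The work here is proportional to the number of non-zero entries, so if a subset of the real matrix normalizes quickly this way but NormalizeData() does not, that would point at how the data is stored or dispatched rather than at the arithmetic itself.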

Gesmira commented 1 year ago

Hi,

A user previously reported a similar issue here. Can you confirm the type of your counts matrices? What does class(obj[["RNA"]]$counts) return? The fix for them was to convert their counts matrices to dgCMatrix by running the following for each layer:

obj[["RNA"]]$counts <- as(obj[["RNA"]]$counts, "CsparseMatrix")
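The check and conversion described above can be tried on a toy matrix first. The following is a minimal sketch that needs only the Matrix package; `ensure_csparse` is a hypothetical helper name, and the small triplet-form matrix stands in for a counts layer that is not already a dgCMatrix.

```r
library(Matrix)

# Hypothetical helper: return `m` unchanged if it is already column-compressed,
# otherwise coerce it the same way as the suggested fix.
ensure_csparse <- function(m) {
  if (is(m, "CsparseMatrix")) m else as(m, "CsparseMatrix")
}

# Toy stand-in for a counts layer, deliberately built in triplet form.
counts <- sparseMatrix(
  i = c(1, 3, 2), j = c(1, 2, 3), x = c(5, 2, 7),
  dims = c(3, 3), repr = "T"
)
print(class(counts))      # storage class before conversion

counts_csc <- ensure_csparse(counts)
print(class(counts_csc))  # storage class after conversion

# On a real object (assumes `obj` is a Seurat object with an "RNA" assay):
# obj[["RNA"]]$counts <- ensure_csparse(obj[["RNA"]]$counts)
```

If class() already reports dgCMatrix for every layer, the conversion is a no-op and the slowdown has to come from somewhere else.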

Shiyc-Lab commented 1 year ago

Well, the type is actually dgCMatrix.

I downloaded the raw counts matrices and created the object using CreateSeuratObject().
