Open MaximilianNuber opened 3 days ago
Dear Dr. Mangiola,
Thank you for the very nice package. I am working with large-scale single-cell RNA-seq data and want to use tidySingleCellExperiment. I discovered that aggregate_cells takes much longer than aggregateAcrossCells.
As I am usually working on a server, I recreated the problem with a 225k cell dataset on my laptop: https://cellxgene.cziscience.com/e/dea717d4-7bc0-4e46-950f-fd7e1cc8df7d.cxg/
require(tidySingleCellExperiment)
require(tidySummarizedExperiment)
#setwd("/Users/maximiliannuber/Documents/CSAMA_2024")
sce <- readr::read_rds("Seurat_kidney.rds")
sce <- as.SingleCellExperiment(sce)
aggregateAcrossCells runs fast:
system.time(pbulk <- aggregateAcrossCells(sce, ids = colData(sce)[, c("donor_id", "cell_type")]))
   user  system elapsed
 11.690   2.481  16.056
The corresponding aggregate_cells call ran for so long that I interrupted it after about 10 minutes.
I looked at this with Michael Love, and we found this may be an issue with the combination of donor and cell type. Aggregating by cell type alone took just a few seconds:
system.time( pbulk <- sce %>% aggregate_cells(cell_type, assays="counts") )
   user  system elapsed
 10.164   2.333  13.953
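To make the difference between the two groupings concrete, it can help to count how many groups each call has to produce. Below is a minimal sketch on toy data; the `col_data` data frame and its values are made up for illustration, and on the real object it would be replaced by the cell metadata, e.g. `as.data.frame(colData(sce))`.

```r
# Toy cell metadata (hypothetical): 4 donors x 3 cell types, 24 cells
col_data <- data.frame(
  donor_id  = rep(paste0("donor", 1:4), each = 6),
  cell_type = rep(c("T", "B", "NK"), times = 8)
)

# Number of groups when aggregating by cell type alone
n_by_cell_type <- length(unique(col_data$cell_type))
n_by_cell_type  # 3

# Number of groups when aggregating by donor and cell type together
n_by_both <- nrow(unique(col_data[, c("donor_id", "cell_type")]))
n_by_both  # 12
```

The number of groups grows with the number of donors, so a large gap between the two counts on the real data would fit the observation that only the combined grouping is slow.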
Thank you for any help!
output of sessionInfo:
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidySummarizedExperiment_1.14.0 ttservice_0.4.1
[3] tidyr_1.3.1 tidySingleCellExperiment_1.14.0
[5] muscData_1.18.0 ExperimentHub_2.12.0
[7] AnnotationHub_3.12.0 BiocFileCache_2.12.0
[9] dbplyr_2.5.0 rpx_2.12.0
[11] edgeR_4.2.0 stringr_1.5.1
[13] pheatmap_1.0.12 celldex_1.14.0
[15] SingleR_2.6.0 igraph_2.0.3
[17] GGally_2.2.1 NewWave_1.14.0
[19] scry_1.16.0 scDblFinder_1.18.0
[21] scran_1.32.0 scater_1.32.0
[23] ggplot2_3.5.1 EnsDb.Hsapiens.v86_2.99.0
[25] ensembldb_2.28.0 AnnotationFilter_1.28.0
[27] GenomicFeatures_1.56.0 AnnotationDbi_1.66.0
[29] scuttle_1.14.0 DropletUtils_1.24.0
[31] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
[33] GenomicRanges_1.56.0 GenomeInfoDb_1.40.0
[35] IRanges_2.38.0 S4Vectors_0.42.0
[37] MatrixGenerics_1.16.0 matrixStats_1.3.0
[39] DropletTestFiles_1.14.0 dplyr_1.1.4
[41] limma_3.60.3 RcppSpdlog_0.0.17
[43] Seurat_5.0.3 cellxgene.census_1.14.1
[45] SeuratObject_5.0.1 sp_2.1-4
[47] GEOquery_2.72.0 Biobase_2.64.0
[49] BiocGenerics_0.50.0

loaded via a namespace (and not attached):
[1] R.methodsS3_1.8.2 vroom_1.6.5 RcppCCTZ_0.2.12
[4] spdl_0.0.5 goftest_1.2-3 Biostrings_2.72.1
[7] HDF5Array_1.32.0 vctrs_0.6.5 spatstat.random_3.2-3
[10] digest_0.6.35 png_0.1-8 aws.signature_0.6.0
[13] gypsum_1.0.1 tiledb_0.27.0 ggrepel_0.9.5
[16] deldir_2.0-4 parallelly_1.37.1 MASS_7.3-60.2
[19] reshape2_1.4.4 httpuv_1.6.15 withr_3.0.0
[22] xfun_0.43 aws.s3_0.3.21 ellipsis_0.3.2
[25] survival_3.5-8 memoise_2.0.1 ggbeeswarm_0.7.2
[28] zoo_1.8-12 pbapply_1.7-2 R.oo_1.26.0
[31] KEGGREST_1.44.1 promises_1.3.0 httr_1.4.7
[34] restfulr_0.0.15 globals_0.16.3 fitdistrplus_1.1-11
[37] rhdf5filters_1.16.0 ps_1.7.6 rhdf5_2.48.0
[40] rstudioapi_0.16.0 nanotime_0.3.7 UCSC.utils_1.0.0
[43] miniUI_0.1.1.1 generics_0.1.3 processx_3.8.4
[46] base64enc_0.1-3 curl_5.2.1 zlibbioc_1.50.0
[49] ScaledMatrix_1.12.0 polyclip_1.10-6 glmpca_0.2.0
[52] GenomeInfoDbData_1.2.12 SparseArray_1.4.3 desc_1.4.3
[55] xtable_1.8-4 evaluate_0.23 S4Arrays_1.4.0
[58] hms_1.1.3 irlba_2.3.5.1 colorspace_2.1-0
[61] filelock_1.0.3 ROCR_1.0-11 reticulate_1.36.1
[64] spatstat.data_3.0-4 magrittr_2.0.3 lmtest_0.9-40
[67] readr_2.1.5 nanoarrow_0.4.0.1 later_1.3.2
[70] viridis_0.6.5 lattice_0.22-6 spatstat.geom_3.2-9
[73] future.apply_1.11.2 scattermore_1.2 XML_3.99-0.16.1
[76] triebeard_0.4.1 cowplot_1.1.3 RcppAnnoy_0.0.22
[79] pillar_1.9.0 nlme_3.1-164 sna_2.7-2
[82] compiler_4.4.0 beachmat_2.20.0 RSpectra_0.16-1
[85] stringi_1.8.3 tensor_1.5 GenomicAlignments_1.40.0
[88] plyr_1.8.9 crayon_1.5.2 abind_1.4-5
[91] BiocIO_1.14.0 locfit_1.5-9.9 bit_4.0.5
[94] codetools_0.2-20 BiocSingular_1.20.0 alabaster.ranges_1.4.1
[97] plotly_4.10.4 mime_0.12 intergraph_2.0-4
[100] splines_4.4.0 Rcpp_1.0.12 fastDummies_1.7.3
[103] sparseMatrixStats_1.16.0 knitr_1.46 blob_1.2.4
[106] utf8_1.2.4 BiocVersion_3.19.1 fs_1.6.4
[109] listenv_0.9.1 DelayedMatrixStats_1.26.0 pkgbuild_1.4.4
[112] tibble_3.2.1 Matrix_1.7-0 callr_3.7.6
[115] statmod_1.5.0 tzdb_0.4.0 network_1.18.2
[118] pkgconfig_2.0.3 tools_4.4.0 cachem_1.0.8
[121] RSQLite_2.3.7 viridisLite_0.4.2 DBI_1.2.2
[124] fastmap_1.1.1 rmarkdown_2.26 scales_1.3.0
[127] grid_4.4.0 ica_1.0-3 Rsamtools_2.20.0
[130] coda_0.19-4.1 patchwork_1.2.0 ggstats_0.6.0
[133] BiocManager_1.30.23 dotCall64_1.1-1 alabaster.schemas_1.4.0
[136] RANN_2.6.1 farver_2.1.1 yaml_2.3.8
[139] rtracklayer_1.64.0 cli_3.6.2 purrr_1.0.2
[142] leiden_0.4.3.1 lifecycle_1.0.4 uwot_0.2.2
[145] arrow_16.1.0 bluster_1.14.0 BiocParallel_1.38.0
[148] gtable_0.3.5 rjson_0.2.21 ggridges_0.5.6
[151] progressr_0.14.0 parallel_4.4.0 jsonlite_1.8.8
[154] RcppHNSW_0.6.0 bitops_1.0-7 bit64_4.0.5
[157] assertthat_0.2.1 xgboost_1.7.7.1 Rtsne_0.17
[160] alabaster.matrix_1.4.1 spatstat.utils_3.0-4 BiocNeighbors_1.22.0
[163] urltools_1.7.3 alabaster.se_1.4.1 metapod_1.12.0
[166] dqrng_0.3.2 R.utils_2.12.3 alabaster.base_1.4.1
[169] lazyeval_0.2.2 shiny_1.8.1.1 htmltools_0.5.8.1
[172] sctransform_0.4.1 rappdirs_0.3.3 glue_1.7.0
[175] spam_2.10-0 httr2_1.0.1 XVector_0.44.0
[178] RCurl_1.98-1.14 gridExtra_2.3 tiledbsoma_1.11.1
[181] R6_2.5.1 DESeq2_1.44.0 labeling_0.4.3
[184] SharedObject_1.18.0 cluster_2.1.6 pkgload_1.3.4
[187] Rhdf5lib_1.26.0 statnet.common_4.9.0 DelayedArray_0.30.1
[190] tidyselect_1.2.1 vipor_0.4.7 ProtGenerics_1.36.0
[193] xml2_1.3.6 future_1.33.2 rsvd_1.0.5
[196] munsell_0.5.1 KernSmooth_2.23-22 data.table_1.15.4
[199] htmlwidgets_1.6.4 RColorBrewer_1.1-3 rlang_1.1.3
[202] spatstat.sparse_3.0-3 spatstat.explore_3.2-7 remotes_2.5.0
[205] fansi_1.0.6 beeswarm_0.4.0
Thanks!
My apologies, I copied the wrong chunk for the actual example. The following chunk takes longer than 10 minutes:
system.time(pbulk <- sce %>% aggregate_cells(c(donor_id, cell_type), assays="counts"))
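As a point of comparison, pseudobulk sums over the donor and cell type combination can be written as a single sparse matrix product. The sketch below runs on simulated counts and uses only the Matrix package; all object names and values are made up for illustration. It is not a description of how aggregate_cells or aggregateAcrossCells is implemented, only a rough baseline for what the grouped sum itself involves.

```r
library(Matrix)

set.seed(1)
n_genes <- 50
n_cells <- 200

# Simulated sparse genes x cells count matrix (toy data, not the kidney dataset)
counts <- rsparsematrix(n_genes, n_cells, density = 0.1,
                        rand.x = function(n) rpois(n, lambda = 5) + 1)

# Hypothetical per-cell metadata
donor_id  <- sample(paste0("donor", 1:4), n_cells, replace = TRUE)
cell_type <- sample(c("T", "B", "NK"), n_cells, replace = TRUE)

# One factor level per observed donor/cell-type combination
group <- interaction(donor_id, cell_type, drop = TRUE)

# Sparse cells x groups indicator matrix: entry (i, j) is 1 if cell i is in group j
indicator <- sparseMatrix(
  i = seq_len(n_cells),
  j = as.integer(group),
  x = 1,
  dims = c(n_cells, nlevels(group)),
  dimnames = list(NULL, levels(group))
)

# Genes x groups matrix of summed counts
pbulk <- counts %*% indicator
dim(pbulk)
```

On the real object, `counts` would correspond to `counts(sce)` and the metadata vectors to the matching `colData(sce)` columns, assuming the count assay is stored as a sparse matrix. Timing this against the slow call could help show whether the extra time is spent in the summation or elsewhere.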