aggregate_cells takes too long #110

MaximilianNuber opened 3 days ago

MaximilianNuber commented 3 days ago

Dear Dr. Mangiola,

Thank you for the very nice package. I am working with large-scale single-cell RNA-seq data and want to use tidySingleCellExperiment. I discovered that aggregate_cells takes much longer than aggregateAcrossCells.

As I am usually working on a server, I recreated the problem with a 225k cell dataset on my laptop:

sce <- readr::read_rds("Seurat_kidney.rds")
sce <- as.SingleCellExperiment(sce)

aggregateAcrossCells runs fast:

system.time(pbulk <- aggregateAcrossCells(sce, ids = colData(sce)[, c("donor_id", "cell_type")]))
 user  system elapsed 
 11.690   2.481  16.056 

This code ran for a very long time, and I interrupted it after about 10 minutes.

system.time(pbulk <- aggregateAcrossCells(sce, ids = colData(sce)[, c("donor_id", "cell_type")]))

I looked at this with Michael Love, and we found this may be an issue with the combination of donor and cell type. This code took just a few seconds:


pbulk <- sce %>%
    aggregate_cells(cell_type, assays="counts")

 user  system elapsed 
 10.164   2.333  13.953 
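
To illustrate how the two groupings differ in size, here is a minimal base R sketch that counts the groups each one produces. The data frame is a toy stand-in for colData(sce) with invented values, not the kidney dataset, and the variable names are only for illustration:

```r
# Toy stand-in for colData(sce); the values are invented for illustration.
cd <- data.frame(
  donor_id  = c("d1", "d1", "d2", "d2", "d2", "d3"),
  cell_type = c("T", "B", "T", "T", "B", "T")
)

# Number of groups when aggregating by cell type alone.
n_cell_type <- length(unique(cd$cell_type))

# Number of groups when aggregating by donor and cell type together.
n_combined <- nrow(unique(cd[, c("donor_id", "cell_type")]))

print(c(cell_type = n_cell_type, donor_by_cell_type = n_combined))
```

On the real object, the same count could presumably be obtained from as.data.frame(colData(sce)). Whether the number of groups is what actually slows aggregate_cells down is only a guess at this point.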

Thank you for any help!

MaximilianNuber commented 3 days ago

My apologies, I copied the wrong chunk for the actual example. The following chunk takes longer than 10 minutes:

system.time(pbulk <- sce %>%
    aggregate_cells(c(donor_id, cell_type), assays="counts"))