faster logmap labels - Githubissues

mihem commented 6 months ago

Issue initially reported here: https://github.com/satijalab/seurat/issues/7879

Integration preprocessing took a long time (~10 min in my case, with 120,000 cells and 61 layers). Nearly all of that time was spent in Seurat:::CreateIntegrationGroups, more specifically in:

    as.data.frame(x = labels(
      object = cmap,
      values = Cells(x = object[[assay]], layer = scale.layer)
    ))

The slowdown comes from the sapply call in seurat-object: https://github.com/satijalab/seurat-object/blob/58bf437fe058dd78913d9ef7b48008a3e24a306a/R/logmap.R#L238-L247

I rewrote this (also thanks to ChatGPT) using logical indexing. This speeds up computation by more than 1000x in my use case, from 10 min to less than 1 s, so the "real" integration steps start almost immediately.
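To illustrate why replacing a per-cell sapply with a vectorized lookup helps, here is a minimal sketch on a toy logical matrix (rows are cells, columns are layers). It is not the code from the linked branch: the names `cmap`, `labels_loop`, and `labels_vectorized` are hypothetical, and `which(arr.ind = TRUE)` plus `split()` is just one possible way to vectorize the lookup.

```r
# Toy logical membership matrix: rows are cells, columns are layers.
# All names below are hypothetical and used only for illustration.
set.seed(1)
n_cells <- 2000
n_layers <- 5
cmap <- matrix(
  FALSE,
  nrow = n_cells,
  ncol = n_layers,
  dimnames = list(
    paste0("cell", seq_len(n_cells)),
    paste0("layer", seq_len(n_layers))
  )
)
# Assign each cell to exactly one randomly chosen layer.
cmap[cbind(seq_len(n_cells), sample(n_layers, n_cells, replace = TRUE))] <- TRUE

# Per-cell lookup: one character-indexed row extraction per cell.
labels_loop <- function(m, values) {
  sapply(
    X = values,
    FUN = function(x) colnames(m)[which(m[x, , drop = TRUE])],
    simplify = FALSE,
    USE.NAMES = TRUE
  )
}

# Vectorized lookup: find all TRUE positions at once, then group by cell.
labels_vectorized <- function(m, values) {
  sub <- m[values, , drop = FALSE]
  idx <- which(sub, arr.ind = TRUE)
  # which() reports positions in column-major order, so within each cell
  # the layers stay in column order after splitting.
  split(
    x = colnames(sub)[idx[, "col"]],
    f = factor(rownames(sub)[idx[, "row"]], levels = values)
  )
}

values <- rownames(cmap)
res_loop <- labels_loop(cmap, values)
res_vec <- labels_vectorized(cmap, values)

# Compare the timings of the two approaches on the toy data.
system.time(labels_loop(cmap, values))
system.time(labels_vectorized(cmap, values))
```

Both functions return a named list with one character vector of layer names per cell. The sketch assumes `values` contains unique cell names that are all present in the matrix.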

For a reproducible example use:

# Install the patched branch before loading Seurat;
# skip this line to time the unpatched version instead.
remotes::install_github("mihem/seurat-object@faster_logmap")

library(Seurat)
library(SeuratData)
options(future.globals.maxSize = 1e9)

# Load the example dataset.
InstallData("pbmcsca")
obj <- LoadData("pbmcsca")

# Split the RNA assay into one layer per method.
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Method)

# Standard preprocessing.
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)

layers <- Layers(object = obj, assay = "RNA", search = "data")
scale.layer <- Layers(object = obj, search = "scale.data")

# Time the step that was slow.
system.time(
    groups <- Seurat:::CreateIntegrationGroups(obj[["RNA"]], layers = layers, scale.layer = scale.layer)
)

mihem commented 6 months ago

@Gesmira or @dcollins15 It would be nice if you could review this. The changes are simple, but the speedup is enormous.

mihem commented 4 months ago

Any news here? This is already some weeks old. @Gesmira @igrabski

satijalab / seurat-object

faster logmap labels #209