waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/
70 stars 32 forks source link

Questions about `.sampleMapFromData()` #314

Closed cvanderaa closed 2 years ago

cvanderaa commented 2 years ago

Hi Marcel, @LiNk-NY

Context

Laurent and I are maintaining the QFeatures class that directly inherits from the MultiAssayExperiement class.

I have been profiling the QFeatures code for adding new assays to the QFeatures. Our software is currently a bit slow for large single-cell proteomics datasets, and these will soon become larger... I have noticed that a lot of time is spent in .sampleMapFromData(), and more specifically this line:

samps <- colnames(experiments)

Replacing this line with

samps <- lapply(experiments, colnames)

leads to about a 30x gain in time (see example below).

Questions

  1. What do think of this change? I understand that the use of colnames() is more elegant, but it leads to a big overhead. Would you be willing to adopt it in MultiAssayExperiment?
  2. I have copied the function to QFeatures, but would rather prefer to call it from MultiAssayExperiment directly. Would you be willing to export this function as well so that we do not need to call to MultiAssayExperiment:::?

If you agree on this, I can send a PR to spare you time.

Example

## Improvement only available from very early devel of QFeatures
BiocManager::install("cvanderaa/QFeatures", ref = "issue104")

library(scpdata)
(qf <- specht2019v3()) ## a large `QFeatures` object

cd <- colData(qf)
el <- experiments(qf)
library(microbenchmark)
microbenchmark(MAE = MultiAssayExperiment:::.sampleMapFromData(cd, el),
               QF = QFeatures:::.sampleMapFromData(cd, el),
               times = 5)
Unit: milliseconds
 expr       min        lq      mean    median        uq       max neval
  MAE 708.17430 708.55089 711.39762 711.00339 712.72844 716.53106     5
   QF  22.17453  22.40136  23.51688  23.34302  24.18989  25.47558     5
LiNk-NY commented 2 years ago

Hi @cvanderaa The examples above do not give the same outputs. I can add a colnames,ExperimentList-method to make it faster. Best, Marcel