Laurent and I are maintaining the QFeatures class that directly inherits from the MultiAssayExperiement class.
I have been profiling the QFeatures code for adding new assays to the QFeatures. Our software is currently a bit slow for large single-cell proteomics datasets, and these will soon become larger... I have noticed that a lot of time is spent in .sampleMapFromData(), and more specifically this line:
samps <- colnames(experiments)
Replacing this line with
samps <- lapply(experiments, colnames)
leads to about a 30x gain in time (see example below).
Questions
What do think of this change? I understand that the use of colnames() is more elegant, but it leads to a big overhead. Would you be willing to adopt it in MultiAssayExperiment?
I have copied the function to QFeatures, but would rather prefer to call it from MultiAssayExperiment directly. Would you be willing to export this function as well so that we do not need to call to MultiAssayExperiment:::?
If you agree on this, I can send a PR to spare you time.
Example
## Improvement only available from very early devel of QFeatures
BiocManager::install("cvanderaa/QFeatures", ref = "issue104")
library(scpdata)
(qf <- specht2019v3()) ## a large `QFeatures` object
cd <- colData(qf)
el <- experiments(qf)
library(microbenchmark)
microbenchmark(MAE = MultiAssayExperiment:::.sampleMapFromData(cd, el),
QF = QFeatures:::.sampleMapFromData(cd, el),
times = 5)
Unit: milliseconds
expr min lq mean median uq max neval
MAE 708.17430 708.55089 711.39762 711.00339 712.72844 716.53106 5
QF 22.17453 22.40136 23.51688 23.34302 24.18989 25.47558 5
Hi Marcel, @LiNk-NY
Context
Laurent and I are maintaining the
QFeatures
class that directly inherits from theMultiAssayExperiement
class.I have been profiling the
QFeatures
code for adding new assays to theQFeatures
. Our software is currently a bit slow for large single-cell proteomics datasets, and these will soon become larger... I have noticed that a lot of time is spent in.sampleMapFromData()
, and more specifically this line:Replacing this line with
leads to about a 30x gain in time (see example below).
Questions
colnames()
is more elegant, but it leads to a big overhead. Would you be willing to adopt it inMultiAssayExperiment
?QFeatures
, but would rather prefer to call it fromMultiAssayExperiment
directly. Would you be willing to export this function as well so that we do not need to call toMultiAssayExperiment:::
?If you agree on this, I can send a PR to spare you time.
Example