jorainer closed this 4 years ago
Some implementation notes: both new methods use parallel processing, splitting the `Spectra` object by its `dataStorage`. The main reason for this is not so much the performance gain as the much lower memory demand, which will be crucial for large experiments with on-disk backends: m/z values are only ever loaded for n files at a time, n being the number of parallel processes.

If parallel processing is disabled (with `SerialParam()`), the methods skip the rather costly splitting and unsplitting of the object.
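The split/process/unsplit strategy described above can be sketched in base R. This is not the `Spectra` code: peaks are simulated as a plain list of m/z vectors, a `file` vector stands in for `dataStorage`, `parallel::mclapply()` stands in for a BiocParallel backend, and the helpers `has_any_mz()` and `contains_mz_by_file()` are hypothetical. The matching rule (absolute `tolerance` plus a ppm-dependent term) is also an assumption made for the sketch.

```r
library(parallel)

## Hypothetical helper: for each spectrum (numeric vector of peak m/z values),
## is ANY of the target m/z values present within tolerance + ppm?
has_any_mz <- function(peaks_mz, mz, tolerance = 0, ppm = 20) {
    vapply(peaks_mz, function(x) {
        any(vapply(mz, function(target)
            any(abs(x - target) <= tolerance + target * ppm / 1e6),
            logical(1)))
    }, logical(1))
}

## Hypothetical helper: process spectra file by file. With cores > 1 only
## `cores` files are handled at the same time, so only their peaks need to be
## in memory; with cores == 1 the split/unsplit overhead is skipped.
## Note: mclapply() forks, so cores > 1 does not work on Windows.
contains_mz_by_file <- function(peaks_mz, file, mz, cores = 1L, ...) {
    if (cores == 1L)
        return(has_any_mz(peaks_mz, mz, ...))
    f <- factor(file, levels = unique(file))
    chunks <- split(peaks_mz, f)              # one list element per file
    res <- mclapply(chunks, has_any_mz, mz = mz, ..., mc.cores = cores)
    unsplit(res, f)                           # restore the original order
}

## Usage with simulated data from two files
peaks <- list(c(100, 123.3, 200), c(432.4001, 500), c(50, 60), c(123.31))
file <- c("a.mzML", "b.mzML", "a.mzML", "b.mzML")
contains_mz_by_file(peaks, file, mz = c(123.3, 432.4), cores = 2L)
# returns c(TRUE, TRUE, FALSE, FALSE)
```

The serial and the split path return the same result; the split only changes how many spectra have to be held in memory at once.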
Performance comparison:
```r
library(Spectra)
library(BiocParallel)     # SerialParam(), MulticoreParam()
library(microbenchmark)

## mzML files shipped in the "sciex" folder of the msdata package
sciex_file <- normalizePath(
    dir(system.file("sciex", package = "msdata"), full.names = TRUE))

## On-disk backend: peaks are read from the mzML files on demand
sciex_mzr <- backendInitialize(MsBackendMzR(), files = sciex_file)
sps_mzr <- Spectra(sciex_mzr)

## In-memory backend: all data loaded into a DataFrame
sps_df <- setBackend(sps_mzr, MsBackendDataFrame())

mzs <- c(123.3, 432.4)

microbenchmark(
    containsMz(sps_mzr, mz = mzs, BPPARAM = SerialParam()),
    containsMz(sps_mzr, mz = mzs, BPPARAM = MulticoreParam(2)),
    containsMz(sps_df, mz = mzs, BPPARAM = SerialParam()),
    containsMz(sps_df, mz = mzs, BPPARAM = MulticoreParam(2)),
    times = 5)
```
Unit: milliseconds

| expr | min | lq | mean | median | uq | max | neval | cld |
|---|---|---|---|---|---|---|---|---|
| `containsMz(sps_mzr, mz = mzs, BPPARAM = SerialParam())` | 854.5136 | 912.4987 | 921.63302 | 921.9385 | 935.9784 | 983.2359 | 5 | c |
| `containsMz(sps_mzr, mz = mzs, BPPARAM = MulticoreParam(2))` | 679.5463 | 744.8684 | 778.13466 | 765.4368 | 789.6353 | 911.1865 | 5 | b |
| `containsMz(sps_df, mz = mzs, BPPARAM = SerialParam())` | 64.0799 | 77.1298 | 89.94480 | 77.3397 | 102.4513 | 128.7233 | 5 | a |
| `containsMz(sps_df, mz = mzs, BPPARAM = MulticoreParam(2))` | 77.0310 | 77.8466 | 84.96416 | 80.7985 | 92.0666 | 97.0781 | 5 | a |
This PR:

- rhdf5
- adds `containsMz` and `containsNeutralLoss` methods (issue #96)
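The PR text does not spell out what `containsNeutralLoss` checks. Assuming it flags spectra that have a peak at the precursor m/z minus the neutral loss mass (an assumption here, not confirmed by this PR), the per-spectrum test reduces to the base R sketch below. `has_neutral_loss()` is a hypothetical helper, not the `Spectra` implementation, and the tolerance/ppm rule is likewise assumed; spectra without a precursor m/z yield `NA`.

```r
## Hypothetical sketch: TRUE if a spectrum has a peak at
## (precursor m/z - neutral loss), within tolerance + ppm; NA if the
## precursor m/z is missing. Not the Spectra implementation.
has_neutral_loss <- function(peaks_mz, precursor_mz, neutral_loss,
                             tolerance = 0, ppm = 20) {
    target <- precursor_mz - neutral_loss     # expected fragment m/z
    mapply(function(x, t) {
        if (is.na(t)) return(NA)
        any(abs(x - t) <= tolerance + t * ppm / 1e6)
    }, peaks_mz, target, USE.NAMES = FALSE)
}

## Usage: a loss of 18.0106 (water) from three simulated spectra
peaks <- list(c(100, 163.06), c(80, 90), c(150))
has_neutral_loss(peaks, precursor_mz = c(181.07, 200, NA),
                 neutral_loss = 18.0106)
# returns c(TRUE, FALSE, NA)
```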