rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Add containsMz and containsNeutralLoss #98

Closed jorainer closed 4 years ago

jorainer commented 4 years ago

This PR:

jorainer commented 4 years ago

Some implementation notes: both new methods use parallel processing, with the work divided according to the dataStorage of the Spectra object. The main reason for this is less the performance gain than the much lower memory demand, which will be crucial for large experiments with on-disk backends. In these cases, m/z values are only ever loaded for n files at a time, with n being equal to the number of parallel processes.

If parallel processing is disabled (with SerialParam()), the methods skip the rather costly splitting and unsplitting of the object.
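The split/apply/unsplit pattern described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `spectra` is a toy data.frame standing in for a Spectra object, and `has_mz()`, `contains_mz_sketch()`, the `apply_fun` argument and the `tolerance` default are all hypothetical names and values chosen for the example.

```r
# Toy stand-in for a Spectra object: one row per spectrum.
# `dataStorage` mimics the file each spectrum comes from,
# `mz` is a list column holding the m/z values of each spectrum.
spectra <- data.frame(
    dataStorage = c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "a.mzML"))
spectra$mz <- list(c(100.1, 123.3), c(200.2, 300.3),
                   c(123.3, 432.4), 50.5, 432.4)

# Hypothetical helper: for each spectrum, TRUE if any peak lies within
# `tolerance` of any of the query m/z values.
has_mz <- function(x, mz, tolerance = 0.01) {
    vapply(x$mz, function(peaks)
        any(abs(outer(peaks, mz, "-")) <= tolerance), logical(1))
}

# Hypothetical wrapper. `apply_fun = NULL` means serial processing, in
# which case the object is not split at all. Otherwise the object is
# split by dataStorage, each chunk is processed by `apply_fun` (plain
# lapply here; a parallel apply such as BiocParallel::bplapply could be
# passed instead), and the results are put back in the original order.
contains_mz_sketch <- function(x, mz, apply_fun = NULL) {
    if (is.null(apply_fun))
        return(has_mz(x, mz))
    f <- factor(x$dataStorage)
    res <- apply_fun(split(x, f), has_mz, mz = mz)
    unsplit(res, f)
}

mzs <- c(123.3, 432.4)
serial <- contains_mz_sketch(spectra, mzs)
chunked <- contains_mz_sketch(spectra, mzs, apply_fun = lapply)
```

Both calls return one logical value per spectrum in the original order. With a parallel apply function, each worker would only need the m/z values of its own chunk, which is where the lower memory demand for on-disk backends comes from.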

Performance comparison:

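library(Spectra)
## Example mzML files shipped with the msdata package
sciex_file <- normalizePath(
    dir(system.file("sciex", package = "msdata"), full.names = TRUE))
## On-disk backend: peak data is read from the files on demand
sciex_mzr <- backendInitialize(MsBackendMzR(), files = sciex_file)

sps_mzr <- Spectra(sciex_mzr)
## In-memory copy of the same data
sps_df <- setBackend(sps_mzr, MsBackendDataFrame())
## m/z values to look for
mzs <- c(123.3, 432.4)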

library(microbenchmark)
microbenchmark(
    containsMz(sps_mzr, mz = mzs, BPPARAM = SerialParam()),
    containsMz(sps_mzr, mz = mzs, BPPARAM = MulticoreParam(2)),
    containsMz(sps_df, mz = mzs, BPPARAM = SerialParam()),
    containsMz(sps_df, mz = mzs, BPPARAM = MulticoreParam(2)),
    times = 5)
Unit: milliseconds
                                                       expr      min       lq      mean   median       uq      max neval cld
     containsMz(sps_mzr, mz = mzs, BPPARAM = SerialParam()) 854.5136 912.4987 921.63302 921.9385 935.9784 983.2359     5   c
 containsMz(sps_mzr, mz = mzs, BPPARAM = MulticoreParam(2)) 679.5463 744.8684 778.13466 765.4368 789.6353 911.1865     5  b 
      containsMz(sps_df, mz = mzs, BPPARAM = SerialParam())  64.0799  77.1298  89.94480  77.3397 102.4513 128.7233     5 a  
  containsMz(sps_df, mz = mzs, BPPARAM = MulticoreParam(2))  77.0310  77.8466  84.96416  80.7985  92.0666  97.0781     5 a