rformassspectrometry / MetaboCoreUtils

Core utilities for metabolomics.
https://rformassspectrometry.github.io/MetaboCoreUtils/index.html

Use a data.frame of adduct definitions instead of a list of lists #23

Closed: jorainer closed this issue 2 years ago.

jorainer commented 4 years ago

The current mass2mz implementation suffers a little from the way adducts are defined and returned: they are stored in a list of lists that is generated and calculated on the fly at every call. I suggest we use a pre-built data frame instead, created when the package is loaded. This should improve speed, which will be crucial if the function is applied repeatedly (e.g. in for loops).

The benchmark for this:

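```r
library(MetaboCoreUtils)

# Variant 2: adduct definitions in a pre-built data.frame (.ADDUCTS) with
# adduct names as row names; the adduct's row is looked up on each call.
mass2mz2 <- function(x, adduct = "[M+H]+") {
    if (!adduct %in% rownames(MetaboCoreUtils:::.ADDUCTS))
        stop("Unknown adduct: ", adduct)
    tmp <- MetaboCoreUtils:::.ADDUCTS[adduct, , drop = FALSE]
    mass_multi <- tmp$mass_multi
    mass_add <- tmp$mass_add
    x * mass_multi + mass_add
}

# Variant 3: multiplicative and additive factors in two pre-built named
# numeric vectors (.ADDUCTS_MULT, .ADDUCTS_ADD), looked up by name.
mass2mz3 <- function(x, adduct = "[M+H]+") {
    if (!adduct %in% names(MetaboCoreUtils:::.ADDUCTS_MULT))
        stop("Unknown adduct: ", adduct)
    mass_multi <- MetaboCoreUtils:::.ADDUCTS_MULT[adduct]
    mass_add <- MetaboCoreUtils:::.ADDUCTS_ADD[adduct]
    x * mass_multi + mass_add
}

# Compare both variants against the current list-based mass2mz().
library(microbenchmark)
microbenchmark(mass2mz(4),
               mass2mz2(4),
               mass2mz3(4))
```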

```
Unit: microseconds
        expr    min      lq     mean  median     uq     max neval cld
  mass2mz(4) 35.172 37.1630 42.20451 39.8285 42.419 116.896   100  b 
 mass2mz2(4) 62.598 64.9605 71.57665 67.1245 69.541 150.860   100   c
 mass2mz3(4) 11.163 11.8870 14.58190 13.5700 14.503  64.132   100 a  
```
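The proposal above (a table built once rather than a list rebuilt per call), combined with the named-vector lookup that mass2mz3 benchmarks fastest, could look roughly like the sketch below. It is a minimal illustration under assumptions: the column names `name`, `mass_multi` and `mass_add`, the helper `mass_to_mz` and the small set of adducts are hypothetical, not MetaboCoreUtils' actual internals. In a package the table and the derived vectors would be built at build or load time; here they are simply computed once at the top level of the script.

```r
# Hypothetical adduct table: column names and entries are illustrative
# assumptions, not MetaboCoreUtils' actual internal definitions.
proton <- 1.007276  # mass of a proton (Da)

adducts <- data.frame(
    name = c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M+2H]2+"),
    # m/z = M * mass_multi + mass_add
    mass_multi = c(1, 1, 2, 0.5),
    mass_add = c(proton, 22.989218, proton, proton),
    stringsAsFactors = FALSE
)

# Derive named lookup vectors once, so each call only does a cheap
# name lookup instead of rebuilding a list of lists.
adduct_mult <- setNames(adducts$mass_multi, adducts$name)
adduct_add <- setNames(adducts$mass_add, adducts$name)

mass_to_mz <- function(x, adduct = "[M+H]+") {
    if (length(adduct) != 1 || !adduct %in% names(adduct_mult))
        stop("Unknown adduct: ", adduct)
    unname(x * adduct_mult[adduct] + adduct_add[adduct])
}

mass_to_mz(100)              # [M+H]+: M plus one proton
mass_to_mz(100, "[M+2H]2+")  # doubly charged: (M + 2 protons) / 2
```

Keeping the data.frame as the single source of truth and deriving the vectors from it means the human-readable table and the fast lookup cannot drift apart.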
michaelwitting commented 4 years ago

Totally agree. Did I already send you a table? I'm not sure. I generated it with rcdk. We can include the script in inst/scripts.

michaelwitting commented 4 years ago

We should also give users the possibility to define their own data.frame. By default the methods would use ours, but users could override it. What do you think?
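One way to let users override the default table is to make it an ordinary function argument with the package's table as its default value. The sketch below assumes hypothetical names (`default_adducts`, `mass_to_mz`, the `adducts` argument) and the same hypothetical columns `name`, `mass_multi` and `mass_add`; it is not the package's actual API.

```r
# Hypothetical default table a package could ship; column names are assumptions.
default_adducts <- data.frame(
    name = c("[M+H]+", "[M+Na]+"),
    mass_multi = c(1, 1),
    mass_add = c(1.007276, 22.989218),
    stringsAsFactors = FALSE
)

# The table is a regular argument with a default, so users can pass their own.
mass_to_mz <- function(x, adduct = "[M+H]+", adducts = default_adducts) {
    stopifnot(length(adduct) == 1)
    i <- match(adduct, adducts$name)
    if (is.na(i))
        stop("Unknown adduct: ", adduct)
    x * adducts$mass_multi[i] + adducts$mass_add[i]
}

# A user-defined table that replaces the default for a single call.
my_adducts <- data.frame(
    name = "[M+K]+",
    mass_multi = 1,
    mass_add = 38.963158,
    stringsAsFactors = FALSE
)

mass_to_mz(c(100, 200))                          # default table
mass_to_mz(100, "[M+K]+", adducts = my_adducts)  # user-supplied table
```

Passing the table explicitly keeps the function free of hidden global state; a session-wide default via `getOption()` would be an alternative design.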

jorainer commented 4 years ago

Excellent suggestions (both of them). What would the requirements for the user-defined table be? Just the name, the multiplicative factor and the additive factor?
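If the minimal requirement is a name plus multiplicative and additive factors, a user-supplied table could be checked up front before use. The validator below is a hypothetical sketch; its name and the column names `name`, `mass_multi` and `mass_add` are assumptions for illustration.

```r
# Hypothetical check for a user-supplied adduct table, assuming the minimal
# columns discussed here: name, mass_multi and mass_add.
validate_adducts <- function(adducts) {
    if (!is.data.frame(adducts))
        stop("'adducts' must be a data.frame")
    required <- c("name", "mass_multi", "mass_add")
    missing_cols <- setdiff(required, colnames(adducts))
    if (length(missing_cols))
        stop("Missing required column(s): ",
             paste(missing_cols, collapse = ", "))
    if (!is.character(adducts$name) || anyDuplicated(adducts$name) > 0)
        stop("'name' must be a character column with unique values")
    if (!is.numeric(adducts$mass_multi) || !is.numeric(adducts$mass_add))
        stop("'mass_multi' and 'mass_add' must be numeric")
    if (anyNA(adducts$mass_multi) || anyNA(adducts$mass_add))
        stop("'mass_multi' and 'mass_add' must not contain NA")
    invisible(TRUE)
}

user_table <- data.frame(
    name = c("[M+H]+", "[M+K]+"),
    mass_multi = c(1, 1),
    mass_add = c(1.007276, 38.963158),
    stringsAsFactors = FALSE
)
validate_adducts(user_table)                                # passes silently
try(validate_adducts(user_table[, c("name", "mass_add")]))  # mass_multi missing
```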

michaelwitting commented 4 years ago

Oh, I just saw you already have it in inst. I was just about to add it to my devel branch. I would suggest also including the formula parts, because then you could also calculate the ion formula.
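Storing formula parts per adduct would let the ion formula be derived as n × molecule + added atoms − removed atoms. The sketch below assumes hypothetical adduct-table columns `n_mol`, `formula_add` and `formula_sub` and hypothetical helper names; the formula parser is deliberately simple (no brackets, isotopes or charges) and the output uses a Hill-like order (C, H, then the other elements alphabetically).

```r
# Parse a plain molecular formula such as "C6H12O6" into element counts.
# Simplification: no brackets, isotopes or charges are supported.
parse_formula <- function(f) {
    if (!nzchar(f))
        return(integer(0))
    tokens <- regmatches(f, gregexpr("[A-Z][a-z]?[0-9]*", f))[[1]]
    elements <- sub("[0-9]+$", "", tokens)
    n <- sub("^[A-Z][a-z]?", "", tokens)
    n <- as.integer(ifelse(n == "", "1", n))
    counts <- tapply(n, elements, sum)  # sums repeated elements
    setNames(as.integer(counts), names(counts))
}

# Add (sign = 1L) or subtract (sign = -1L) the counts in b from those in a.
add_counts <- function(a, b, sign = 1L) {
    if (length(b) == 0)
        return(a)
    els <- union(names(a), names(b))
    res <- setNames(integer(length(els)), els)
    res[names(a)] <- res[names(a)] + a
    res[names(b)] <- res[names(b)] + sign * b
    res
}

# Render counts as C, H, then other elements alphabetically; drop zeros.
format_formula <- function(counts) {
    counts <- counts[counts != 0]
    if (any(counts < 0))
        stop("The adduct removes more atoms than the molecule contains")
    first <- intersect(c("C", "H"), names(counts))
    counts <- counts[c(first, sort(setdiff(names(counts), first)))]
    paste0(names(counts), ifelse(counts == 1, "", counts), collapse = "")
}

# Ion formula = n_mol * molecule + formula_add - formula_sub, where n_mol,
# formula_add and formula_sub are hypothetical adduct-table columns.
ion_formula <- function(formula, n_mol = 1L, formula_add = "", formula_sub = "") {
    counts <- parse_formula(formula) * as.integer(n_mol)
    counts <- add_counts(counts, parse_formula(formula_add), 1L)
    counts <- add_counts(counts, parse_formula(formula_sub), -1L)
    format_formula(counts)
}

ion_formula("C6H12O6", formula_add = "H")                       # [M+H]+
ion_formula("C6H12O6", n_mol = 2, formula_add = "H")            # [2M+H]+
ion_formula("C6H12O6", formula_add = "H", formula_sub = "H2O")  # [M+H-H2O]+
```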

jorainer commented 3 years ago

@michaelwitting, sorry, I completely lost track here. Do you have some additional adduct definitions we could add, or the formula parts you mentioned above?

Also, @stanstrup, could you please check whether we could add some of the adducts you provide in the https://github.com/stanstrup/commonMZ package?

michaelwitting commented 3 years ago

Happy with the adducts currently covered!