`calculateMass`: clarify or enhance behaviour for charged formulas

rformassspectrometry / MetaboCoreUtils

Core utilities for metabolomics.

https://rformassspectrometry.github.io/MetaboCoreUtils/index.html

8 stars 6 forks source link

`calculateMass`: clarify or enhance behaviour for charged formulas #59

Open meowcat opened 1 year ago

meowcat commented 1 year ago

Currently calculateMass("[C6H12]+") == calculateMass("C6H12") Strictly, this should be calculateMass("[C6H12]+") == calculateMass("C6H12") - .emass where .emass == 0.0005485799 Questions:

is calculateMass("[C6H12]+") out-of-scope for this function, and should this throw a warning or error?
otherwise, should it give the correct result?
or, should there be an argument calculateMass(x, charge)?

Note: the rcdk equivalent is get.formula(formula, charge)@mass . Notably, get.formula("[C6H6]+") also does not account for the charge (but creates the neutral formula). However here the user can specify the charge manually.

jorainer commented 1 year ago

@michaelwitting

michaelwitting commented 1 year ago

Not out of scope. We can add changes for this. In my opion charged formulas should be supplied exactly in the format you mentioned, e.g. [C6H12]+. Multiple charges would be 2+, 3+ or 2-. @jorainer would that be okay as input?

jorainer commented 1 year ago

IMHO it would be best if calculateMass would identify the charge automatically and return the correct mass.

maybe best to have one function

charge that takes formulas (character) and returns their charges (integer) - (function name might not be ideal though).
maybe another function that strips charges from the formulas (stripCharge)? - and along that line, maybe also one that adds charges to formulas (so that we can ensure that they have the correct format)

happy for a PR :)

michaelwitting commented 1 year ago

I can fiddle around a bit, but what do we want as input?

single string containing everything, e.g. [C6H12]+
two parameters for calculateMass, e.g. calculateMass("C6H12", charge = "1+")

I would prefer the first option

jorainer commented 1 year ago

I would also say the first option (i.e. "[C6H12]+") - ideally also supporting a character vector of several formulas and not just a single one.

michaelwitting commented 1 year ago

The way I think it through now, calculateMass() would in future only accept character and no preparsed lists. Except we put the parsing of charges in the countElements() function

jorainer commented 1 year ago

we also have isotope detection/parsing in there (calculateMass? countElements?), right? maybe would make most sense to do that in the same function?

jorainer commented 1 year ago

don't know if the previous comment made sense... what I mean: put that into the same function that does already the parsing to avoid needing to parse the input twice (once e.g. counting elements and once getting the charge). but this will also depend a bit how complicated it would be - if it's too difficult or blows up the code we could do it in two separate steps (i.e. a dedicated function that retrieves the charge from a character). maybe it would even be good to just focus first on the latter, to have a clean (separate) implementation to derive the charge from a character so that we can test and then later improve?