rformassspectrometry / MetaboCoreUtils

Core utilities for metabolomics.
https://rformassspectrometry.github.io/MetaboCoreUtils/index.html
8 stars 6 forks source link

`calculateMass`: clarify or enhance behaviour for charged formulas #59

Open meowcat opened 1 year ago

meowcat commented 1 year ago

Currently calculateMass("[C6H12]+") == calculateMass("C6H12") Strictly, this should be calculateMass("[C6H12]+") == calculateMass("C6H12") - .emass where .emass == 0.0005485799 Questions:

Note: the rcdk equivalent is get.formula(formula, charge)@mass . Notably, get.formula("[C6H6]+") also does not account for the charge (but creates the neutral formula). However here the user can specify the charge manually.

jorainer commented 1 year ago

@michaelwitting

michaelwitting commented 1 year ago

Not out of scope. We can add changes for this. In my opion charged formulas should be supplied exactly in the format you mentioned, e.g. [C6H12]+. Multiple charges would be 2+, 3+ or 2-. @jorainer would that be okay as input?

jorainer commented 1 year ago

IMHO it would be best if calculateMass would identify the charge automatically and return the correct mass.

maybe best to have one function

happy for a PR :)

michaelwitting commented 1 year ago

I can fiddle around a bit, but what do we want as input?

I would prefer the first option

jorainer commented 1 year ago

I would also say the first option (i.e. "[C6H12]+") - ideally also supporting a character vector of several formulas and not just a single one.

michaelwitting commented 1 year ago

The way I think it through now, calculateMass() would in future only accept character and no preparsed lists. Except we put the parsing of charges in the countElements() function

jorainer commented 1 year ago

we also have isotope detection/parsing in there (calculateMass? countElements?), right? maybe would make most sense to do that in the same function?

jorainer commented 1 year ago

don't know if the previous comment made sense... what I mean: put that into the same function that does already the parsing to avoid needing to parse the input twice (once e.g. counting elements and once getting the charge). but this will also depend a bit how complicated it would be - if it's too difficult or blows up the code we could do it in two separate steps (i.e. a dedicated function that retrieves the charge from a character). maybe it would even be good to just focus first on the latter, to have a clean (separate) implementation to derive the charge from a character so that we can test and then later improve?