rformassspectrometry / CompoundDb

Creating and using (chemical) compound databases
https://rformassspectrometry.github.io/CompoundDb/index.html

mass2mz method for CompDb #99

Open jorainer opened 2 years ago

jorainer commented 2 years ago

Add a mass2mz method for CompDb that calculates m/z values for the formulas in the CompDb (also supporting filters etc.).

jorainer commented 2 years ago

@RogerGinBer are you currently working on this? That would be totally fine with me; a contribution in the form of a PR (pull request) would be highly welcome.

RogerGinBer commented 2 years ago

Yes, I'm working on it :+1: I believe we should also have a mass2mz method (or perhaps a different function, like formula2mz) that, given a list of formulas and adducts, calculates each formula's neutral mass and then calls mass2mz to generate a formula-by-adduct m/z matrix. This way, the CompDb method just has to extract the formulas from the object and call formula2mz. Would that make sense?

jorainer commented 2 years ago

The CompDb already provides (monoisotopic) masses, so calculating the masses from the chemical formulas first would add a little computational overhead. So I would maybe start with a mass2mz first.
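To make the "start from the stored masses" idea concrete, here is a minimal sketch in Python (not the R code of CompoundDb or MetaboCoreUtils). It assumes an adduct can be described by a mass multiplier, a mass shift, and an absolute charge; the `ADDUCTS` table and the function name `mass_to_mz` are hypothetical and only cover three common adducts. Missing masses are simply passed through as `None`.

```python
# Hypothetical adduct table: name -> (mass multiplier, mass shift in Da, |charge|).
PROTON = 1.007276467  # mass of a proton in Da

ADDUCTS = {
    "[M+H]+": (1, PROTON, 1),
    "[M+2H]2+": (1, 2 * PROTON, 2),
    "[M-H]-": (1, -PROTON, 1),
}


def mass_to_mz(masses, adducts=("[M+H]+",)):
    """Return one row per mass and one column per adduct.

    A missing mass (None) yields None in every column of its row.
    """
    rows = []
    for mass in masses:
        row = []
        for name in adducts:
            mult, shift, charge = ADDUCTS[name]
            if mass is None:
                row.append(None)
            else:
                row.append((mult * mass + shift) / charge)
        rows.append(row)
    return rows


# Two compounds (one without a stored mass), two adducts.
mz = mass_to_mz([180.0633881, None], adducts=("[M+H]+", "[M-H]-"))
print(mz)
```

The result has the formula-by-adduct layout discussed above: rows correspond to compounds, columns to the requested adducts.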

Actually, a formula2mz might be a nice addition for MetaboCoreUtils: it could simply use the MetaboCoreUtils::calculateMass function (which can also calculate masses from e.g. "[13C3]C3H12O6") and then calculate m/z using MetaboCoreUtils::mass2mz. If interested, you could do a PR with that function in MetaboCoreUtils?
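The two-step idea (formula to neutral mass, then mass to m/z) can be sketched as follows. This is a Python illustration under stated assumptions, not the MetaboCoreUtils implementation: `formula_mass` and `formula_to_mz` are hypothetical names, the element table is a small subset, only singly charged adducts expressed as a plain mass shift are handled, and isotope notation such as "[13C3]C3H12O6" is deliberately rejected rather than supported.

```python
import re

# Monoisotopic masses (Da) for a small, illustrative subset of elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "P": 30.97376199842,
    "S": 31.9720711744,
}

# Mass shifts (Da) for two singly charged adducts (assumed values:
# proton mass, and sodium mass minus one electron).
ADDUCT_SHIFTS = {"[M+H]+": 1.007276467, "[M+Na]+": 22.9892207}


def formula_mass(formula):
    """Step 1: neutral monoisotopic mass of a plain formula like 'C6H12O6'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern did not fully consume
    # (brackets, isotope labels, charges, ...).
    if "".join(el + n for el, n in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    return sum(MONOISOTOPIC[el] * int(n or 1) for el, n in tokens)


def formula_to_mz(formulas, adduct_shifts=ADDUCT_SHIFTS):
    """Step 2: m/z per formula and adduct (singly charged adducts only)."""
    return {
        f: {name: formula_mass(f) + shift for name, shift in adduct_shifts.items()}
        for f in formulas
    }


mz_by_formula = formula_to_mz(["C6H12O6", "H2O"])
print(mz_by_formula["C6H12O6"]["[M+H]+"])
```

Keeping the mass calculation and the m/z calculation as separate steps means the second step can be reused unchanged when masses are already available.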

RogerGinBer commented 2 years ago

Sure thing! I read in the create-compounddb vignette that the exact_mass column could have NA values, so that's why I thought of doing it from the formula. But yes, makes more sense to use the mass directly, when available

Yes, I'll open an issue at MetaboCoreUtils and start a PR on that

RogerGinBer commented 2 years ago

Also, I've found some unexpected behavior with the compounds accessor using the HMDB example CompDb: compounds(cmp_db, "exactmass") gives only 8 results (it removes one duplicate, 104.0473), while compounds(cmp_db)$exactmass returns all 9.

Is this how it's supposed to work?

jorainer commented 2 years ago

What you observed/described above is the default behaviour of the compounds function: by default it uses distinct in the SQL call, thus returning only unique results. To ensure all formulas are returned (even duplicated ones), I would suggest using compounds(cmp_db, c("compound_id", "exactmass")).
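The effect of DISTINCT described above can be reproduced with a small, self-contained SQLite example in Python. The table name `compound`, the identifiers, and the three rows are made up for illustration and are not taken from the HMDB example database; only the SQL behaviour is the point.

```python
import sqlite3

# In-memory database with a hypothetical compound table; two compounds
# share the same exact mass.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE compound (compound_id TEXT, exactmass REAL)")
con.executemany(
    "INSERT INTO compound VALUES (?, ?)",
    [("cmp_a", 104.0473), ("cmp_b", 104.0473), ("cmp_c", 180.0634)],
)

# Selecting only the mass column: DISTINCT collapses the duplicated value.
only_mass = con.execute("SELECT DISTINCT exactmass FROM compound").fetchall()

# Adding the identifier column makes every row unique, so nothing is dropped.
with_id = con.execute(
    "SELECT DISTINCT compound_id, exactmass FROM compound"
).fetchall()

print(len(only_mass), len(with_id))  # prints "2 3"
con.close()
```

Including a column that is unique per compound is therefore enough to keep duplicated masses in the result while still using DISTINCT.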