rformassspectrometry / CompoundDb

Creating and using (chemical) compound databases
https://rformassspectrometry.github.io/CompoundDb/index.html

custom Db - spectra and ion Db questions #96

Open cbroeckl opened 2 years ago

cbroeckl commented 2 years ago

Hello developers - excellent work on this collection of packages. This is exciting, and I am eager to start implementing them in my workflows! I can see the applications you have outlined in the vignette(s), but I can also see applications where I have a specific custom set of structures/spectra generated completely in-house. Do you have any documentation describing how to create a small custom structure and/or spectra Db from scratch?

For example, let's say I have a 100-compound mixture that I use regularly, and I want to build a Db containing only that. I know the compounds, so I have InChI/InChIKey, SMILES, names, and an accession number of some sort. Is the best approach to build my own tibble with all the correct headers and create a compound database from it? From the vignette, it would seem that any tibble with the correct headers would suffice. I think this step is readily adapted to a custom database from a table format.

I then want to append MS/MS spectra to supplement the structures. Is there a guide for importing these spectra and attaching them? The examples all refer to preformatted spectra from various databases - does .msp have an importer? If I have a spectrum from an R processing session (i.e. from parsing raw mzML data using the Spectra::Spectra function), can I attach it manually?

I have ions/adducts/fragments. Is there a standard nomenclature to be used for adducts? [M+H]+ is pretty standard, but what about a neutral loss from that, let's say two waters fall off? This could be expressed as [M+H-H2O-H2O]+, or [M+H-2(H2O)]+, or [M+H-H4O2]+, or [M-H3O2]+... Do you have any guidance on (in)compatibilities or best practices?

Also, do you have a compatible format for isotopically labelled compounds in the compound Db and/or Ion Db?

jorainer commented 2 years ago

Yes, you can also create a database completely from scratch with a single data.frame (or tibble) - you can find a description of the required columns in the help of the createCompDb function.

And yes, it should be possible to add MS2 spectra from a Spectra object - see the help of the insertSpectra function. Let me know if something is not working with this one. Importing data from MSP files into a Spectra object should also work pretty well now with the MsBackendMsp package.

For the adduct nomenclature - for your database you can use whatever you prefer. Maybe also have a look at MetaboCoreUtils::adductNames() to see what we have available and use that nomenclature for those overlapping.
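Whichever spelling is chosen, it helps to check that alternative notations really describe the same ion. The following is a minimal sketch in plain Python, not part of CompoundDb or MetaboCoreUtils: the helper names `formula_counts` and `adduct_delta` are hypothetical, and the parser assumes simple adduct strings of the form `[M±term±term...]charge` with no nested brackets or multimers. It reduces each of the four notations from the question to its net elemental change relative to M.

```python
import re

# One signed term inside the brackets, e.g. "+H", "-H2O", "-2(H2O)", "-2H2O".
TERM = re.compile(r"([+-])(\d*)\(?([A-Z][A-Za-z0-9]*)\)?")
# One element symbol with an optional count, e.g. "H2" or "O".
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_counts(formula):
    """Count atoms in a simple formula such as 'H2O' (no brackets)."""
    counts = {}
    for element, n in ELEMENT.findall(formula):
        counts[element] = counts.get(element, 0) + (int(n) if n else 1)
    return counts


def adduct_delta(adduct):
    """Net element change relative to M for a string like '[M+H-2(H2O)]+'."""
    body = adduct[adduct.index("[") + 1:adduct.index("]")]
    if not body.startswith("M"):
        raise ValueError("expected the bracket content to start with 'M'")
    delta = {}
    for sign, mult, formula in TERM.findall(body[1:]):
        factor = (1 if sign == "+" else -1) * (int(mult) if mult else 1)
        for element, n in formula_counts(formula).items():
            delta[element] = delta.get(element, 0) + factor * n
    # Drop elements that cancel out so equivalent notations compare equal.
    return {element: n for element, n in delta.items() if n != 0}


notations = ["[M+H-H2O-H2O]+", "[M+H-2(H2O)]+", "[M+H-H4O2]+", "[M-H3O2]+"]
deltas = [adduct_delta(a) for a in notations]
all_equivalent = all(d == deltas[0] for d in deltas)
```

All four strings reduce to the same change (three hydrogens and two oxygens removed), so for a custom database the choice between them is a matter of consistency rather than chemistry.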

Regarding isotopically labelled compounds, I guess you could add them to the compound table providing their specific exact mass? Note that you can also add any additional columns to that table when you create the CompDb database. The ones listed in the documentation above are mandatory, but you can add as many additional ones as you like (and the mandatory ones can also contain missing values).
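As an illustration of "providing their specific exact mass", here is a small Python sketch of how such a value could be computed before it is entered in the compound table. It is an assumption-laden example, not CompoundDb functionality: the function name `exact_mass` and the composition format (a mapping with isotope labels such as `"13C"` as keys) are hypothetical, and only a handful of isotope masses are included.

```python
# Monoisotopic masses in unified atomic mass units; keys such as "13C"
# are an ad hoc labelling convention used only in this sketch.
ISOTOPE_MASS = {
    "H": 1.00782503223,
    "2H": 2.01410177812,
    "C": 12.0,
    "13C": 13.00335483507,
    "N": 14.00307400443,
    "15N": 15.00010889888,
    "O": 15.99491461957,
}


def exact_mass(composition):
    """Sum isotope masses for a composition such as {'13C': 6, 'H': 12, 'O': 6}."""
    return sum(ISOTOPE_MASS[isotope] * n for isotope, n in composition.items())


# Glucose, unlabelled and fully 13C-labelled.
glucose = exact_mass({"C": 6, "H": 12, "O": 6})
glucose_13c6 = exact_mass({"13C": 6, "H": 12, "O": 6})
mass_shift = glucose_13c6 - glucose
```

The labelled and unlabelled forms could then be stored as two separate rows with their own identifiers and exact masses, with an additional column noting the label.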

As always, please let me know if you run into any problems or if something is not working as expected and I will adapt. It's hard to foresee all possible use cases, but the system should be flexible enough so that we can adapt it.

jorainer commented 2 years ago

I've just updated the package with PR #97. Creating custom CompDb databases should now be a little easier. I've also added a new section to the respective vignette.