rformassspectrometry / CompoundDb

Creating and using (chemical) compound databases
https://rformassspectrometry.github.io/CompoundDb/index.html

Implement a StandardsDb that extends CompoundDb #77

Closed jorainer closed 2 years ago

jorainer commented 3 years ago

Add a StandardsDb object/database layout that extends the CompDb by adding an ion table which contains retention times and adducts of lab-standards (for a certain setup or matrix).

An adducts (or ions) method would then return this information, joining the ion table with the compound table.
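A minimal sketch of the join described above, using plain data frames instead of the actual database. Only the compound_id key comes from this thread. The table contents and the other column names (name, ion_id, ion_adduct, ion_rt) are assumptions made for illustration.

```r
## Hypothetical compound table (stand-in for the compound table of a CompDb).
compounds <- data.frame(
    compound_id = c("C1", "C2"),
    name = c("Glucose", "Caffeine"),
    stringsAsFactors = FALSE
)

## Hypothetical ion table: adducts and retention times of lab standards
## measured on one LC-MS setup. Column names are assumptions.
ions <- data.frame(
    ion_id = 1:3,
    compound_id = c("C1", "C1", "C2"),
    ion_adduct = c("[M+H]+", "[M+Na]+", "[M+H]+"),
    ion_rt = c(50.2, 50.4, 120.1),
    stringsAsFactors = FALSE
)

## Join the ion table with the compound table on the shared compound_id,
## which is roughly what an ions() method would have to return.
ion_annot <- merge(ions, compounds, by = "compound_id")
```

Each row of ion_annot is one ion together with the annotation of its compound, so a compound with several adducts appears once per adduct.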

jorainer commented 3 years ago

StandardsDb (or IonDb?) should extend CompDb. Things to implement would be:

jorainer commented 3 years ago

Proposed workflow to create/add info to an IonDb would be:

1) Create a CompDb, e.g. from HMDB.
2) Add ions measured on a certain LC-MS setup to it, converting it from a CompDb to an IonDb.
3) Support adding additional ion information later to the same IonDb.
4) Support adding MS/MS spectra to the IonDb and linking them to the ion.

Some properties:

jorainer commented 3 years ago

Most of it is implemented. What remains is how to add MS/MS spectra and how to link them to the ion - and/or the compound?

@andreavicini , can you maybe also start thinking how to best solve that? Spectra in the msms_spectra are linked to the ms_compounds table via the compound_id - and we have also ions linked to compounds with the same ID. But for a direct link between ions and spectra we might need an additional table mapping ion IDs to spectrum IDs...

andreavicini commented 3 years ago

I realized that I'm not sure I completely understood, so I wanted to ask for a few clarifications. Should we implement the possibility to add more MS/MS spectra in addition to those that are added when creating an IonDb? Why do we want to create a direct link between ions and spectra instead of going indirectly through compound_id? Is that because only certain adducts of a given compound have to be linked to a given spectrum? In that case the additional table has to be provided by the user, right?

jorainer commented 3 years ago

You're right. Maybe that makes it all just too complicated and in the end unusable. Let's just add the possibility to add MS/MS spectra and link them to compound_id. Then this functionality would be even better to have just for CompDb (IonDb would anyway inherit it then). The definition of the method would then be:

setMethod("insertSpectra",
          signature(object = "CompDb", spectra = "Spectra"),
          function(object, spectra, ...) {
              ## implementation to be added
          })

andreavicini commented 3 years ago

I've just remembered: since a CompDb is read-only, would it still work to have insertSpectra defined for it and not only for IonDb?

jorainer commented 3 years ago

I think we should/could have it for both - if the database is read-only an error message would be shown. Otherwise it should work.

andreavicini commented 3 years ago

Hi, I still have some doubts about inserting spectra into a database object, in particular about updating the msms_spectrum table of the database, because the names of certain variables are not the same in the table and in the Spectra object (e.g. "collision_energy" and "collisionEnergy"). If these two variables are always named that way in the two objects (is that the case?), I could use that correspondence in the function (that's what I did for now). I also found "msLevel"/"ms_level" and "precursorMz"/"precursor_mz" (are there any others?). Or should we do it another way?

jorainer commented 3 years ago

Ah yes. So, the mapping for spectraVariables to database columns would be:

I'm however not sure if all of the above are mandatory columns - I would first check which columns you have in the database table and then only get the values for these from the Spectra object. Also, be aware that "collision_energy" is a text field in the database, but that the collisionEnergy spectra variable is numeric. So maybe you need to do an as.character on that...

Note: when inserting data from a Spectra into the database, I would (for now) only insert the fields (spectra variables) for which you already have a column in the database table...
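A rough sketch of that renaming and filtering step, written against a plain data.frame of spectra variables rather than a real Spectra object. The mapping holds only the three name pairs mentioned in this thread, the helper map_to_db_columns is hypothetical, and the database column list is an assumption.

```r
## Mapping from spectra variable names to database column names,
## limited to the three pairs mentioned in this thread.
var_to_col <- c(
    collisionEnergy = "collision_energy",
    msLevel = "ms_level",
    precursorMz = "precursor_mz"
)

## Hypothetical helper: rename the variables and keep only those for which
## the database table already has a column.
map_to_db_columns <- function(spectra_data, db_columns) {
    nms <- colnames(spectra_data)
    idx <- match(nms, names(var_to_col))
    nms[!is.na(idx)] <- unname(var_to_col[idx[!is.na(idx)]])
    colnames(spectra_data) <- nms
    ## collision_energy is a text field in the database.
    if ("collision_energy" %in% nms)
        spectra_data$collision_energy <-
            as.character(spectra_data$collision_energy)
    spectra_data[, intersect(db_columns, nms), drop = FALSE]
}

## Stand-in for the spectra variables of a Spectra object.
sd <- data.frame(msLevel = c(2L, 2L), precursorMz = c(181.07, 195.09),
                 collisionEnergy = c(10, 20), rtime = c(50.2, 120.1))
## Assumed columns of the database table.
db_cols <- c("spectrum_id", "compound_id", "ms_level", "precursor_mz",
             "collision_energy")
res <- map_to_db_columns(sd, db_cols)
```

Here rtime is dropped because the assumed table has no matching column, and collision_energy ends up as character.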

andreavicini commented 3 years ago

Thank you! I noticed that an example database in the package has both the fields collision_energy (which contains only NA) and collision_energy_text. Why do we have these two different fields? Lastly, I should require that the input Spectra has the variable "compound_id", right?

jorainer commented 3 years ago

The problem with the collision energy is that the spectra variable collisionEnergy only accepts numeric values, while sometimes the collision energy is provided as a character (e.g. "from 10ev to 20ev") - to manage that I added a database column "collision_energy_text" to hold the information as character. But this should only be an issue when getting the data from the database. If you have a Spectra object you can only have this data as a numeric. What I would suggest is the following:

Check if the database table has a column called "collision_energy_text". If so, check if the Spectra has a spectra variable "collision_energy_text". If that's the case you're fine. If the Spectra does not have a "collision_energy_text" variable, create one, filling it with x$collision_energy_text <- as.character(collisionEnergy(x)).

If the database does not have this column all should be fine anyway - you would only need to map/rename the spectra variable "collisionEnergy" to "collision_energy" (but also only if the database contains a "collision_energy" column).
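The two cases above could be sketched as follows. This operates on a plain data.frame standing in for the spectra variables, not on a real Spectra object, and the helper name prepare_collision_energy is hypothetical.

```r
## Hypothetical helper implementing the two cases described above.
## 'spectra_data' is a data.frame standing in for the spectra variables,
## 'db_columns' the column names of the database table.
prepare_collision_energy <- function(spectra_data, db_columns) {
    if ("collision_energy_text" %in% db_columns) {
        ## Create the text variable only if the input lacks it.
        if (!"collision_energy_text" %in% colnames(spectra_data)) {
            ce <- spectra_data$collisionEnergy
            ## Assumption: fill with NA if no collision energy is available.
            if (is.null(ce))
                ce <- rep(NA_real_, nrow(spectra_data))
            spectra_data$collision_energy_text <- as.character(ce)
        }
    } else if ("collision_energy" %in% db_columns) {
        ## No text column in the database: only rename the variable.
        cn <- colnames(spectra_data)
        cn[cn == "collisionEnergy"] <- "collision_energy"
        colnames(spectra_data) <- cn
    }
    spectra_data
}

sd <- data.frame(collisionEnergy = c(10, 20))
with_text <- prepare_collision_energy(
    sd, c("collision_energy", "collision_energy_text"))
renamed <- prepare_collision_energy(sd, "collision_energy")
```

with_text gains a character column collision_energy_text, while renamed only has its collisionEnergy column renamed to collision_energy.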

And yes, the Spectra needs to have a variable "compound_id" and all of the values in that variable need to match a compound id in the "compound_id" column in the database (can't remember how the compound table is called - was it ms_compound?).
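A small sketch of that check, again on a plain data.frame of spectra variables. The helper name check_compound_ids is hypothetical, and the vector of known compound IDs is assumed to have been read from the compound table beforehand.

```r
## Hypothetical validation helper: the input must have a compound_id
## variable, and every value must match a compound ID in the database.
check_compound_ids <- function(spectra_data, compound_ids) {
    if (!"compound_id" %in% colnames(spectra_data))
        stop("spectra variable 'compound_id' is required")
    missing_ids <- setdiff(spectra_data$compound_id, compound_ids)
    if (length(missing_ids))
        stop("compound IDs not found in the database: ",
             paste(missing_ids, collapse = ", "))
    invisible(TRUE)
}

## Passes: all IDs are known.
ok <- check_compound_ids(
    data.frame(compound_id = c("C1", "C2"), stringsAsFactors = FALSE),
    c("C1", "C2", "C3"))

## Fails: "C9" is not in the (assumed) compound table.
bad <- try(check_compound_ids(
    data.frame(compound_id = "C9", stringsAsFactors = FALSE), "C1"),
    silent = TRUE)
```

Failing early with the list of unmatched IDs avoids inserting spectra that could never be joined back to a compound.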