openMetadataInitiative / openMINDS_controlledTerms

Metadata model for the consistent registration of well-defined terms as well as a corresponding library of terminologies (including links to ontological terms where applicable).
MIT License

Molecular Entities needed for in depth #458

Peyman-N closed 8 months ago

apdavison commented 8 months ago

Do we really want IUPAC names in all cases? For example, "sucrose" is a much more useful name than "β-D-fructofuranosyl α-D-glucopyranoside"

Since we list synonyms anyway, and provide ontology links where possible, I think the "name" property and the @id should use the name that most scientists would recognize, so "DNQX" not "6,7-dinitroquinoxaline-2,3-dione".

On further investigation, using IUPAC names in @ids will often be problematic because the names contain characters that cause problems in IRIs (they need to be percent-encoded; systems may, but are not required to, accept them as valid IRIs), such as square brackets, e.g.

"magnesium;[[[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-oxidophosphoryl] hydrogen phosphate" (MgATP)

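To make the encoding concern concrete, here is a minimal sketch using `urllib.parse.quote` from the Python standard library. The base IRI and the `make_id` helper are hypothetical, chosen only for illustration; they are not taken from this repository.

```python
from urllib.parse import quote

# Hypothetical base IRI, for illustration only (an assumption, not the real one).
BASE = "https://example.org/instances/molecularEntity/"


def make_id(name):
    """Build an identifier by appending the percent-encoded name to BASE."""
    # safe="" makes quote() encode every reserved character,
    # including brackets, parentheses, semicolons, commas and spaces.
    return BASE + quote(name, safe="")


iupac = (
    "magnesium;[[[(2R,3S,4R,5R)-5-(6-aminopurin-9-yl)-3,4-dihydroxyoxolan-2-yl]"
    "methoxy-hydroxyphosphoryl]oxy-oxidophosphoryl] hydrogen phosphate"
)

if __name__ == "__main__":
    # The short name passes through unchanged; the IUPAC name does not.
    print(make_id("DNQX"))
    print(make_id(iupac))
```

A short name such as "DNQX" needs no encoding at all, whereas the encoded IUPAC name is much harder to read and to type by hand.
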
lzehl commented 8 months ago

@apdavison we need to discuss (I see the issue with IUPAC as well). However, using trivial names causes issues as well, because not every compound has one and we would end up being very inconsistent within the terminology. I would like to have something consistent as name / id and then put everything else in the synonyms.

apdavison commented 8 months ago

This is a nice comparison of the major public chemistry databases: https://doi.org/10.1002/cmdc.201700724

As a naming convention, I suggest picking one of them, let's say PubChem, since it has a better search interface than UniChem, ChemSpider is apparently not fully public, and ChEBI is missing a lot of terms.

For the first few chemicals in this PR, this would give the following (PubChem name, followed by current name in this PR in parentheses):

lzehl commented 8 months ago

@apdavison thank you for doing some research to solve this. I think your suggestion is great. Let's go with PubChem. We should check if we could get in contact with them long-term (to see if we can collaborate/contribute directly)

lzehl commented 8 months ago

@Peyman-N all updates done or are you still on it? I'm meeting with @tgbugs now and will mention our decision. Hope he agrees.

lzehl commented 8 months ago

@Peyman-N @apdavison feedback from @tgbugs: they are using ChEBI in the ontology (NIF/InterLex), but PubChem is good as well. If terms are missing from ChEBI, @tgbugs would be interested to know which terms these are. @apdavison do you have examples?

apdavison commented 8 months ago

> do you have examples?

None to hand. This comment is based on the review article I mentioned above, and on ChEBI having less comprehensive lists of synonyms (for example, "DNQX" gives no hits in ChEBI, but if you search for it in PubChem and then use the synonyms you find there to search ChEBI, you can find it).
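
The first half of that lookup could be scripted roughly as follows. This is only a sketch: it assumes PubChem's PUG REST synonyms endpoint and its JSON layout (`InformationList` / `Information` / `Synonym`), the helper names are hypothetical, and the ChEBI search step is left manual.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed PUG REST base URL; check the PubChem documentation before relying on it.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def synonyms_url(name):
    """Return the PUG REST URL that lists synonyms for a compound name."""
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/synonyms/JSON"


def extract_synonyms(payload):
    """Collect all synonyms from a decoded PUG REST synonyms response."""
    synonyms = []
    for info in payload.get("InformationList", {}).get("Information", []):
        synonyms.extend(info.get("Synonym", []))
    return synonyms


def fetch_synonyms(name):
    """Query PubChem over the network and return the synonyms found."""
    with urlopen(synonyms_url(name)) as response:
        return extract_synonyms(json.load(response))
```

Calling `fetch_synonyms("DNQX")` requires network access; each returned synonym can then be tried as a search term in ChEBI.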

Peyman-N commented 8 months ago

Hello, everyone. Sorry for the late response.

When it comes to small molecules, ChEBI is comprehensive enough for our purposes. The only small molecule that I couldn't find is Silver ammonium. However, for larger molecules, it becomes less extensive, especially for drugs and proteins.

Anyway, I really prefer ChEBI as it is more widely used by the community. On the other hand, PubChem anchors to multiple ontologies, which is nice. However, I think we should discuss the possibility of anchoring to multiple ontologies in the future.

lzehl commented 8 months ago

@Peyman-N @apdavison what does this mean now? Would you like to use the ChEBI name when available, and the PubChem name when the term is not available in ChEBI? Or do we stick to the original suggestion to only use the PubChem name? (@Peyman-N connecting to multiple ontologies is handled by InterLex; I would not like to take on this task as well.)

Peyman-N commented 8 months ago

No, no, sorry, I didn't explain myself correctly. I mean that for the naming convention we would use PubChem. I was speaking about the preferred ontology; anyway, InterLex and PubChem would take care of the linkage there.

lzehl commented 8 months ago

got it @Peyman-N thanks for clarifying!

apdavison commented 8 months ago

@lzehl can we merge this?

lzehl commented 8 months ago

@apdavison yes please. One approval should be enough. I would say we do not need two people approving for instances that were discussed beforehand, or for smaller issues.

apdavison commented 8 months ago

@lzehl I agree in general, but since this PR also has the goal of establishing conventions I think we should all approve it.

lzehl commented 8 months ago

@apdavison as we agreed on a convention in the discussion, I have already merged it because I thought this one was urgent. I can go over the instances in detail, maybe some time this week, and make corrections if needed.