open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Relations among compounds #748

Open qai222 opened 1 month ago

qai222 commented 1 month ago

Is your feature request related to a problem? Please describe. It is somewhat difficult to describe relations among compounds, e.g. X is enantiomeric to/diastereomeric to/a derivative of Y. Some of these relations seem important because they are used to define measurements (e.g. selectivity).

Describe the solution you'd like An unambiguous way to describe relations among compounds.

Describe alternatives you've considered If compounds have unique identifiers (can different Compound messages share the same CompoundIdentifier?) then we can either

  1. record relations explicitly as triples, or
  2. add a CompoundRelation message and a compound_relations field to Compound and ProductCompound, for example:
    Compound
    ...
    compound_relations
        type: is_derivative_product_of   # can be an enum
        value: <compound identifier>
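
To make the two options above concrete, here is a minimal Python sketch. It is not the ord-schema API: `RelationType`, `CompoundRelation`, the simplified `Compound` class, and `to_triples` are all hypothetical names used only for illustration, and compounds are identified by a plain name string on the assumption that identifiers are unique. It shows that relations embedded in a compound (option 2) can be flattened into explicit triples (option 1).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class RelationType(Enum):
    # Hypothetical relation kinds, taken from the examples in this issue.
    IS_ENANTIOMER_OF = "is_enantiomer_of"
    IS_DIASTEREOMER_OF = "is_diastereomer_of"
    IS_DERIVATIVE_PRODUCT_OF = "is_derivative_product_of"


@dataclass(frozen=True)
class CompoundRelation:
    type: RelationType
    value: str  # identifier of the related compound (assumed unique)


@dataclass
class Compound:
    # Simplified stand-in for the real Compound message.
    identifier: str
    compound_relations: List[CompoundRelation] = field(default_factory=list)


def to_triples(compounds: List[Compound]) -> List[Tuple[str, str, str]]:
    """Flatten embedded relations (option 2) into explicit triples (option 1)."""
    return [
        (compound.identifier, relation.type.value, relation.value)
        for compound in compounds
        for relation in compound.compound_relations
    ]


r_alcohol = Compound("(R)-1-phenylethanol")
s_alcohol = Compound(
    "(S)-1-phenylethanol",
    compound_relations=[
        CompoundRelation(RelationType.IS_ENANTIOMER_OF, r_alcohol.identifier)
    ],
)

triples = to_triples([r_alcohol, s_alcohol])
print(triples)
```

Either representation carries the same information; the embedded form keeps relations next to the compound they describe, while the triple form is easier to query across reactions.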

This could help queries. For example: what can be used to dissolve reagent X? The simple way is to collect every compound with reaction_role==solvent across all reactions that include X. But in some cases X is first dissolved in Y to form a ReactionInput, and Y is not the solvent for the reaction itself (as with an amalgam). With a relation such as X is_directly_dissolved_in Y, we could query the relation directly. This could also give better solubility estimates than "the simple way", where the solvent quantity is recorded for the entire reaction mixture.
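
The difference between the two query strategies can be sketched in Python as follows. The data layout here is an assumption made for illustration only: reactions are plain dictionaries mapping an input name to `(compound, reaction_role)` pairs, relations are `(subject, predicate, object)` tuples, and the roles assigned in the sodium amalgam example are illustrative rather than taken from a real ORD record. Neither helper function exists in ord-schema.

```python
# Hypothetical, simplified reaction records (not real ORD messages).
reactions = [
    {
        "inputs": {
            # Na is first dissolved in Hg to form one input (an amalgam).
            "amalgam": [("Na", "reactant"), ("Hg", "reagent")],
            "medium": [("ethanol", "solvent")],
        }
    },
]

# Hypothetical explicit relation triples: (subject, predicate, object).
relations = [("Na", "is_directly_dissolved_in", "Hg")]


def solvents_by_role(reactions, compound):
    """The simple way: solvents of every reaction that includes the compound."""
    found = set()
    for reaction in reactions:
        components = [
            pair for pairs in reaction["inputs"].values() for pair in pairs
        ]
        if any(name == compound for name, _ in components):
            found.update(name for name, role in components if role == "solvent")
    return found


def solvents_by_relation(relations, compound):
    """Query the proposed is_directly_dissolved_in relation directly."""
    return {
        obj
        for subj, predicate, obj in relations
        if subj == compound and predicate == "is_directly_dissolved_in"
    }


print(solvents_by_role(reactions, "Na"))
print(solvents_by_relation(relations, "Na"))
```

In this toy data the role-based query reports only the reaction solvent and misses Hg, while the relation-based query returns Hg directly.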