open-reaction-database / ord-schema

Schema for the Open Reaction Database
https://open-reaction-database.org
Apache License 2.0

Support the definition of active species/complexes? #524

Open connorcoley opened 3 years ago

connorcoley commented 3 years ago

It is not uncommon for multiple inputs/components to combine into an active catalyst complex during a reaction. It could be useful to let users specify what this complex is (e.g., by MolFile), along with any associated computed compound features. Down the road, this would support machine learning on catalyst/ligand structures without having to infer the active species from multiple inputs.

It is not clear where these additional fields should go.

skearnes commented 3 years ago

Has this been resolved?

connorcoley commented 3 years ago

It has not been resolved, no.

This is information that doesn't make sense to define at the input level, since an active complex can be formed from multiple inputs. Defining active complexes / intermediate Compounds is one use for a reaction-level annotation. Another could be an estimate of a derived kinetic rate constant or energetic barrier, which would need some generic Data field.

These are thematically different from what is currently in the ReactionNotes message, but right now that's my top choice for where to put them. I don't know if we'd want to add a catch-all there for other reaction-related annotations or add structured fields, e.g., repeated Compound intermediates or repeated Compound active_species.

I think this is a low/medium-priority task that will require more thought.
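
To make the two options above concrete, here is a rough sketch in plain Python dataclasses. It is not the ord-schema API: the `Compound` and `ReactionNotes` classes below are simplified stand-ins, and the `intermediates`, `active_species`, and `annotations` fields are hypothetical names taken from or invented for this discussion. None of them exist in the schema today, and the values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Compound:
    """Simplified stand-in for a Compound message (hypothetical, not the real one)."""
    # Identifier type -> value, e.g. {"MOLBLOCK": "<MolFile text>"}.
    identifiers: Dict[str, str] = field(default_factory=dict)
    # Computed feature name -> value.
    features: Dict[str, float] = field(default_factory=dict)


@dataclass
class ReactionNotes:
    """Stand-in for ReactionNotes, showing both options side by side."""
    # Option 1: structured, repeated Compound fields (hypothetical).
    intermediates: List[Compound] = field(default_factory=list)
    active_species: List[Compound] = field(default_factory=list)
    # Option 2: a catch-all for other reaction-level annotations (hypothetical),
    # e.g. a derived rate constant or an energetic barrier.
    annotations: Dict[str, float] = field(default_factory=dict)


notes = ReactionNotes()

# An active complex formed from several inputs, recorded once at the reaction level.
notes.active_species.append(
    Compound(
        identifiers={"MOLBLOCK": "<placeholder MolFile text>"},
        features={"placeholder_descriptor": 0.0},
    )
)

# A derived quantity stored in the catch-all; name and value are placeholders.
notes.annotations["placeholder_barrier"] = 0.0
```

The structured fields keep active species as full Compounds, so identifiers and computed features come along for free, while the catch-all is more flexible but leaves the meaning of each entry to convention.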