rapodaca / dialect

Documenting a subset of the SMILES language.
MIT License

Please don't forget about metals #1

Closed schatzsc closed 3 years ago

schatzsc commented 3 years ago

Nice idea - but please don't forget about metals, as your VB graph part will likely fail for most of the transition metals as well as some main group elements such as boron. Valence bond theory does not work for the highly delocalized bonding situations you often encounter there, such as 3c-2e bonds in diborane B2H6, iron nitrosyl complexes, [Fe-S] clusters, ...

Also, limiting the hydrogen count to 0..9 will exclude some compounds, e.g. [Zr(BH4)4] as well as the Hf analogue, which have 12 hydrogens coordinated to the metal center, 3 from each of the four borohydrides (the 4th H points away from the metal due to its tetrahedral structure): https://link.springer.com/content/pdf/10.1007/BF00962359.pdf

Same issue with implicit hydrogens: "a free nitrogen atom can bind three or five hydrogens" - and what about molecules as simple as [NH4]+???

Bond order in metal complexes can go up to 6 due to involvement of d orbitals: https://en.wikipedia.org/wiki/Sextuple_bond

Want more input???

rapodaca commented 3 years ago

Nice idea - but please don't forget about metals, as your VB graph part will likely fail for most of the transition metals as well as some main group elements such as boron. Valence bond theory does not work for the highly delocalized bonding situations you often encounter there, such as 3c-2e bonds in diborane B2H6, iron nitrosyl complexes, [Fe-S] clusters, ...

I agree with most of this. Dialect is based on the VB model, and that model fails for many organometallics. The bad news is that the most interesting organometallics will not be representable by Dialect. The good news is that, by clearly stating the underlying model, failure modes will be explicit.

In other words, if your molecule fits into the VB model, Dialect will encode/decode it with zero loss of information. Otherwise, try another method. The second bit of "good" news is that no format in wide use supports extended bonding, so this limitation will remain theoretical until that changes.

If you have ideas for how to retain the notational brevity that follows from the VB model while supporting extended bonding, I'd be interested.

Also, limiting the hydrogen count to 0..9 will exclude some compounds, e.g. [Zr(BH4)4] as well as the Hf analogue, which have 12 hydrogens coordinated to the metal center, 3 from each of the four borohydrides (the 4th H points away from the metal due to its tetrahedral structure):

Can't view the structure because it's behind a paywall. You can add images to your message.
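To make the 0..9 limit concrete, here is a minimal Python sketch of reading the hydrogen count from a bracket atom. It assumes the count is written as `H` followed by at most one digit. The regular expression and the function name `virtual_hydrogens` are illustrative assumptions, not Dialect's actual grammar or API.

```python
import re

# Hypothetical, simplified bracket-atom pattern (an assumption for illustration):
# optional isotope, element symbol, optional H count limited to a single digit,
# optional charge. Stereo and other features are deliberately left out.
BRACKET_ATOM = re.compile(
    r"^\[(?P<isotope>\d+)?(?P<symbol>[A-Z][a-z]?)"
    r"(?P<hcount>H\d?)?(?P<charge>[+-]\d?)?\]$"
)


def virtual_hydrogens(atom: str) -> int:
    """Return the hydrogen count stated inside a bracket atom."""
    match = BRACKET_ATOM.match(atom)
    if match is None:
        raise ValueError(f"not a valid bracket atom under this sketch: {atom!r}")
    hcount = match.group("hcount")
    if hcount is None:
        return 0  # no H given, so no hydrogens
    # A bare "H" means one hydrogen; otherwise read the single digit.
    return int(hcount[1:]) if len(hcount) > 1 else 1
```

Under these assumptions `[NH4+]` yields 4, while an atom written with a two-digit count, such as `[ZrH12]`, is rejected because the grammar allows only one digit after `H`.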

Same issue with implicit hydrogens: "a free nitrogen atom can bind three or five hydrogens" - and what about molecules as simple as [NH4]+???

Ammonium would be [NH4+]. Inside brackets, the hydrogen count is stated explicitly as the "virtual hydrogen count", so no implicit hydrogen count is computed. Examples like this will be included in a section dedicated exclusively to implicit hydrogen counting.
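The distinction can be sketched in Python. The rule below is an assumption modeled on common SMILES practice: for an atom written without brackets, pick the smallest target valence that is at least the sum of its bond orders and fill the difference with hydrogens. Only nitrogen's targets (3 and 5) come from the text quoted above. The table `TARGET_VALENCES`, its other entries, and the function name are hypothetical and may not match Dialect's final rules.

```python
# Assumed target valences. Only nitrogen's (3, 5) is taken from the quoted
# sentence; the carbon and oxygen entries are illustrative assumptions.
TARGET_VALENCES = {"C": (4,), "N": (3, 5), "O": (2,)}


def implicit_hydrogens(symbol: str, bond_order_sum: int) -> int:
    """Implicit hydrogen count for an atom written WITHOUT brackets."""
    for target in TARGET_VALENCES[symbol]:
        if bond_order_sum <= target:
            # Fill up to the smallest target valence that fits.
            return target - bond_order_sum
    # Bond order sum exceeds every target: no implicit hydrogens are added.
    return 0
```

A bare `N` with no bonds gets three implicit hydrogens under this sketch. A bracket atom such as `[NH4+]` never reaches this function, because its hydrogen count is read directly from the brackets.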

Bond order in metal complexes can go up to 6 due to involvement of d orbitals:

It looks very unlikely that Dialect will be compatible with metal complexes having that kind of structure.

Want more input???

Yes!

rapodaca commented 3 years ago

Closing this unless a way to support extended bonding without introducing complexity in semantics or syntax comes to light.