rapodaca / dialect

Documenting a subset of the SMILES language.
MIT License

Subset of SMILES #44

rapodaca closed this issue 2 years ago

rapodaca commented 2 years ago

The manuscript makes several references to "dialect" in the linguistic sense, and this is of course the code name for the language. But the goal is not to make yet another dialect. The goal is to fully define, for the first time, a language that functions as a subset of SMILES-as-practiced. No extensions. No pet nice-to-haves. Just a subset, to the extent that is possible without internal inconsistencies.

Parts to improve:

  1. Title
  2. Line 37. The introduction leads to this point, so this paragraph must concisely spell out the aim. It doesn't quite hit the mark. It uses the "dialect" idea, for one.
  3. Discussion section. Line 497 brings in SMILES. This is a good place to re-introduce the previous two papers on SMILES. Compare what's in them (and not in them) with Dialect.

rapodaca commented 2 years ago

Points of incompatibility. These are features explicitly defined in one or the other paper, but which are different in Dialect:

Points not explained by either paper, but defined in Dialect:

Contradictory points resolved in Dialect:

rapodaca commented 2 years ago

Table of differences:

| Feature | SMILES | Dialect |
| --- | --- | --- |
| element symbol Ha | accepts | rejects |
| element symbols Db; Sg; Bh; Hs; Mt; Rg; Cn; Nh; Fl; Mc; Lv; Ts; and Og | rejects | accepts |
| future element symbols approved by IUPAC | rejects | accepts |
| comma symbol (`,`) | may accept | rejects |
| multiple branching e.g., `*((*))*)` | accepts | rejects |
| reactions using greater than symbol (`>`) | accepts | rejects |
| extended stereodescriptors e.g., `@@@`, `@@@@`, `@AL1`, `@1`, and `@SP1` | accepts | rejects |
| use of stereodescriptors on odd cumulene centers | accepts | rejects |
| virtual hydrogen count on hydrogen | probably rejects | accepts |
| detachments are bonds of "formal order zero" | probably accepts | rejects |
| upper and lower bounds on atomic properties | rejects | accepts |
| nitrogen default valence includes 5 | partially accepts | accepts |
| unbracketed hydrogen atom | partially accepts | rejects |
| acyclic atom selection | partially accepts | accepts |
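
A few of the rows above can be spotted lexically, without a parser. The following is a minimal sketch, assuming a purely textual pre-check. The function name `rejected_features` and the regular expressions are hypothetical illustrations, not part of Dialect or of either SMILES paper, and they cover only four rows of the table (comma, `>`, extended stereodescriptors, and `Ha`).

```python
import re

# Hypothetical lexical checks, one per table row that can be spotted
# without parsing. The patterns are assumptions made for illustration;
# they are not taken from Dialect's grammar.
CHECKS = {
    "comma symbol": re.compile(r","),
    "reaction (>)": re.compile(r">"),
    # Three or more '@', '@' plus two capitals and digits (e.g. @AL1, @SP1),
    # or '@' followed directly by digits (e.g. @1).
    "extended stereodescriptor": re.compile(r"@{3,}|@[A-Z]{2}[0-9]+|@[0-9]+"),
    # 'Ha' at the start of a bracket atom, after an optional isotope.
    "element symbol Ha": re.compile(r"\[[0-9]*Ha"),
}


def rejected_features(smiles: str) -> list[str]:
    """Return the names of the checks above that match the input string."""
    return [name for name, pattern in CHECKS.items() if pattern.search(smiles)]


if __name__ == "__main__":
    for example in ["CCO", "CC>>CO", "[C@@H](F)(Cl)Br", "[Ha]"]:
        print(example, rejected_features(example))
```

An empty result only means none of these four patterns matched. It says nothing about the remaining rows, which need a real parser (branching, cumulene stereo, valence, hydrogen counts).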