metamolecular / trey

V3000 CTfile tools
MIT License

Some atom block fields not supported? #2

Open schatzsc opened 1 year ago

schatzsc commented 1 year ago

Is the impression correct that not all atom block fields from the Biovia documentation are supported (yet)?

In atom.rs, under pub struct Atom { ... }, you define charge (=CHG), valence (=VAL), and mass (=MASS), but not the multiplicity (=RAD). It is admittedly used very rarely (for example, if you have to distinguish singlet and triplet carbenes, or singlet and triplet dioxygen), but these species still behave chemically differently even though electron count and mass are the same. Is this a deliberate omission?
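To make the request concrete, here is a minimal sketch of how a RAD value could be read from a V3000 atom line. The names `Radical` and `parse_radical` are hypothetical and not part of trey's current API, and the code mapping (0 = none, 1 = singlet, 2 = doublet, 3 = triplet) is my reading of the Biovia CTfile documentation, so please check it against the spec:

```rust
/// Radical state carried by the V3000 `RAD` atom property.
/// Hypothetical type for illustration; trey does not define it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Radical {
    None,
    Singlet,
    Doublet,
    Triplet,
}

impl Radical {
    /// Maps the numeric RAD code to a variant (assumed mapping, see above).
    pub fn from_code(code: u8) -> Option<Self> {
        match code {
            0 => Some(Radical::None),
            1 => Some(Radical::Singlet),
            2 => Some(Radical::Doublet),
            3 => Some(Radical::Triplet),
            _ => None,
        }
    }
}

/// Scans the whitespace-separated tokens of an atom line for `RAD=n`.
/// Returns `Some(Radical::None)` when the property is absent (the default)
/// and `None` when the value is not a recognised code.
pub fn parse_radical(atom_line: &str) -> Option<Radical> {
    for token in atom_line.split_whitespace() {
        if let Some(value) = token.strip_prefix("RAD=") {
            return value.parse::<u8>().ok().and_then(Radical::from_code);
        }
    }
    Some(Radical::None)
}
```

With something like this, a triplet carbene carbon written as `M  V30 1 C 0.0 0.0 0.0 0 RAD=3` would be distinguishable from its singlet counterpart.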

I am also unsure about the "*" pseudoatom and multiattachment (ENDPTS), which is very important for organometallics (in particular pi-complexes).

Could you comment?

rapodaca commented 1 year ago

> Is the impression correct that not all atom block fields from the Biovia documentation are supported (yet)?

Yes, that's correct. It's a long way from feature complete.

> Also unsure about the "*" "pseudoatom" and multiattachment (ENDPTS) which however is very important for organometallics (in particular pi-complexes)

There's some more info on haptic bonds here.

Pseudoatoms crop up both on their own and in combination with other features. It might seem as if a top-level * atom is pointless, but consider this style of canonicalization (under "Anonymous Graph").

schatzsc commented 1 year ago

Well, as I said, RAD is probably used much less often than CHG, MASS, and VAL.

The links relating to the second question are interesting, but it is still unclear to me whether TREY already supports the ENDPTS functionality.

Unless you want to use it for a more generic definition of substituents, a top-level * pseudoatom is indeed more or less useless. However, if you want to convert a multiattachment into individual bonds/edges, the pseudoatom is needed to parse the ENDPTS line, because unfortunately the format does not define whether the first or the second node of the bond is the pseudoatom (and the other one the "real" atom); it can be either. Without the atom block, in which the * pseudoatom is defined as atom 11, you would not know whether the following line defines the two bonds (1-7) and (2-7) or (1-11) and (2-11), if using a TUCAN-style tuple notation:

M V30 9 1 **7** **11** ENDPTS=(2 1 2) ATTACH=ANY
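The disambiguation could be sketched roughly as follows. This is only an illustration under my own assumptions: `expand_endpts` is a hypothetical helper (not a trey function), the set of `*` atom indices is assumed to have been collected from the atom block beforehand, and the bond line is taken without the emphasis markers around 7 and 11.

```rust
use std::collections::HashSet;

/// Expands a V3000 multiattachment bond line into individual
/// (endpoint, real_atom) edges. `pseudoatoms` holds the indices of all
/// `*` atoms found in the atom block. Returns `None` if the line has no
/// ENDPTS list, is malformed, or does not join exactly one pseudoatom
/// to one real atom.
pub fn expand_endpts(line: &str, pseudoatoms: &HashSet<u32>) -> Option<Vec<(u32, u32)>> {
    // Strip the "M  V30" prefix, tolerating variable spacing.
    let body = line.trim_start().strip_prefix("M")?;
    let body = body.trim_start().strip_prefix("V30")?;

    // Fixed fields: bond index, bond type, first atom, second atom.
    let mut fields = body.split_whitespace();
    let _index: u32 = fields.next()?.parse().ok()?;
    let _bond_type: u32 = fields.next()?.parse().ok()?;
    let a1: u32 = fields.next()?.parse().ok()?;
    let a2: u32 = fields.next()?.parse().ok()?;

    // The format does not fix which position holds the pseudoatom,
    // so the atom block has to decide.
    let real = match (pseudoatoms.contains(&a1), pseudoatoms.contains(&a2)) {
        (true, false) => a2,
        (false, true) => a1,
        _ => return None,
    };

    // ENDPTS=(n e1 e2 ... en): a count followed by n atom indices.
    let start = body.find("ENDPTS=(")? + "ENDPTS=(".len();
    let end = start + body[start..].find(')')?;
    let mut numbers = body[start..end].split_whitespace().map(|t| t.parse::<u32>());
    let count = numbers.next()?.ok()? as usize;
    let endpoints = numbers.collect::<Result<Vec<u32>, _>>().ok()?;
    if endpoints.len() != count {
        return None;
    }

    Some(endpoints.into_iter().map(|e| (e, real)).collect())
}
```

Called on the line above with atom 11 registered as the pseudoatom, this yields the edges (1, 7) and (2, 7); if atom 7 were the pseudoatom instead, the same line would yield (1, 11) and (2, 11).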

If you want a test case, here is the Zeise's salt molfile that we use, which features all fields relevant to TUCAN and some "custom node attributes" at the end:

zeise_salt-multi-attachment.txt

schatzsc commented 1 year ago

And here is ferrocene:

ferrocene-multi-attachment.txt