Open electronicsbyjulie opened 2 years ago
This works now:
mol_assem_test 'CC(C)c1c(CC)c(CC)c(C)c(CC(O)CCC)c1C(C)C'
c1ccccc1 ✅
Cc1c(O)c(C)c(O)c(C(=O)O)c1O ✅
c1cc[o+](C)c1 ❌
c1c2ccccc2ccc1 ❌
Cc1c2ccc(O)cc2ccc1 ❌
test/mol_assem_test 'c1ccccc1N=Nc2ccccc2' ❌
https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system#Aromaticity
Aromatic nitrogen bonded to hydrogen, as found in pyrrole must be represented as [nH]; thus imidazole is written in SMILES notation as n1c[nH]cc1.
Current code is incorrect; it adds the hydrogen if not specified, so e.g. n1cncc1 is the same as n1c[nH]cc1 when in fact n1cncc1 should be the same as n1c[n]cc1.
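To make the intended behaviour concrete, here is a minimal sketch of the hydrogen-count rule quoted above. The function name aromatic_n_hydrogens and its token-based interface are hypothetical and are not taken from the project's code; it only handles a bare n or a simple bracket atom starting with n (no isotopes or chirality marks).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the project's code: returns the number of
// hydrogens on an aromatic nitrogen, given its SMILES atom token.
// Accepts "n" or a simple bracket atom such as "[n]", "[nH]", "[nH2]", "[nH+]".
int aromatic_n_hydrogens(const std::string& token)
{
    // A bare aromatic "n" gets no hydrogen; it must be written as [nH] to have one.
    if (token == "n") return 0;

    // Anything else must be a bracket atom whose symbol is "n".
    assert(token.size() >= 3 && token.front() == '[' && token.back() == ']');
    assert(token[1] == 'n');

    // Inside brackets the hydrogen count is exactly what is written.
    std::size_t i = 2;
    if (token[i] != 'H') return 0;  // e.g. "[n]" or "[n+]"
    ++i;
    if (!std::isdigit(static_cast<unsigned char>(token[i]))) return 1;  // "[nH]"

    // "H" followed by digits, e.g. "[nH2]".
    int count = 0;
    while (std::isdigit(static_cast<unsigned char>(token[i]))) {
        count = count * 10 + (token[i] - '0');
        ++i;
    }
    return count;
}
```

With this rule, both nitrogens in n1cncc1 get zero hydrogens, while n1c[nH]cc1 gives one hydrogen on the bracketed nitrogen only.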
c1ccccc1 ❌
Cc1c(O)c(C)c(O)c(C(=O)O)c1O ❌
c1cc[o+](C)c1 ❌
c1c2ccccc2ccc1 ❌
Cc1c2ccc(O)cc2ccc1 ❌
test/mol_assem_test 'c1ccccc1N=Nc2ccccc2' ❌
n1cncc1 ❌
EVERY TEST is currently failing.
This is the most fundamental molecular geometry, and the code can't even get it close to right.
Even the more primitive tests, e.g. points and atom?
Oh, sorry... every test in the original post was failing.
I managed to fix aromatic rings (i.e. alternating single and double bonds). Substituted monocyclic benzenes are working now, which is huge, since they're a large class of molecules, many of which are important to olfaction.
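As an illustration of what "alternating single and double bonds" means for a ring like benzene, here is a small sketch. The function alternating_bond_orders is hypothetical and is not how the project assigns bonds; it assumes an even-membered monocyclic ring whose atoms are listed in ring order, and it does not cover five-membered or fused rings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bond orders for one Kekule form of an even-membered
// monocyclic aromatic ring. Element i is the order of the bond between
// ring atom i and ring atom (i + 1) % ring_size.
std::vector<int> alternating_bond_orders(std::size_t ring_size)
{
    // Alternation only closes consistently when the ring size is even.
    assert(ring_size >= 4 && ring_size % 2 == 0);

    std::vector<int> orders(ring_size);
    for (std::size_t i = 0; i < ring_size; ++i) {
        orders[i] = (i % 2 == 0) ? 2 : 1;  // double, single, double, ...
    }
    return orders;
}
```

For a six-membered ring this yields double, single, double, single, double, single, so every ring atom takes part in exactly one double bond.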
There is a function Molecule::correct_structure() designed to look for any flaws in the structure (incorrect bond lengths, incorrect angles, etc.) and fix them. It works great, except for rings. Any time it is given a path that loops back on itself, the function goes terribly wrong.
I've resorted to the kludge of calling Molecule::make_coplanar_ring() for aromatic rings. Doesn't fix non-aromatic rings, e.g. proline (P in the amino and protein tests). Molecule::close_loop() should be able to handle those but it isn't working.
I think I will make an automated test for the list of SMILES inputs. The correct_structure() function should be able to evaluate the structure even if it doesn't succeed at repairing it.
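A rough sketch of the "evaluate without repairing" idea, limited to bond lengths. The types Vec3 and Bond and the functions worst_bond_error and structure_ok are all hypothetical names for illustration, not the project's Molecule API; the ideal lengths and the tolerance are assumed to be supplied by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal types; the real project has its own classes.
struct Vec3 { double x, y, z; };
struct Bond { std::size_t a, b; double ideal_length; };

// Euclidean distance between two atom positions.
double separation(const Vec3& p, const Vec3& q)
{
    const double dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Largest absolute deviation of any bond from its ideal length.
// This only measures the structure; it never moves an atom.
double worst_bond_error(const std::vector<Vec3>& atoms,
                        const std::vector<Bond>& bonds)
{
    double worst = 0.0;
    for (const Bond& bond : bonds) {
        assert(bond.a < atoms.size() && bond.b < atoms.size());
        const double actual = separation(atoms[bond.a], atoms[bond.b]);
        worst = std::max(worst, std::fabs(actual - bond.ideal_length));
    }
    return worst;
}

// Pass/fail verdict for an automated test, given a caller-chosen tolerance.
bool structure_ok(const std::vector<Vec3>& atoms,
                  const std::vector<Bond>& bonds,
                  double tolerance)
{
    return worst_bond_error(atoms, bonds) <= tolerance;
}
```

If the ring-closing bond is included in the bond list, a ring that fails to close shows up as one bond far longer than its ideal length, so a check like this flags it even when the repair step cannot fix it. A fuller check would also compare bond angles.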
Will resume in ~9 hours.
I am so tired of seeing some of the structures still look wrong, attempt after attempt. There should be one algorithm that finds the best configuration for any structural formula.
Someone already found it. Why reinvent the wheel when 99% of use cases would have sudo access?
Alright, I don't have to feel so bad...
test/mol_assem_test 'C1(C=CC=C1)=C2C=C2' test/calicene.sdf
Even obabel gets this one wrong, though much less wrong than POdock code.
Nice find!
Thank you!
Cyclohexane has broken, as have several of the others that were working before (total substitution, etc).
Began fixing saturated rings for #263 but have to shelve it for the moment. Leaving a comment // issue_5 in molecule.cpp to know where to resume progress later.
Results of the test/mol_assem_tests.sh test:
benzene 🟢
toluene (ring last) 🟢
toluene (ring first) 🟢
cyclohexane 🟢
cyclopentane 🟢
cyclobutane 🟢
cyclobutene 🟡 (hydrogenation.)
cyclobutadiene 🟡 (hydrogenation.)
cyclopropane 🟢
cyclopropene ❌ (hydrogenation.)
calicene ❌ (first ring fails to close.)
2,6-xylenol 🟢
proline 🟡 (hydrogenation.)
total substitution test 🟢
o-methylfuranium 🟡 (distorted.)
naphthalene ❌
diphenyl ether ❌ (Ar-O angles.)
azobenzene ❌ (double bond should be planar.)
leaf alcohol/cis-3-hexen-1-ol 🟢
imidazole 🟡 (N-H bond should be coplanar.)
glucose ❌ (the last carbon is in the wrong chirality, even if a [@] symbol is added.)
tetrathioglucose 🟡 (distorted.)
hexathioglucose 🟡 (distorted.)
This post will be periodically edited with the current status.