primaryodors / primarydock

PrimaryOdors.org molecular docker.

Multiple SMILES issues, including aromatic rings not being made coplanar. #5

Open electronicsbyjulie opened 2 years ago

electronicsbyjulie commented 2 years ago

Results of the test/mol_assem_tests.sh test:

benzene 🟢
toluene (ring last) 🟢
toluene (ring first) 🟢
cyclohexane 🟢
cyclopentane 🟢
cyclobutane 🟢
cyclobutene 🟡 (hydrogenation.)
cyclobutadiene 🟡 (hydrogenation.)
cyclopropane 🟢
cyclopropene ❌ (hydrogenation.)
calicene ❌ (first ring fails to close.)
2,6-xylenol 🟢
proline 🟡 (hydrogenation.)
total substitution test 🟢
o-methylfuranium 🟡 (distorted.)
naphthalene ❌
diphenyl ether ❌ (Ar-O angles.)
azobenzene ❌ (double bond should be planar.)
leaf alcohol/cis-3-hexen-1-ol 🟢
imidazole 🟡 (N-H bond should be coplanar.)
glucose ❌ (the last carbon is in the wrong chirality, even if a [@] symbol is added.)
tetrathioglucose 🟡 (distorted.)
hexathioglucose 🟡 (distorted.)

This post will be periodically edited with the current status.

electronicsbyjulie commented 2 years ago

This works now:

mol_assem_test 'CC(C)c1c(CC)c(CC)c(C)c(CC(O)CCC)c1C(C)C'

electronicsbyjulie commented 2 years ago

c1ccccc1 ✅
Cc1c(O)c(C)c(O)c(C(=O)O)c1O ✅
c1cc[o+](C)c1 ❌
c1c2ccccc2ccc1 ❌
Cc1c2ccc(O)cc2ccc1 ❌

electronicsbyjulie commented 2 years ago

test/mol_assem_test 'c1ccccc1N=Nc2ccccc2'

electronicsbyjulie commented 2 years ago

https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system#Aromaticity

Aromatic nitrogen bonded to hydrogen, as found in pyrrole must be represented as [nH]; thus imidazole is written in SMILES notation as n1c[nH]cc1.

Current code is incorrect; it adds the hydrogen if not specified, so e.g. n1cncc1 is the same as n1c[nH]cc1 when in fact n1cncc1 should be the same as n1c[n]cc1.
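
The rule quoted above can be illustrated with a minimal sketch. The helper `aromatic_n_hydrogens` is hypothetical and is not part of the primarydock code; it only scans a SMILES string and reports how many hydrogens each aromatic nitrogen should carry, assuming that a bare `n` gets none and only a bracketed `[nH]` gets one.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for illustration, not primarydock code).
// Returns the hydrogen count of every aromatic nitrogen in a SMILES string,
// in order of appearance. A bare 'n' never receives an implicit hydrogen.
std::vector<int> aromatic_n_hydrogens(const std::string& smiles)
{
    std::vector<int> counts;
    for (std::size_t i = 0; i < smiles.size(); ++i)
    {
        if (smiles[i] == '[')
        {
            std::size_t close = smiles.find(']', i);
            if (close == std::string::npos) break;      // malformed input: stop scanning
            std::string body = smiles.substr(i + 1, close - i - 1);

            // Skip an optional isotope number, e.g. [15nH].
            std::size_t p = 0;
            while (p < body.size() && std::isdigit(static_cast<unsigned char>(body[p]))) ++p;

            if (p < body.size() && body[p] == 'n')
            {
                std::size_t q = p + 1;
                while (q < body.size() && body[q] == '@') ++q;   // skip chirality marks

                int h = 0;
                if (q < body.size() && body[q] == 'H')
                {
                    h = 1;                                        // plain H means one hydrogen
                    if (q + 1 < body.size() && std::isdigit(static_cast<unsigned char>(body[q + 1])))
                        h = body[q + 1] - '0';                    // H2, H3, ...
                }
                counts.push_back(h);
            }
            i = close;                                            // continue after ']'
        }
        else if (smiles[i] == 'n')
        {
            counts.push_back(0);                                  // bare aromatic N: no hydrogen added
        }
    }
    return counts;
}
```

Under this sketch, n1cncc1 yields zero hydrogens on both nitrogens, while n1c[nH]cc1 yields one hydrogen on the second nitrogen only.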

electronicsbyjulie commented 2 years ago

c1ccccc1 ❌
Cc1c(O)c(C)c(O)c(C(=O)O)c1O ❌
c1cc[o+](C)c1 ❌
c1c2ccccc2ccc1 ❌
Cc1c2ccc(O)cc2ccc1 ❌
test/mol_assem_test 'c1ccccc1N=Nc2ccccc2' ❌
n1cncc1 ❌

electronicsbyjulie commented 2 years ago

EVERY TEST is currently failing.

This is the most fundamental molecular geometry, and the code can't even get it close to right.

objarni commented 2 years ago

Even the more primitive tests, e.g. points and atoms?

electronicsbyjulie commented 2 years ago

Oh, sorry... every test in the original post was failing.

I managed to fix aromatic rings (i.e. alternating single and double bonds). Substituted monocyclic benzenes are working now, which is huge, since they're a large class of molecules, many of which are important to olfaction.

There is a function Molecule::correct_structure() designed to look for any flaws in the structure - incorrect bond lengths, incorrect angles, etc - and fix them. It works great, except for rings. Any time it is given a path that loops back on itself, the function goes terribly wrong.
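
For readers unfamiliar with this kind of check, here is a rough sketch of measuring a bond length and a bond angle and comparing each against an ideal value. The `Vec3` struct and the functions `bond_length`, `bond_angle_deg`, and `within_tolerance` are assumptions made for illustration; they are not the actual internals of Molecule::correct_structure().

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 3D point type; primarydock has its own geometry classes.
struct Vec3 { double x, y, z; };

static Vec3 vsub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double vdot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double vlen(Vec3 a) { return std::sqrt(vdot(a, a)); }

// Distance between two bonded atoms, in the same units as the coordinates.
double bond_length(Vec3 a, Vec3 b)
{
    return vlen(vsub(a, b));
}

// Angle a-center-b in degrees. Assumes neither neighbor coincides with center.
double bond_angle_deg(Vec3 a, Vec3 center, Vec3 b)
{
    const double pi = 3.14159265358979323846;
    Vec3 u = vsub(a, center);
    Vec3 v = vsub(b, center);
    double c = vdot(u, v) / (vlen(u) * vlen(v));
    c = std::max(-1.0, std::min(1.0, c));   // guard acos against rounding error
    return std::acos(c) * 180.0 / pi;
}

// True if a measured value is within tol of the ideal value.
bool within_tolerance(double measured, double ideal, double tol)
{
    return std::fabs(measured - ideal) <= tol;
}
```

A flaw detector along these lines would flag, for example, an sp2 center whose measured angle is far from 120 degrees; the tolerance to use is a design choice that this sketch leaves open.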

I've resorted to the kludge of calling Molecule::make_coplanar_ring() for aromatic rings. Doesn't fix non-aromatic rings, e.g. proline (P in the amino and protein tests). Molecule::close_loop() should be able to handle those but it isn't working.

electronicsbyjulie commented 2 years ago

I think I will make an automated test for the list of SMILES inputs. The correct_structure() function should be able to evaluate the structure even if it doesn't succeed at repairing it.
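
One evaluation such a test could make for aromatic rings is a planarity score. The sketch below is only an assumption about how that might look: `Point3` and `ring_planarity_deviation` are hypothetical names, and the plane is taken from the first three ring atoms rather than from a least-squares fit.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical coordinate type used only for this sketch.
struct Point3 { double x, y, z; };

// Largest distance of any ring atom from the plane through the first three
// ring atoms. Returns 0 for rings of fewer than four atoms (always planar)
// and -1 if the first three atoms are collinear, so no plane is defined.
double ring_planarity_deviation(const std::vector<Point3>& ring)
{
    if (ring.size() < 4) return 0.0;

    Point3 u{ring[1].x - ring[0].x, ring[1].y - ring[0].y, ring[1].z - ring[0].z};
    Point3 v{ring[2].x - ring[0].x, ring[2].y - ring[0].y, ring[2].z - ring[0].z};

    // Plane normal = u x v.
    Point3 n{u.y * v.z - u.z * v.y,
             u.z * v.x - u.x * v.z,
             u.x * v.y - u.y * v.x};
    double nlen = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    if (nlen < 1e-12) return -1.0;

    double worst = 0.0;
    for (std::size_t i = 3; i < ring.size(); ++i)
    {
        Point3 w{ring[i].x - ring[0].x, ring[i].y - ring[0].y, ring[i].z - ring[0].z};
        double dist = std::fabs(w.x * n.x + w.y * n.y + w.z * n.z) / nlen;
        worst = std::max(worst, dist);
    }
    return worst;
}
```

A test could then pass or fail each aromatic SMILES input by comparing this deviation against a small threshold, independently of whether the repair step succeeded.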

electronicsbyjulie commented 2 years ago

Will resume in ~9 hours.

electronicsbyjulie commented 2 years ago

I am so tired of seeing some of the structures continue to look wrong, time and again, with every attempt. There should be one algorithm that finds the best configuration for any structural formula.

Someone already found it. Why reinvent the wheel when 99% of use cases would have sudo access?

electronicsbyjulie commented 2 years ago

Alright, I don't have to feel so bad...

test/mol_assem_test 'C1(C=CC=C1)=C2C=C2' test/calicene.sdf

Even obabel gets this one wrong, though much less wrong than POdock code.

[Image: obabel_stumped]

objarni commented 2 years ago

Nice find!

electronicsbyjulie commented 2 years ago

Thank you!

electronicsbyjulie commented 1 year ago

Cyclohexane has broken, as have several of the others that were working before (total substitution, etc).

Began fixing saturated rings for #263 but have to shelve it for the moment. Leaving a comment // issue_5 in molecule.cpp to know where to resume progress later.