lukasturcani / stk

A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases.
https://stk.readthedocs.io/
MIT License
251 stars 47 forks source link

Basic_ea.py doesn't run with default stk version #213

Closed misterblonde closed 4 years ago

misterblonde commented 4 years ago

when running

python basic_ea.py

The first initial generation of 25 .mol files/molecules is being generated, but then line 191 gives rise to this conversion error and the EA run doesn't occur/the script terminates with error:

raise InchiReadWriteError(inchi, message)
rdkit.Chem.inchi.InchiReadWriteError: ('InChI=1S/C21H29BrFNO/c1-12(18(2,3)11-23)20(5,25)8-6-7-15-16-13-9-19(13,4)14-10-21(16,24-15)17(14)22/h13,16,25H,1,6-11H2,2-5H3', 'Omitted undefined stereo')

The error can be resolved by changing treatWarningAsError to False in stk/molecular/key_makers/utilities.py

return rdkit.MoltoInchi(
mol = molecule.to_rdkit_mol(), 
treatWarningAsError = True,
)
lukasturcani commented 4 years ago

Thank you for the report!

@andrewtarzia Thoughts? I have a feeling you were involved in this change.

I think warnings should be errors, which means the best way to deal with this, would be to for the EA to define stereo I guess.

andrewtarzia commented 4 years ago

Is the molecule that causes it actually correct/chemically reasonable? Requiring stereochemistry in all cases is potentially problematic and not what we are going for.

I guess the simple solution is to use the same code in the Smiles code (i.e. AssignStereochemistry)?

lukasturcani commented 4 years ago

It has correct valence, the SMILES is [H]OC(C(=C([H])[H])C(C([H])([H])[H])(C([H])([H])[H])C([H])([H])F)(C([H])([H])[H])C([H])([H])C([H])([H])C([H])([H])C1=NC23C(Br)=C(C2([H])[H])C2(C([H])([H])[H])C([H])([H])C2([H])C13[H] but even if I go through the SMILES - I can't get the rdkit to assign the stereo chemistry. So we might have to keep it as a warning.

lukasturcani commented 4 years ago

For reference, this is the PR that introduced the change https://github.com/lukasturcani/stk/pull/113

andrewtarzia commented 4 years ago

Is there any way to find out which bond/atom is missing stereochem?

I am ok with it just being a warning. I don't remember what made me make that change.

lukasturcani commented 4 years ago

Not sure.

I will revert the change for now and see if it causes problems and we can tackle those problems once they become clear. It looks like the change was introduced casually, not a a fix to some specific problem.

I will note that broken Inchi sometimes produce an empty string. Maybe the user will just have to check for this for now? Dunno.

andrewtarzia commented 4 years ago

I vaguely remember it being a matter of ensuring that an empty string is not unknowingly produced.

Should a KeyMaker in general not allow an empty string as a key? Does that seem like a logical restraint?

lukasturcani commented 4 years ago

Hmmm. This seems pretty reasonable to me.

lukasturcani commented 4 years ago

It looks like this is the approach the get_inchi_key function uses already.

lukasturcani commented 4 years ago

@misterblonde Released a new version of stk on pip which should have this fixed. The master branch also has the fix.

misterblonde commented 4 years ago

it was Steven who was involved in this and told me to report it, so you'd know and it affected all molecules not just a few. Thanks for solving it this quickly! I couldn't slack you. THANK YOU