openforcefield / smarty

Chemical perception tree automated exploration tool.
http://openforcefield.org
MIT License

Errors when running SMIRKY on MiniDrugBank #223

Closed bannanc closed 7 years ago

bannanc commented 7 years ago

Initially I thought it was just missing 3D coordinates: I forgot about this problem when I recreated MiniDrugBank. SMIRKS string searching requires atom stereo, and the 2D coordinates in DrugBank cause errors. I'm adding 3D coordinates to MiniDrugBank right now. I'll put it into a pull request later tonight.

@davidlmobley fyi ^ this caused issues with my most recent smirky simulations.

Later I found that the problem came from using a non-ideal way to generate SMILES strings for a set of molecules, explained below.

bannanc commented 7 years ago

Well I have a more complex problem. I'm actually getting this error:

Traceback (most recent call last):
  File "/Users/bannanc/anaconda/bin/smirky", line 11, in <module>
    load_entry_point('smarty==0.1.5', 'console_scripts', 'smirky')()
  File "/Users/bannanc/anaconda/lib/python2.7/site-packages/smarty/cli_smirky.py", line 167, in main
    option.SMIRFF, option.temperature, output)
  File "/Users/bannanc/anaconda/lib/python2.7/site-packages/smarty/sampler_smirky.py", line 305, in __init__
    [self.type_matches, self.total_type_matches] = self.best_match_reference_types(typelist)
  File "/Users/bannanc/anaconda/lib/python2.7/site-packages/smarty/sampler_smirky.py", line 523, in best_match_reference_types
    reference_typename = self.reference_typed_molecules[smile][indices]
KeyError: '[H]/N=C(\\c1ccc(cc1)CNC(=O)C(CCC(=O)N)NC(=O)C(Cc2c[nH]c3c2cc(cc3)OCCC)NS(=O)(=O)CC)/N'

I didn't get this with the old MiniDrugBank set, which I think had the same molecules... I can't figure out why this would happen. I use OECreateIsoSmiString to create the SMILES strings that track indices by molecule. Is there any way that method would create a different SMILES string for the same molecule?

bannanc commented 7 years ago

Oh, no... I had OEMolToSmiles(mol) before; this was an oversight. I fixed the two relevant lines in sampler_smirky and am going to include the fix in pull request #224.
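For anyone hitting the same KeyError: here is a minimal, toolkit-free sketch of the failure mode, assuming the dictionary of reference types is keyed by one SMILES generator and looked up with another. The functions `isomeric_smiles` and `plain_smiles`, the toy molecule records, and the type labels are hypothetical stand-ins for illustration only; they are not OpenEye calls and not the actual code in sampler_smirky.

```python
# Hypothetical stand-ins for two different SMILES generators.
# Each toy "molecule" simply carries both string forms.
def isomeric_smiles(mol):
    return mol["iso"]


def plain_smiles(mol):
    return mol["plain"]


def build_reference(molecules, key_func):
    """Map each molecule's SMILES key to its per-atom reference type names."""
    return {key_func(mol): mol["types"] for mol in molecules}


def lookup_types(reference, mol, key_func):
    """Fetch reference types for a molecule using the given key function."""
    return reference[key_func(mol)]


# trans-2-butene: the isomeric form keeps the double-bond stereo marks.
molecules = [
    {"iso": "C/C=C/C", "plain": "CC=CC", "types": ["c1", "c2", "c2", "c1"]},
]

reference = build_reference(molecules, isomeric_smiles)

# Same generator for storing and looking up: the key is found.
types = lookup_types(reference, molecules[0], isomeric_smiles)

# Different generator for the lookup: the key is missing and KeyError is raised.
try:
    lookup_types(reference, molecules[0], plain_smiles)
    mismatch_raises = False
except KeyError:
    mismatch_raises = True
```

Routing every key through a single helper, rather than calling a SMILES function directly at each site, is one way to keep the two sides from drifting apart.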

davidlmobley commented 7 years ago

So, @bannanc - this is resolved now? You were creating canonical rather than isomeric SMILES before, so now you've fixed it?

Probably we should start all our workflows by checking whether molecules are 3D. I think there is an OEGetDimension or some such (something about number of dimensions) which is the key way to check whether molecules are 3D. We should probably just throw an exception when they are not. (The same would go for the stuff Nam and Daisy are doing...)
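A rough sketch of what such a guard could look like, assuming the molecule object exposes a `GetDimension()` method that returns 3 for molecules with 3D coordinates (I believe OpenEye molecules have this, but check the toolkit documentation before relying on it). The helper name `require_3d` and the `FakeMol` stand-in class are hypothetical and exist only so the example runs without a toolkit installed.

```python
def require_3d(mol, name="molecule"):
    """Return mol unchanged if it reports 3D coordinates, else raise ValueError.

    Assumption: mol.GetDimension() returns the coordinate dimension
    (for example 2 for a flat depiction, 3 for a 3D structure).
    """
    dimension = mol.GetDimension()
    if dimension != 3:
        raise ValueError(
            "%s has %dD coordinates; 3D coordinates are required"
            % (name, dimension)
        )
    return mol


class FakeMol(object):
    """Hypothetical stand-in for a toolkit molecule, used only for this demo."""

    def __init__(self, dimension):
        self._dimension = dimension

    def GetDimension(self):
        return self._dimension


# A 3D molecule passes through the guard untouched.
mol_3d = FakeMol(3)
checked = require_3d(mol_3d, name="mol_3d")

# A flat molecule is rejected with an explanatory message.
try:
    require_3d(FakeMol(2), name="flat_mol")
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Calling the guard once when molecules are loaded would surface the problem at the start of a run instead of as repeated warnings in the error logs.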

davidlmobley commented 7 years ago

(LMK if you need me to dig up the exact command.)

bannanc commented 7 years ago

@davidlmobley

That might be a good idea, although it turns out smirky will run fine with flat molecules.

Either way, the error that caused the crash was the SMILES strings. However, SMIRKS searching complains about a lack of stereochemistry, I think because there are SMIRKS patterns (not ones that we're using) that search for specific stereochemistry. All the complaints were making the slurmError files hard to read.

bannanc commented 7 years ago

@davidlmobley

> this is resolved now?

Yes, this is fixed now; the other SMILES strings were not isomeric.

davidlmobley commented 7 years ago

@bannanc - can you create issues relating to checking for 3D coordinates? It seems like it would be relevant here, but also in the undergrads' forcefieldcompare project.

bannanc commented 7 years ago

@davidlmobley I can create an issue here. The undergrad project fixed this by creating a new 3D structure when they generate the Tripos mol2 files. Their scripts include an option to start from a list of SMILES strings, so that project is taken care of.

bannanc commented 7 years ago

Pull request #224 fixed the issue discussed here, so I'm going to close this.