pr-omethe-us / PyKED

Python interface to the ChemKED database format
https://pr-omethe-us.github.io/PyKED/
BSD 3-Clause "New" or "Revised" License

Validate InChI et al. with an online database #23

Open bryanwweber opened 8 years ago

bryanwweber commented 8 years ago

See comments on #8

shsymoen commented 7 years ago

Maybe the RDKit package (http://www.rdkit.org/docs/GettingStartedInPython.html) can help with InChI and SMILES validation. The package can be installed via conda:

conda install -c rdkit rdkit

Something like this should work:

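from rdkit import Chem

# Parse the SMILES string into an RDKit molecule (None if the SMILES is invalid)
methane = Chem.MolFromSmiles('C')
# Convert the molecule to InChI and compare it with the expected identifier
methane_converted_inchi = Chem.MolToInchi(methane)
methane_converted_inchi == 'InChI=1S/CH4/h1H4'

bryanwweber commented 7 years ago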

That's an interesting idea... then we wouldn't need to call out to the internet to do the validation. RDKit is a pretty heavy dependency though, so we should think about which route to take. Thanks!
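One way to get offline validation without making RDKit a hard requirement is to import it lazily and skip the check when it is missing. The following is only a sketch under that assumption: `smiles_matches_inchi` is a hypothetical helper name, not part of PyKED, and it relies only on the `Chem.MolFromSmiles` and `Chem.MolToInchi` calls shown earlier in this thread.

```python
def smiles_matches_inchi(smiles, inchi):
    """Check that a SMILES string and an InChI describe the same species.

    Hypothetical helper, not part of PyKED. Returns True or False when
    RDKit is installed, and None when the check cannot be performed.
    """
    try:
        # Imported lazily so RDKit stays an optional dependency.
        from rdkit import Chem
    except ImportError:
        return None

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit could not parse the SMILES string.
        return False
    return Chem.MolToInchi(mol) == inchi


result = smiles_matches_inchi('C', 'InChI=1S/CH4/h1H4')
```

Returning `None` rather than raising lets the caller decide whether a missing backend should be a warning or an error.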

bryanwweber commented 7 years ago

Another option, if we decide to go with the online lookup, is ChemSpiPy, a Python interface to the ChemSpider API: http://chemspipy.readthedocs.io/en/latest/

bryanwweber commented 6 years ago

From @rwest

For ChemKED folks: extracting the chemical formula from an InChI is trivial. It is the only sublayer that must exist in every InChI, and it is always the first, e.g. InChI=1/C2H6O/... For converting from SMILES, I withdraw my offer to write a simple pure-Python implementation that does this (it looks like a pain, beyond the simplest things), and strongly suggest using existing frameworks. You could quite easily write something that tries a few in sequence to see what the user has installed, and I could help with this if you need. I’m familiar with OpenBabel, RDKit, and cirpy.
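Both suggestions above can be sketched with the standard library alone. The helper names `formula_from_inchi` and `installed_backends` are hypothetical (not part of PyKED), and the module names `rdkit`, `openbabel`, and `cirpy` are assumed import names for the three frameworks mentioned. As a guard against malformed input, the sketch returns `None` when the first layer does not look like a formula instead of trusting it blindly.

```python
import importlib.util
import re

# A formula layer starts with an optional count followed by an element symbol.
_FORMULA_START = re.compile(r'^[0-9]*[A-Z]')


def formula_from_inchi(inchi):
    """Return the formula sublayer of an InChI string, or None if absent.

    Hypothetical helper, not part of PyKED.
    """
    if not inchi.startswith('InChI='):
        return None
    layers = inchi.split('/')
    # layers[0] is the version prefix (e.g. 'InChI=1S'); the formula follows it.
    if len(layers) < 2 or not _FORMULA_START.match(layers[1]):
        return None
    return layers[1]


def installed_backends(candidates=('rdkit', 'openbabel', 'cirpy')):
    """List which candidate packages can be imported, in preference order.

    The default module names are assumptions about how each framework
    is imported; nothing is actually imported here.
    """
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


ethanol_formula = formula_from_inchi('InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3')
backends = installed_backends()
```

A SMILES converter could then pick the first entry of `backends` and report a clear error when the list is empty.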

https://kincodecon.slack.com/archives/C34N5PML2/p1513028041000049

bryanwweber commented 5 years ago

ECNet seems to have figured out how to get molecules from SMILES: https://github.com/tjkessler/ECNet/pull/21