Open bryanwweber opened 8 years ago
Maybe the RDKit package (http://www.rdkit.org/docs/GettingStartedInPython.html) can help with InChI and SMILES validation. The package can be installed via Anaconda:
conda install -c rdkit rdkit
Something like this should work:
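from rdkit import Chem
# MolFromSmiles returns an RDKit Mol object (or None if the SMILES cannot be parsed)
methane = Chem.MolFromSmiles('C')
# Convert the molecule to InChI and compare it with the expected identifier
methane_converted_inchi = Chem.MolToInchi(methane)
methane_converted_inchi == 'InChI=1S/CH4/h1H4'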
That's an interesting idea... then we wouldn't need to call out to the internet to do the validation. RDKit is a pretty heavy dependency though, so we should think about which route to take. Thanks!
Another option, if we decide to go with the online lookup, is ChemSpiPy, a Python interface to the ChemSpider API: http://chemspipy.readthedocs.io/en/latest/
From @rwest
For ChemKED folks: extracting chemical formula from InChI is trivial - it is the only sublayer that must exist in every InChI, and is always the first, e.g. InChI=1/C2H6O/..... For converting from SMILES, I withdraw my offer to write a simple pure python implementation that does this (it looks like a pain, beyond the simplest things), and strongly suggest using existing frameworks. You could quite easily write something that tries a few in sequence to see what the user has installed, and I could help with this if you need - I’m familiar with OpenBabel, RDKit, and cirpy.
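The formula extraction described in the comment above could look roughly like the sketch below. The function name `formula_from_inchi` is hypothetical, and the sanity check on the formula field (optional leading digits followed by an uppercase element symbol) is an assumption about what a formula layer looks like, not part of any existing ChemKED code.

```python
import re


def formula_from_inchi(inchi: str) -> str:
    """Return the chemical formula layer of an InChI string.

    Hypothetical helper: it relies on the formula being the first
    layer after the version field, e.g. 'InChI=1S/CH4/h1H4' -> 'CH4'.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError(f"not an InChI string: {inchi!r}")

    # Layers are separated by '/'; the first field is the version ('1' or '1S').
    layers = inchi[len(prefix):].split("/")
    if len(layers) < 2:
        raise ValueError(f"no layers found after the version in {inchi!r}")

    formula = layers[1]
    # Assumption: a formula starts with an optional multiplier and then an
    # uppercase element symbol. Other layers begin with a lowercase prefix
    # letter (such as 'c', 'h', or 'q'), so they are rejected here.
    if not re.match(r"[0-9]*[A-Z]", formula):
        raise ValueError(f"second field does not look like a formula: {formula!r}")
    return formula
```

For example, `formula_from_inchi('InChI=1S/CH4/h1H4')` returns `'CH4'`. The version field is skipped rather than validated, so both standard (`1S`) and non-standard (`1`) identifiers are accepted.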
https://kincodecon.slack.com/archives/C34N5PML2/p1513028041000049
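The suggestion to try a few frameworks in sequence and see what the user has installed might be sketched as follows. The helper names and the preference order are hypothetical, and the import names (`rdkit`, `openbabel`, `cirpy`) are assumed to match the installed packages. The sketch only detects which backend is present; the actual SMILES conversion would still be delegated to whichever one is found.

```python
from importlib.util import find_spec

# Assumed import names for the frameworks mentioned above, in an
# arbitrary (hypothetical) order of preference.
CANDIDATE_BACKENDS = ("rdkit", "openbabel", "cirpy")


def available_backends(candidates=CANDIDATE_BACKENDS):
    """Return the candidate module names that can be imported, in order."""
    found = []
    for name in candidates:
        try:
            # find_spec locates a top-level module without importing it.
            if find_spec(name) is not None:
                found.append(name)
        except (ImportError, ValueError):
            # Treat a module whose import metadata is broken as unavailable.
            continue
    return found


def pick_backend(candidates=CANDIDATE_BACKENDS):
    """Return the first available backend name, or None if none is installed."""
    found = available_backends(candidates)
    return found[0] if found else None
```

Checking with `find_spec` avoids paying the import cost of a heavy package such as RDKit until a conversion is actually requested.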
ECNet seems to have figured out how to get molecules from SMILES: https://github.com/tjkessler/ECNet/pull/21
See comments on #8