mcs07 / PubChemPy

Python wrapper for the PubChem PUG REST API.
http://pubchempy.readthedocs.io
MIT License

Can not generate SMILES of compounds from known CAS numbers #82

Closed MathewGolding closed 1 month ago

MathewGolding commented 7 months ago

Hello, I recently began using PubChemPy, and thank you for creating this incredible platform! I am attempting to find the SMILES of over 500 compounds via the get_compound() function; however, the only input data I have is a list of CAS numbers. As far as I can tell, PubChemPy has no way of pulling a compound from its CAS number, because CAS numbers are not a valid input for get_compound(). For example, I have the CAS number "3586-12-7", yet I cannot find a valid way to pass this value to get_compound().

Any help resolving this is massively appreciated!

nbehrnd commented 7 months ago

In a Python virtual environment with cirpy installed (version 1.0.2 from PyPI), I just ran the following conversions using the CAS numbers of the example here, your compound, and glucose and fructose as reported by Wikipedia's property boxes. It could be a suitable tool to extend from here:

$ python
Python 3.11.7 (main, Dec  8 2023, 14:22:46) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cirpy
>>> 
>>> data = ['108-95-2', '3586-12-7', '50-99-7', '57-48-7']
>>> 
>>> for entry in data:
...     cirpy.resolve(entry, 'smiles')
... 
'Oc1ccccc1'
'Nc1cccc(Oc2ccccc2)c1'
'OCC1OC(O)C(O)C(O)C1O'
'OC[C@@H](O)[C@@H](O)[C@H](O)C(=O)CO'
>>>

MathewGolding commented 7 months ago

I tried out CIRpy as recommended by nbehrnd above. Whilst it was successful for the most part, I still lack the SMILES of roughly 70 compounds. Is there anything else I can use, preferably with PubChemPy, to get these SMILES?

nbehrnd commented 7 months ago

Well, it depends a bit on the chemicals behind the CAS number.
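One thing that can be ruled out first is a typo in the CAS number itself, because the last digit of a CAS number is a check digit computed from the digits before it. The following is only a minimal sketch of that check; the helper name `is_valid_cas` is made up for this illustration and is not part of cirpy or PubChemPy. A number that passes can still be missing from a given database.

```python
import re

# CAS Registry Number layout: 2 to 7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas):
    """Return True if `cas` has the CAS layout and a matching check digit.

    Hypothetical helper for illustration only.
    """
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost digit
    # of the body; the weighted sum modulo 10 must equal the check digit.
    weighted = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return weighted % 10 == check_digit


# The four CAS numbers used in the cirpy session above.
data = ['108-95-2', '3586-12-7', '50-99-7', '57-48-7']
suspect = [cas for cas in data if not is_valid_cas(cas)]
```

Entries that end up in `suspect` are worth checking against their original source before trying another resolver.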


khoivan88 commented 7 months ago

@MathewGolding, I made something that looks up a PubChem CID based on different identifiers (CAS, SMILES, InChI, InChIKey). Maybe you can adapt it to your application: https://github.com/khoivan88/pka_lookup/blob/9705117d70e9162fe4c410d5fa884d550f02bad2/src/pka_lookup_pubchem.py#L44

README: https://github.com/khoivan88/pka_lookup/tree/master
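
For a PubChemPy-only route, here is a rough sketch rather than a tested recipe. It relies on PubChem usually listing CAS numbers among a compound's synonyms, so passing the CAS number to `get_compounds()` with the `'name'` namespace often finds the compound. The helper names `smiles_of` and `resolve_cas_list` are made up for this example, the choice of taking the first hit is an assumption, and the SMILES attribute names may differ between PubChemPy versions.

```python
def smiles_of(compound):
    """Return the first non-empty SMILES attribute found on `compound`.

    Assumption: which of these attributes exists and is populated
    depends on the installed PubChemPy version.
    """
    for attr in ("smiles", "isomeric_smiles", "canonical_smiles"):
        value = getattr(compound, attr, None)
        if value:
            return value
    return None


def resolve_cas_list(cas_numbers, get_compounds=None):
    """Map CAS numbers to SMILES; return (found, missing).

    Hypothetical helper. `get_compounds` defaults to
    pubchempy.get_compounds and can be swapped out for offline testing.
    """
    if get_compounds is None:
        import pubchempy as pcp  # imported lazily; needs network access
        get_compounds = pcp.get_compounds

    found, missing = {}, []
    for cas in cas_numbers:
        # CAS numbers are searched as names (synonyms) in PubChem.
        compounds = get_compounds(cas, "name")
        # Assumption: the first hit is the intended compound.
        smiles = smiles_of(compounds[0]) if compounds else None
        if smiles:
            found[cas] = smiles
        else:
            missing.append(cas)
    return found, missing
```

Calling `found, missing = resolve_cas_list(['3586-12-7', '50-99-7'])` queries PubChem over the network; anything left in `missing` can then be passed to another resolver such as cirpy. For several hundred entries, a short pause between requests may be needed to stay within PubChem's usage limits.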