mcs07 / PubChemPy

Python wrapper for the PubChem PUG REST API.
http://pubchempy.readthedocs.io
MIT License

Unable to get 3D structures for some molecules using Pubchempy #49

Open ffzffz08 opened 4 years ago

ffzffz08 commented 4 years ago

Hi folks, I am trying to get 3D structures for compounds in a database. The issue is that I cannot get 3D structures for some of the compounds, even though I am very confident that 3D structures for them exist on PubChem. For instance, bunamidine hydrochloride, whose PubChem CID is 13985, does have a 3D structure on PubChem (you can search manually to verify). However, the command pcp.get_compounds('13985', 'cid', record_type='3d') returns an empty list [], meaning that PubChemPy finds no 3D structure for this molecule. In contrast, pcp.get_compounds('13985', 'cid') returns [Compound(13985)], so PubChem does at least recognize this CID. This happens for many of the compounds in the database I am using. Interestingly, PubChemPy correctly retrieves the 3D structure for a (very) few of those molecules, for instance CID 10522. Can anyone provide any insight into what is going on? Thanks in advance for any help!

khoivan88 commented 4 years ago

Hi, if you look a bit closer on PubChem, you will find that your compound is a hydrochloride salt and that the 3D structure shown is not of the salt but of the free base (CID 139856). (Screenshot: Annotation 2020-08-26 193411)

If you want to dig a little deeper, pubchempy.get_compounds('13985', 'cid', record_type='3d') is just a call to this URL: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/13985/JSON?record_type=3d. If you open that URL in your favorite browser, you will get a 'no record found' response.
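To check this for a whole list of CIDs without opening a browser, something like the sketch below might help. It only uses the Python standard library and the URL pattern quoted above. The helper names record_url and has_record are made up for illustration, and treating an HTTP 404 as "no record of this type" is an assumption based on the 'no record found' response described here, not something confirmed against the PubChem documentation.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def record_url(cid, record_type="3d"):
    """Build the PUG REST record URL for one CID (hypothetical helper)."""
    query = urllib.parse.urlencode({"record_type": record_type})
    return f"{PUG_REST}/compound/cid/{int(cid)}/JSON?{query}"


def has_record(cid, record_type="3d", timeout=30):
    """Return True if PubChem serves a record of this type for the CID.

    Assumption: a missing record is reported as HTTP 404.
    Any other HTTP error is re-raised so real failures stay visible.
    """
    try:
        with urllib.request.urlopen(record_url(cid, record_type), timeout=timeout) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
    # A successful record response is expected to carry a "PC_Compounds" list.
    return bool(data.get("PC_Compounds"))
```

Calling has_record(13985) and has_record(10522) should then mirror what get_compounds returns with record_type='3d', assuming the service behaves as described in this thread.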

I hope this helps!

ffzffz08 commented 4 years ago

Hi, thanks a lot! That answers my question, not only for this particular molecule but for all the others as well. I did have a vague impression that the molecules that 'should have 3D structures but could not be located by PubChemPy' were salts.
I was also reading your answer to another post (you are really contributing a lot to the community, thanks!). PubChem labels the relationship between a salt and its free base as 'parent compound'. Since you were able to retrieve pKa using your script, I could probably modify that code (with your permission, of course) to retrieve the parent compound as well.
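A rough sketch of how the parent lookup might go, independent of the pKa script mentioned above. It assumes that PUG REST exposes a cids operation accepting cids_type=parent and that the JSON response has the shape {"IdentifierList": {"CID": [...]}}; both should be checked against the PUG REST documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def parent_cids_url(cid):
    """Build the URL asking PubChem for the parent CID(s) of a compound.

    Assumption: the 'cids' operation accepts cids_type=parent.
    """
    query = urllib.parse.urlencode({"cids_type": "parent"})
    return f"{PUG_REST}/compound/cid/{int(cid)}/cids/JSON?{query}"


def parse_cid_list(payload):
    """Extract CIDs from an assumed {"IdentifierList": {"CID": [...]}} payload."""
    return list(payload.get("IdentifierList", {}).get("CID", []))


def get_parent_cids(cid, timeout=30):
    """Return the parent CIDs for a compound, or [] if none are reported.

    Assumption: 'nothing found' comes back as HTTP 404.
    """
    try:
        with urllib.request.urlopen(parent_cids_url(cid), timeout=timeout) as resp:
            return parse_cid_list(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return []
        raise
```

If this works as hoped, each CID returned by get_parent_cids(13985) could then be passed to pcp.get_compounds(parent_cid, 'cid', record_type='3d') to fetch the 3D record of the free base instead of the salt.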

khoivan88 commented 4 years ago

Hi @ffzffz08,

Thank you very much for your kind words. I am glad that some people find my answers helpful :)

And yes, please modify my code any way you want. I believe I have at least an MIT license on it, so you can modify it however you like. I have not used pubchempy to retrieve a 'parent' compound before, but I think you will figure it out one way or another :)

Best of luck and let us know if you have any questions :D!