Open BalooRM opened 4 years ago
My fork (https://github.com/BalooRM/PubChemPy) has an update to pubchempy.py that permits searching by SMILES to retrieve specific isomers. In the example below, the canonical SMILES for albuterol is used to retrieve all three matching PubChem records (two stereoisomers and one structure with unspecified stereochemistry) through a fastidentity search with identity_type = same_isotope. PubChem also holds records for other isotopic forms of albuterol.
The synchronous ("fast") searches are documented here: https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
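For anyone who wants to check the endpoint without PubChemPy, here is a minimal sketch that builds the same fastidentity request by hand. The helper names (`build_fastidentity_url`, `parse_cids`, `fetch_cids`) are made up for illustration and are not part of PubChemPy or my fork, and the `{"IdentifierList": {"CID": [...]}}` response shape is my assumption about what the cids/JSON operation returns.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_fastidentity_url(smiles, identity_type="same_isotope"):
    """Build a synchronous identity-search URL that returns CIDs as JSON."""
    # Percent-encode the SMILES so characters such as '(', '=', '@' and '/'
    # stay inside a single URL path segment.
    encoded = quote(smiles, safe="")
    return (
        f"{PUG_REST}/compound/fastidentity/smiles/{encoded}"
        f"/cids/JSON?identity_type={identity_type}"
    )


def parse_cids(payload):
    """Extract the CID list from a decoded response.

    Assumption: the payload looks like {"IdentifierList": {"CID": [...]}}.
    """
    return payload.get("IdentifierList", {}).get("CID", [])


def fetch_cids(smiles, identity_type="same_isotope", timeout=30):
    """Send the request (needs network access) and return the CID list."""
    url = build_fastidentity_url(smiles, identity_type)
    with urlopen(url, timeout=timeout) as resp:
        return parse_cids(json.load(resp))


if __name__ == "__main__":
    # Only prints the URL; call fetch_cids() to actually query PubChem.
    print(build_fastidentity_url("CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"))
```

The URL building and the parsing are kept separate from the network call so they can be checked offline.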
The following output is generated by the test code which follows.
    get_compounds by SMILES
    CID                2083
    IUPAC Name         4-[2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
    Canonical SMILES   CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
    Isomeric SMILES    CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O

    get_compounds by SMILES: searchtype='fastidentity', identity_type='same_isotope'
    CID                2083
    IUPAC Name         4-[2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
    Canonical SMILES   CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
    Isomeric SMILES    CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
    CID                123600
    IUPAC Name         4-[(1R)-2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
    Canonical SMILES   CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
    Isomeric SMILES    CC(C)(C)NC[C@@H](C1=CC(=C(C=C1)O)CO)O
    CID                182176
    IUPAC Name         4-[(1S)-2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
    Canonical SMILES   CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
    Isomeric SMILES    CC(C)(C)NC[C@H](C1=CC(=C(C=C1)O)CO)O

    get_cids by SMILES
    [2083]

    get_cids by SMILES: searchtype=fastidentity, identity_type='same_isotope'
    https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/cids/JSON?identity_type=same_isotope
    [2083, 123600, 182176]
    import pubchempy as pcp

    mycid = 2083
    mycansmiles = "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"
    myisosmiles = "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"

    # Default search by SMILES
    print("get_compounds by SMILES")
    for compound in pcp.get_compounds(mycansmiles, 'smiles'):
        print('CID\t', compound.cid)
        print('IUPAC Name\t', compound.iupac_name)
        print('Canonical SMILES\t', compound.canonical_smiles)
        print('Isomeric SMILES\t', compound.isomeric_smiles)

    # Synchronous identity search; searchtype='fastidentity' relies on the
    # updated pubchempy.py in the fork
    print("\nget_compounds by SMILES: searchtype='fastidentity', identity_type='same_isotope'")
    for compound in pcp.get_compounds(mycansmiles, 'smiles',
                                      searchtype='fastidentity',
                                      identity_type='same_isotope'):
        print('CID\t', compound.cid)
        print('IUPAC Name\t', compound.iupac_name)
        print('Canonical SMILES\t', compound.canonical_smiles)
        print('Isomeric SMILES\t', compound.isomeric_smiles)

    print("\nget_cids by SMILES")
    print(pcp.get_cids(mycansmiles, 'smiles'))

    print("\nget_cids by SMILES: searchtype=fastidentity, identity_type='same_isotope'")
    print("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/cids/JSON?identity_type=same_isotope")
    print(pcp.get_cids(mycansmiles, 'smiles', searchtype='fastidentity',
                       identity_type='same_isotope'))
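Once the three records are back, it can help to separate the stereo-specific hits from the non-specific one. The sketch below does that with a simple string heuristic on the isomeric SMILES (looking for `@`, `/` or `\`). The function names are hypothetical, the records are copied from the fastidentity output above rather than fetched live, and a real workflow would probably use a cheminformatics toolkit instead of string matching.

```python
# Records copied from the fastidentity output above: (CID, isomeric SMILES).
records = [
    (2083, "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"),
    (123600, "CC(C)(C)NC[C@@H](C1=CC(=C(C=C1)O)CO)O"),
    (182176, "CC(C)(C)NC[C@H](C1=CC(=C(C=C1)O)CO)O"),
]


def has_stereo_marks(smiles):
    """Heuristic: True if the SMILES carries chirality or bond-geometry marks."""
    return any(mark in smiles for mark in ("@", "/", "\\"))


def split_by_stereo(rows):
    """Return (stereo-specific CIDs, non-specific CIDs), preserving order."""
    specific = [cid for cid, smi in rows if has_stereo_marks(smi)]
    unspecific = [cid for cid, smi in rows if not has_stereo_marks(smi)]
    return specific, unspecific


specific, unspecific = split_by_stereo(records)
print("stereo-specific:", specific)   # [123600, 182176]
print("non-specific:", unspecific)    # [2083]
```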
Is it possible to perform a PUG REST synchronous (fastidentity) search to retrieve all related isomers for a canonical SMILES string (unspecified stereochemistry)? Currently, get_cids() returns a list with a single CID.
For example, the following request returns the desired information as JSON. CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O is the canonical SMILES for albuterol (CID = 2083).
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O/cids/JSON?identity_type=same_isotope
Returns: