mcs07 / PubChemPy

Python wrapper for the PubChem PUG REST API.
http://pubchempy.readthedocs.io
MIT License

Search by canonical SMILES to retrieve all stereoisomers #42

Open BalooRM opened 4 years ago

BalooRM commented 4 years ago

Is it possible to perform a PUG REST synchronous (fastidentity) search that retrieves all related stereoisomers for a canonical SMILES string (unspecified stereochemistry)? Currently get_cids() returns a list with only a single CID.

For example, CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O is the canonical SMILES for albuterol (CID 2083), and the following request returns the desired CIDs as JSON:

https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O/cids/JSON?identity_type=same_isotope

Returns:

{
  "IdentifierList": {
    "CID": [
      2083,
      123600,
      182176
    ]
  }
}
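For reference, here is a minimal stdlib-only sketch of the same GET request and of reading the CID list out of the response above. It is not PubChemPy code, and the helper names `fastidentity_cids_url`, `cids_from_response` and `fetch_cids` are hypothetical. It percent-encodes the SMILES so that characters such as `/` (bond direction) or `#` (triple bond) cannot break the URL path. This assumes PUG REST decodes percent-encoded path segments, as standard URL handling implies.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL and path layout taken from the request URL shown above
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def fastidentity_cids_url(smiles, identity_type="same_isotope"):
    """Build a GET URL for a fastidentity CID search by SMILES.

    The SMILES is percent-encoded with safe="" so that characters such as
    '/' or '#' are escaped instead of being read as URL syntax.
    """
    return (f"{PUG_REST}/compound/fastidentity/smiles/{quote(smiles, safe='')}"
            "/cids/JSON?" + urlencode({"identity_type": identity_type}))


def cids_from_response(text):
    """Extract the CID list from an IdentifierList JSON response."""
    return json.loads(text)["IdentifierList"]["CID"]


def fetch_cids(smiles, identity_type="same_isotope"):
    """Send the request and parse the result (needs network access; not run here)."""
    with urlopen(fastidentity_cids_url(smiles, identity_type)) as resp:
        return cids_from_response(resp.read().decode("utf-8"))


# The response quoted above for albuterol, parsed offline
response = '{"IdentifierList": {"CID": [2083, 123600, 182176]}}'
print(fastidentity_cids_url("CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"))
print(cids_from_response(response))  # [2083, 123600, 182176]
```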
BalooRM commented 4 years ago

My fork (https://github.com/BalooRM/PubChemPy) has an update to pubchempy.py that permits searching by SMILES to retrieve specific isomers. In the example below, a fastidentity search with identity_type='same_isotope' on the canonical SMILES for albuterol retrieves three PubChem records: the stereo-unspecified structure and its two stereoisomers. PubChem also has isotopic variants of albuterol, which same_isotope does not return.

The synchronous ("fast") searches are documented here: https://pubchemdocs.ncbi.nlm.nih.gov/pug-rest
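The URL printed by the test code below contains no SMILES. That is consistent with the SMILES being sent as form data in the body of a POST request rather than in the URL path, which also sidesteps URL-encoding issues. Below is a minimal sketch of building such a request with the standard library. It assumes PUG REST accepts `smiles=...` as a form-encoded POST body for this search. The helper `fastidentity_post_request` is hypothetical, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def fastidentity_post_request(smiles, identity_type="same_isotope"):
    """Build (but do not send) a POST request for a fastidentity CID search.

    The SMILES goes in the form-encoded body as 'smiles=...', so the URL
    carries only the search type, namespace, operation, output and options.
    """
    url = (f"{PUG_REST}/compound/fastidentity/smiles/cids/JSON?"
           + urlencode({"identity_type": identity_type}))
    body = urlencode({"smiles": smiles}).encode("utf-8")
    return Request(url, data=body, method="POST")


req = fastidentity_post_request("CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O")
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req), which needs network access.
```

The printed URL matches the one shown in the test output below.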

The output below was generated by the test code that follows it.

get_compounds by SMILES
CID      2083
IUPAC Name       4-[2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O

get_compounds by SMILES: searchtype='fastidentity', identity_type='same_isotope'
CID      2083
IUPAC Name       4-[2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
CID      123600
IUPAC Name       4-[(1R)-2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NC[C@@H](C1=CC(=C(C=C1)O)CO)O
CID      182176
IUPAC Name       4-[(1S)-2-(tert-butylamino)-1-hydroxyethyl]-2-(hydroxymethyl)phenol
Canonical SMILES         CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O
Isomeric SMILES  CC(C)(C)NC[C@H](C1=CC(=C(C=C1)O)CO)O

get_cids by SMILES
[2083]

get_cids by SMILES: searchtype=fastidentity, identity_type='same_isotope'
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/cids/JSON?identity_type=same_isotope
[2083, 123600, 182176]
import pubchempy as pcp

mycid = 2083  # albuterol (stereo-unspecified parent record); not used below
mycansmiles = "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"
# Not used below; for CID 2083 the isomeric SMILES equals the canonical SMILES
myisosmiles = "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"


def print_compound(compound):
    """Print the CID, IUPAC name and both SMILES forms of a Compound."""
    print('CID\t', compound.cid)
    print('IUPAC Name\t', compound.iupac_name)
    print('Canonical SMILES\t', compound.canonical_smiles)
    print('Isomeric SMILES\t', compound.isomeric_smiles)


print("get_compounds by SMILES")
for compound in pcp.get_compounds(mycansmiles, 'smiles'):
    print_compound(compound)

print("\nget_compounds by SMILES: searchtype='fastidentity', identity_type='same_isotope'")
# Uses the updated pubchempy.py from the fork mentioned above
for compound in pcp.get_compounds(mycansmiles, 'smiles', searchtype='fastidentity', identity_type='same_isotope'):
    print_compound(compound)

print("\nget_cids by SMILES")
print(pcp.get_cids(mycansmiles, 'smiles'))

print("\nget_cids by SMILES: searchtype=fastidentity, identity_type='same_isotope'")
# The SMILES does not appear in this URL because pubchempy sends it in the POST body
print("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/fastidentity/smiles/cids/JSON?identity_type=same_isotope")
print(pcp.get_cids(mycansmiles, 'smiles', searchtype='fastidentity', identity_type='same_isotope'))
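In the fastidentity output above, all three records share one canonical SMILES and differ only in their isomeric SMILES. A crude way to separate the stereo-defined records from the stereo-unspecified parent is to look for SMILES stereo markers: `@` for tetrahedral centers, and `/` and `\` for double-bond geometry. The sketch below is a plain string check on values copied from that output, not a SMILES parser. The helper names are hypothetical, and a cheminformatics toolkit such as RDKit would be more robust.

```python
# (CID, isomeric SMILES) pairs copied from the fastidentity output above
records = [
    (2083, "CC(C)(C)NCC(C1=CC(=C(C=C1)O)CO)O"),
    (123600, "CC(C)(C)NC[C@@H](C1=CC(=C(C=C1)O)CO)O"),
    (182176, "CC(C)(C)NC[C@H](C1=CC(=C(C=C1)O)CO)O"),
]


def has_defined_stereo(isomeric_smiles):
    """Rough check: '@' marks tetrahedral centers, '/' and '\\' bond directions."""
    return any(ch in isomeric_smiles for ch in "@/\\")


def split_by_stereo(pairs):
    """Split CIDs into stereo-defined and stereo-unspecified lists."""
    defined = [cid for cid, smi in pairs if has_defined_stereo(smi)]
    unspecified = [cid for cid, smi in pairs if not has_defined_stereo(smi)]
    return defined, unspecified


defined, unspecified = split_by_stereo(records)
print("stereo-defined:", defined)          # stereo-defined: [123600, 182176]
print("stereo-unspecified:", unspecified)  # stereo-unspecified: [2083]
```

With the fork, the pairs could instead come from `[(c.cid, c.isomeric_smiles) for c in pcp.get_compounds(mycansmiles, 'smiles', searchtype='fastidentity', identity_type='same_isotope')]`.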