mcs07 / PubChemPy

Python wrapper for the PubChem PUG REST API.
http://pubchempy.readthedocs.io
MIT License

JSON Decode Error when using similarity search #55

Open Cajac102 opened 3 years ago

Cajac102 commented 3 years ago

Hey,

I am trying to search PubChem for similar compounds with this call:

similars = pcp.get_compounds(smile, 'smiles', searchtype='similarity', threshold=0.7, as_dataframe=True)

This works well for some SMILES, for example for "Cc1noc(C)c1Br". But for others, e.g. "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O", I get the following error:

Traceback (most recent call last):
  File "/home/caro/leval/.snakemake/scripts/tmpqq4csqb1.find_pubchem_hits.py", line 37, in <module>
    similars = pcp.get_compounds("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O", 'smiles', searchtype='similarity', threshold=similarity_threshold, as_dataframe=True)
  File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/site-packages/pubchempy.py", line 321, in get_compounds
    results = get_json(identifier, namespace, searchtype=searchtype, **kwargs)
  File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/site-packages/pubchempy.py", line 299, in get_json
    return json.loads(get(identifier, namespace, domain, operation, 'JSON', searchtype, **kwargs).decode())
  File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/site-packages/pubchempy.py", line 288, in get
    status = json.loads(response.decode())
  File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/caro/leval/.snakemake/conda/db9d54b7c1d0500c41e4539e39469ab2/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 322816 column 7 (char 7196244)

If I replace the double quotation marks around the SMILES with single ones, I get a different error:

json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 215438 column 3 (char 4812373)
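
Both messages are, among other cases, what Python's json module reports when a document stops partway through an object. Whether the PubChem response was actually cut short here is an assumption; the traceback does not confirm it. A minimal sketch with a made-up payload (not real PubChem output), where `decode_error_message` is a hypothetical helper:

```python
import json

# Made-up payload for illustration only; not real PubChem output.
complete = '{"PC_Compounds": [{"id": 1, "charge": 0}]}'


def decode_error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


# Cut the document right after a value, and right after the following comma.
first_comma = complete.index(',')
cut_after_value = complete[:first_comma]
cut_after_comma = complete[:first_comma + 1]

for label, text in [("complete", complete),
                    ("cut after value", cut_after_value),
                    ("cut after comma", cut_after_comma)]:
    print(label, "->", decode_error_message(text))
```

If the two cut variants print the same kinds of message as above, an incomplete response body would be one candidate explanation worth checking, though malformed JSON from other causes can produce the same messages.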

I would be glad if you could help me here!

Cheers, Caro

nbehrnd commented 3 years ago

Could you share an MWE that reproduces this problem? With the minimal script

import pubchempy as pcp

def retrieve_similar(structure=""):
    """Retrieve PubChem entries of similar structure."""
    similars = pcp.get_compounds(structure,
                                 'smiles',
                                 searchtype='similarity',
                                 threshold=0.7,
                                 as_dataframe=True)
    print(similars)

# the example working fine
retrieve_similar("Cc1noc(C)c1Br")

(or with retrieve_similar("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O") instead), the output in both cases looks to me like a successful interaction with the database (Python 3.9.2, PubChemPy 1.0.4). For documentation, the archive below includes a Jupyter notebook with a single run of the code.

test_case_similarity.zip

Cajac102 commented 3 years ago

Thanks! With your example,

retrieve_similar("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O")

works perfectly for me too. However,

ligand_smile = "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"
retrieve_similar(ligand_smile)

throws a JSON decode error again.
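
To see whether the failure is tied to the input or varies from run to run, the same call could be repeated a few times with each outcome recorded. A minimal sketch, assuming the error may be intermittent (nothing in this thread confirms that); `attempt_repeatedly` is a hypothetical helper, not part of PubChemPy, and the request is passed in as a callable:

```python
import json
import time


def attempt_repeatedly(call, attempts=3, delay=1.0):
    """Run call() up to `attempts` times and collect each outcome.

    Hypothetical helper: returns a list of ("ok", result) or
    ("error", message) tuples, one per attempt.
    """
    outcomes = []
    for attempt in range(attempts):
        try:
            outcomes.append(("ok", call()))
        except json.JSONDecodeError as err:
            # Keep the message so error positions can be compared across runs.
            outcomes.append(("error", str(err)))
        if attempt < attempts - 1:
            time.sleep(delay)  # brief pause between attempts
    return outcomes
```

With PubChemPy imported as pcp, it could be called as attempt_repeatedly(lambda: pcp.get_compounds(ligand_smile, 'smiles', searchtype='similarity', threshold=0.7)). An error position that changes between attempts would point away from the input string as the cause.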

I found two fixes.

  1. Explicitly casting it to a string first makes it work again: retrieve_similar(str(ligand_smile))

This confuses me because type(ligand_smile) and type("Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O") both give me <class 'str'>.

  2. I was using Python 3.7.10 and PubChemPy 1.0.4. Upgrading to Python 3.9 also fixed the problem.
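
Because type() reports str in both cases, a more telling check may be whether the variable and the literal hold exactly the same characters, for instance when a value read from a file carries a trailing newline. This is only a diagnostic sketch: `describe_difference` is a hypothetical helper, and nothing in this thread confirms that the two strings actually differed.

```python
import unicodedata


def describe_difference(a, b):
    """Report how two strings differ, character by character.

    Hypothetical helper; returns a list of human-readable findings,
    which is empty when the strings are identical.
    """
    findings = []
    if len(a) != len(b):
        findings.append(f"length differs: {len(a)} vs {len(b)}")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            findings.append(
                f"index {i}: {x!r} ({unicodedata.name(x, 'unnamed')}) vs "
                f"{y!r} ({unicodedata.name(y, 'unnamed')})"
            )
    return findings


literal = "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"
ligand_smile = "Cn1c(=O)c2nc(Cl)[nH]c2n(C)c1=O"

print(repr(ligand_smile))        # repr() makes hidden characters visible
print(ligand_smile == literal)   # True means the two values are identical
print(describe_difference(ligand_smile, literal))
```

If the comparison is True and no findings are listed, the str() cast is unlikely to be what changed the outcome.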