mediacloud / cliff-api-client

A Python client for the CLIFF geoparsing tool
MIT License

Error in parse_text() - ValueError("No JSON object could be decoded") #5

Closed Joemillard closed 4 years ago

Joemillard commented 4 years ago

I've just updated to the latest version of cliff.api (note my previous update would have been in 2018), and now can't seem to get parse_text functioning. I can access JSON through the localhost in the browser, so I know the VM must be running correctly, but when I run the following:

from cliff.api import Cliff
my_cliff = Cliff('http://localhost:8999')
output = my_cliff.parse_text("This is about Einstien at the IIT in New Delhi.")

I come up against the below error:

Traceback (most recent call last):
  File "C:/Users/joeym/Documents/carnivore_text_mining/python/geoparse_text.py", line 3, in <module>
    output = my_cliff.parse_text("This is about Einstien at the IIT in New Delhi.")
  File "C:\Python27\lib\site-packages\cliff\api.py", line 31, in parse_text
    return self._parse_query(self.PARSE_TEXT_PATH, cleaned_text, demonyms, language)
  File "C:\Python27\lib\site-packages\cliff\api.py", line 58, in _parse_query
    return self._query(path, payload)
  File "C:\Python27\lib\site-packages\cliff\api.py", line 65, in _query
    return r.json()
  File "C:\Python27\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python27\Lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\Lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\Lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

Whereas in the browser, the same text call to the localhost spits out the JSON:

{"results":{"organizations":[{"count":1,"name":"IIT"}],"places":{"focus":{"cities":[{"id":1261481,"lon":77.22445,"name":"New Delhi","score":1,"countryGeoNameId":"1269750","countryCode":"IN","featureCode":"PPLC","featureClass":"P","stateCode":"07","lat":28.63576,"stateGeoNameId":"1273293","population":317797}],"states":[{"id":1273293,"lon":77.1,"name":"National Capital Territory of Delhi","score":1,"countryGeoNameId":"1269750","countryCode":"IN","featureCode":"ADM1","featureClass":"A","stateCode":"07","lat":28.6667,"stateGeoNameId":"1273293","population":16787941}],"countries":[{"id":1269750,"lon":79.0,"name":"Republic of India","score":1,"countryGeoNameId":"1269750","countryCode":"IN","featureCode":"PCLI","featureClass":"A","stateCode":"00","lat":22.0,"stateGeoNameId":"","population":1173108018}]},"mentions":[{"id":1261481,"lon":77.22445,"source":{"charIndex":38,"string":"New Delhi"},"name":"New Delhi","countryGeoNameId":"1269750","countryCode":"IN","featureCode":"PPLC","featureClass":"P","stateCode":"07","confidence":1.0,"lat":28.63576,"stateGeoNameId":"1273293","population":317797}]},"people":[{"count":1,"name":"Einstien"}]},"status":"ok","milliseconds":221,"version":"2.3.0"}

The module worked perfectly before, so I assume the cause must be either an update to the module itself or some new dependency incompatibility. Any ideas what the problem might be, or how I might diagnose it myself? I should add, thank you so much for this Python module - it's been incredibly useful!
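For reference, the JSON shown above can be unpacked with the standard library alone. The snippet below is only a sketch: `sample` is a trimmed copy of the browser response, and `list_mentions` is a hypothetical helper name, not part of cliff-api-client.

```python
import json

# Trimmed copy of the browser response above; only a few fields are kept.
sample = (
    '{"results":{"places":{"mentions":[{"id":1261481,"lon":77.22445,'
    '"source":{"charIndex":38,"string":"New Delhi"},"name":"New Delhi",'
    '"countryCode":"IN","lat":28.63576}]}},"status":"ok","version":"2.3.0"}'
)


def list_mentions(response_text):
    """Return (name, lat, lon) tuples for each place mention in a response.

    Hypothetical helper: it assumes the response has the
    results -> places -> mentions layout seen in the output above.
    """
    data = json.loads(response_text)
    mentions = data.get("results", {}).get("places", {}).get("mentions", [])
    return [(m["name"], m["lat"], m["lon"]) for m in mentions]


print(list_mentions(sample))
```

The chained `.get()` calls return an empty list rather than raising if a response contains no place mentions.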

rahulbot commented 4 years ago

Hmm... I can't reproduce this one. I don't have a 2.x Python environment set up anymore, but that shouldn't stop requests from making the call and parsing the JSON response. Did you double check the port number already?

Another idea - can you try again, but turn on the debug level logging for that logger? If you set it to DEBUG level, there is a line in there that tells you what the content of the response from CLIFF was. Something like this should work:

my_cliff._log.setLevel(logging.DEBUG)

Also did you see anything weird in the CLIFF tomcat log? There might be a clue there.
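The general idea behind this kind of check can be sketched without the client at all: look at the HTTP status and the raw body before trying to decode JSON, so that an HTML error page produces a readable message instead of a bare ValueError. `parse_json_or_explain` is a hypothetical helper written for illustration; it is not how cliff-api-client itself handles responses.

```python
import json


def parse_json_or_explain(status_code, body):
    """Decode a JSON body, or raise an error that says what came back instead.

    Hypothetical helper for illustration only. `status_code` and `body`
    stand in for the status and text of an HTTP response.
    """
    if status_code != 200:
        # A non-200 reply (e.g. a 404 page) is usually HTML, not JSON.
        raise RuntimeError(
            "Server returned HTTP %d; body starts with: %r"
            % (status_code, body[:80])
        )
    try:
        return json.loads(body)
    except ValueError as err:
        # json raises ValueError (or a subclass of it) on undecodable input.
        raise RuntimeError(
            "Response was not valid JSON (%s); body starts with: %r"
            % (err, body[:80])
        )


print(parse_json_or_explain(200, '{"status": "ok"}'))
```

With requests, the two arguments would come from `r.status_code` and `r.text` on the response object.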

Joemillard commented 4 years ago

Thanks for your reply and help, I've figured out what the problem is now.

I added the DEBUG level as you suggested using the below:

import logging
logging.basicConfig(level=logging.DEBUG)

And that returns the following:

DEBUG:cliff.api:Querying '/cliff-2.6.1/parse/text' (demonyms=False)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:8999
DEBUG:urllib3.connectionpool:http://localhost:8999 "POST /cliff-2.6.1/parse/text HTTP/1.1" 404 995
DEBUG:cliff.api:CLIFF says '<html><head><title>Apache Tomcat/7.0.59 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - /cliff-2.6.1/parse/text</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>/cliff-2.6.1/parse/text</u></p><p><b>description</b> <u>The requested resource is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/7.0.59</h3></body></html>'

Traceback (most recent call last):
  File "C:\Users\joeym\Documents\carnivore_text_mining\python\geoparse_text.py", line 8, in <module>
    output = my_cliff.parse_text("This is about Einstien at the IIT in New Delhi.")
  File "C:\Python27\lib\site-packages\cliff\api.py", line 31, in parse_text
    return self._parse_query(self.PARSE_TEXT_PATH, cleaned_text, demonyms, language)
  File "C:\Python27\lib\site-packages\cliff\api.py", line 58, in _parse_query
    return self._query(path, payload)
  File "C:\Python27\lib\site-packages\cliff\api.py", line 65, in _query
    return r.json()
  File "C:\Python27\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Python27\Lib\json\__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\Lib\json\decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\Lib\json\decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

I then realised that the call to localhost I'm making through the browser is still via cliff-2.3.0, whereas in api.py it's going via cliff-2.6.1. I've just changed the version in api.py back to cliff-2.3.0 and it now seems to work fine again.
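The mismatch can be made explicit with a small sketch. It assumes the deployed path follows the `/cliff-<version>/parse/text` pattern seen in the debug log, and that the `version` field in a working response matches the deployed folder name, as it did here with 2.3.0. Both function names are hypothetical and not part of cliff-api-client.

```python
import json


def parse_text_path(version):
    """Build the parse/text path for a given CLIFF version string.

    Assumption: the server exposes '/cliff-<version>/parse/text', which is
    the pattern that appears in the debug output above.
    """
    return "/cliff-%s/parse/text" % version


def version_from_response(response_text):
    """Read the 'version' field from a JSON response, or None if absent."""
    return json.loads(response_text).get("version")


# Trimmed stand-in for the response the browser returned.
browser_response = '{"status":"ok","milliseconds":221,"version":"2.3.0"}'

server_path = parse_text_path(version_from_response(browser_response))
client_path = parse_text_path("2.6.1")  # the path the client requested

print(server_path)
print(client_path)
print(server_path == client_path)
```

Since the traceback shows the client reading `self.PARSE_TEXT_PATH`, setting that attribute on the `my_cliff` instance might be an alternative to editing api.py, though that is untested here.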

Thanks again for your help.