ubergrape / pyspotlight

A thin wrapper around the DBPedia Spotlight REST API
BSD 2-Clause "Simplified" License

Response can contain list as surface form value #4

Closed. aolieman closed this issue 11 years ago

aolieman commented 11 years ago

This is likely something that should also be fixed in Spotlight itself, but it should be easier to fix here.

  File "C:\Python27\lib\site-packages\pyspotlight-0.6.5-py2.7.egg\spotlight\__init__.py", line 168, in annotate
    return [_dict_cleanup(resource) for resource in pydict['Resources']]
  File "C:\Python27\lib\site-packages\pyspotlight-0.6.5-py2.7.egg\spotlight\__init__.py", line 80, in _dict_cleanup
    clean[key] = _convert_number(value)
  File "C:\Python27\lib\site-packages\pyspotlight-0.6.5-py2.7.egg\spotlight\__init__.py", line 44, in _convert_number
    return int(value)
TypeError: int() argument must be a string or a number, not 'list'

The original text was fairly long, so I had to split it up and reproduce the problem manually with curl. Here is the request that causes the error:

$ curl -H "accept: application/json" http://localhost:2222/rest/annotate --data
 "text=%0D%0AResearch+in+this+cycle+has+happened+in+different+ways.+Part+of+it+
was+done+through+analysis%2C+%0D%0Aothers+through+prototyping+and+observation.+
%0D%0A+%0D%0AAnalysis+of+flashmobs+%0D%0ATo+find+out+what+the+exact+interaction
+was+that+I+was+looking+for%2C+I+did+an+analysis+of+%0D%0Aflashmobs.+The+channe
l+of+%E2%80%98Improv+Everywhere%E2%80%99+I+used+as+inspiration.+%0D%0AI+looked+
at+many+of+these+examples+and+found+keywords+that+can+be+found+in+the+workbook+
%0D%0A%5B1%5D.+%0D%0A+%0D%0A+%0D%0AObservation+%0D%0ATo+see+how+people+would+re
spond+to+a+smile+I+went+in+the+train+from+Duivendrecht+to+Delft.++%0D%0ASome+of
+the+observations+can+be+found+in+the+workbook.+%5B3%5D+%0D%0A+%0D%0AFindings+%
0D%0APositivity%2C+such+as+a+smile%2C+has+a+good+influence+on+the+atmosphere+of
+the+train-%0D%0Acompartment.+A+happy+ticket-controller+creates+some+smiles+alr
eady.+Actually+just+like+in+%0D%0A&confidence=0.9&support=0"

The response (just the relevant part):

{
  "@text": "\r\nResearch in this cycle has happened in different ways. Part of i
t was done through analysis, \r\nothers through prototyping and observation. \r\
n \r\nAnalysis of flashmobs \r\nTo find out what the exact interaction was that
I was looking for, I did an analysis of \r\nflashmobs. The channel of ‘Improv
Everywhere’ I used as inspiration. \r\nI looked at many of these examples and fo
und keywords that can be found in the workbook \r\n[1]. \r\n \r\n \r\nObservatio
n \r\nTo see how people would respond to a smile I went in the train from Duiven
drecht to Delft.  \r\nSome of the observations can be found in the workbook. [3]
 \r\n \r\nFindings \r\nPositivity, such as a smile, has a good influence on the
atmosphere of the train-\r\ncompartment. A happy ticket-controller creates some
smiles already. Actually just like in \r\n",
  "@confidence": "0.9",
  "@support": "0",
  "@types": "",
  "@sparql": "",
  "@policy": "whitelist",
  "Resources":   [
        ....
    {
      "@URI": "http://dbpedia.org/resource/Contub",
      "@support": "4",
      "@types": "",
      "@surfaceForm": [1],
      "@offset": "421",
      "@similarityScore": "0.8010543367647199",
      "@percentageOfSecondRank": "0.0"
    },

The issue is that the text contains "[1]", but Spotlight emits it as the unquoted JSON array [1] instead of encoding it as a string. Could this be fixed here by turning it back into a string anyway? I'm guessing surfaceForm should always have a string value.
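A minimal sketch of the "turn it into a string anyway" idea, for illustration only. The helper name `surface_form_as_text` is hypothetical and is not part of pyspotlight; the sketch also assumes that re-serializing the parsed list is an acceptable approximation of the original text, which may not hold for longer lists where the original spacing is unknown.

```python
import json


def surface_form_as_text(value):
    """Return a surface form as text, even if the JSON parser produced a list.

    Hypothetical helper for illustration; not part of pyspotlight.
    """
    if isinstance(value, list):
        # Re-serialize the list compactly, so [1] becomes the text "[1]".
        return json.dumps(value, separators=(",", ":"))
    return value


# A cut-down resource in the shape shown in the response above.
resource = json.loads('{"@surfaceForm": [1], "@offset": "421"}')
resource["@surfaceForm"] = surface_form_as_text(resource["@surfaceForm"])
print(resource["@surfaceForm"])
```

Values that are already strings pass through unchanged, so only the malformed case is touched.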

originell commented 11 years ago

I would feel better if you could first file a bug report in dbpedia-spotlight/dbpedia-spotlight :) Let's see what they have to say about that. I guess any JSON parser will convert an unquoted [1] into the language's array type.
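To illustrate the parser behaviour with Python's standard `json` module (a small sketch using cut-down payloads, not the full Spotlight response): the unquoted value comes back as a list, the quoted one as a string, and passing the list to `int()` raises the `TypeError` seen in the traceback.

```python
import json

# Same key, once with an unquoted array and once with a quoted string.
unquoted = json.loads('{"@surfaceForm": [1]}')
quoted = json.loads('{"@surfaceForm": "[1]"}')

print(type(unquoted["@surfaceForm"]).__name__)  # list
print(type(quoted["@surfaceForm"]).__name__)    # str

# Converting the list to a number fails, as in the traceback.
try:
    int(unquoted["@surfaceForm"])
except TypeError as exc:
    print("TypeError:", exc)
```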

I will try to introduce a workaround for this, though that's really patch-y haha

aolieman commented 11 years ago

Okay, thanks for the workaround! Of course you are completely right about this being a dbpedia-spotlight issue. I reported it as issue #197 there.

originell commented 11 years ago

Great :) Let's see what they have to say about that :)