ubergrape / pyspotlight

A thin wrapper around the DBPedia Spotlight REST API
BSD 2-Clause "Simplified" License

Added a types attribute to annotate() and candidates() #9

Open aolieman opened 10 years ago

aolieman commented 10 years ago

Added a types argument to annotate() and candidates(), which enables server-side filtering of resources by type. It also complements the existing policy parameter.

I've tested it on both kinds of backends, but it only works properly with the Lucene-backed web service. This is, however, not a bug in pyspotlight and seems to be an unnoticed bug in Spotlight's statistical backend. It will be discussed in DBpS issue #251.

aolieman commented 10 years ago

The problem with using a types filter on the statistical backend turned out to be missing documentation. Besides setting the types parameter, coreferenceResolution=false needs to be passed to the API for the filter to work. Because this behavior might change in the near future (see #251), I would not suggest changing the signature of annotate() solely to accommodate it.

But I would still like this to work asap ;-). My suggestion is to collect all filter-related parameters in a filters argument, which accepts a dictionary of optional filters. I'm not sure if it's necessary, but I've included policy=whitelist as a default in the filter_kwargs dictionary that is merged into the payload, to ensure that existing usage of pyspotlight is not disturbed.
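
A minimal sketch of how such a merge could look. This is not the actual patch: build_payload is a hypothetical helper, and the text, confidence and support keys are assumed here only to make the example self-contained. Only filter_kwargs and the policy=whitelist default come from the description above.

```python
def build_payload(text, confidence=0.0, support=0, filters=None):
    """Hypothetical helper: merge optional filters into the request payload."""
    # Default that preserves existing behaviour: whitelist policy, no other filters.
    filter_kwargs = {'policy': 'whitelist'}
    # Caller-supplied filters override or extend the default.
    filter_kwargs.update(filters or {})

    # Assumed base parameters; the real request may carry others.
    payload = {'text': text, 'confidence': confidence, 'support': support}
    payload.update(filter_kwargs)
    return payload
```

With this shape, calling the helper without filters still sends policy=whitelist, while any key in the dictionary (types, coreferenceResolution, or a future filter parameter) passes through without another signature change.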

Usage example:

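import spotlight

# Keep only resources typed as DBpedia:Person; coreferenceResolution must be
# disabled for the types filter to work with the statistical backend.
only_person_filter = {
    'policy': "whitelist",
    'types': "DBpedia:Person",
    'coreferenceResolution': False
}

spotlight.annotate("http://localhost:2223/rest/annotate",
                   "Komen Albert Verlinde en Metallica elkaar wel eens tegen in de showbizz?",
                   filters=only_person_filter)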
# [{u'similarityScore': 0.9999999700393123, u'surfaceForm': u'Albert Verlinde', u'support': 76, u'offset': 6, u'URI': u'http://nl.dbpedia.org/resource/Albert_Verlinde', u'percentageOfSecondRank': 0.0, u'types': u'DBpedia:Agent,Schema:Person,DBpedia:Http://xmlns.com/foaf/0.1/Person,DBpedia:Person,DBpedia:Presenter'}]

originell commented 10 years ago

That sounds great :D Are you still using it this way? =)

aolieman commented 10 years ago

Yes, I am. By using a single filters argument, the signature of annotate and candidates only needs to change once. I think there are still plans to change the filter parameters in DBp Spotlight, but I'm not sure what the implementation status is. Would you like to incorporate my changes into pyspotlight?

aolieman commented 10 years ago

Hi @originell, adding the filters argument is still relevant. Would you mind merging this pull request and publishing an updated release on PyPI? Or, if you are not interested in maintaining pyspotlight on PyPI, would you consider letting me submit new releases there for the time being? This is a nice wrapper and I would like to use it in many projects. In some cases, however, it is essential to use the version from PyPI.

Thanks!