Open aolieman opened 10 years ago
The problem with using a types filter with the statistical backend turned out to be missing documentation. Besides setting the types parameter, coreferenceResolution=false needs to be passed to the API for the filter to work. Because this behavior might change in the near future (see #251), I would not suggest changing the signature of annotate() solely to accommodate it.
But I would still like this to work asap ;-). My suggestion is to include all filter-related parameters in a filters argument, which accepts a dictionary of optional filters. I'm not sure if it's necessary, but I've included policy=whitelist as a default in the filter_kwargs dictionary that is included in the payload, to ensure that existing usage of pyspotlight is not disturbed.
Usage example:
import spotlight

only_person_filter = {
    'policy': "whitelist",
    'types': "DBpedia:Person",
    'coreferenceResolution': False
}
spotlight.annotate("http://localhost:2223/rest/annotate",
                   "Komen Albert Verlinde en Metallica elkaar wel eens tegen in de showbizz?",
                   filters=only_person_filter)
# [{u'similarityScore': 0.9999999700393123, u'surfaceForm': u'Albert Verlinde', u'support': 76, u'offset': 6, u'URI': u'http://nl.dbpedia.org/resource/Albert_Verlinde', u'percentageOfSecondRank': 0.0, u'types': u'DBpedia:Agent,Schema:Person,DBpedia:Http://xmlns.com/foaf/0.1/Person,DBpedia:Person,DBpedia:Presenter'}]
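For illustration, the merging of the filters dictionary into the request payload described above could look roughly like the sketch below. The helper name build_payload and the confidence and support parameters are assumptions made for this example; this is not pyspotlight's actual internal code.

```python
# Hypothetical helper: the name and signature are assumptions for
# illustration, not pyspotlight's real implementation.
def build_payload(text, confidence=0.0, support=0, filters=None):
    # Default policy, so that calls without filters behave as before.
    filter_kwargs = {'policy': 'whitelist'}
    # Any caller-supplied filter overrides or extends the default.
    filter_kwargs.update(filters or {})

    payload = {'text': text, 'confidence': confidence, 'support': support}
    payload.update(filter_kwargs)
    return payload
```

With this shape, new filter parameters can be passed through without touching the function signature again.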
That sounds great :D Are you still using it this way? =)
Yes, I am. By using a single filters argument, the signatures of annotate and candidates only need to change once. I think there are still plans to change the filter parameters in DBpedia Spotlight, but I'm not sure what the implementation status is.
Would you like to incorporate my changes into pyspotlight?
Hi @originell,
Adding the filters attribute is still relevant. Would you mind merging this pull request and updating on PyPI?
Or, if you are not interested in maintaining pyspotlight on PyPI, would you consider letting me submit new releases there for the time being? This is a nice wrapper and I would like to use it in many projects. In some cases, however, it is essential to use the version from PyPI.
Thanks!
Added a types attribute to annotate() and candidates(), which enables server-side filtering of resources. It also makes for a nice addition to the policy parameter.
I've tested it on both kinds of backends, but it only works properly with the Lucene-backed web service. This is, however, not a bug in pyspotlight; it seems to be an unnoticed bug in Spotlight's statistical backend. It will be discussed in DBpedia Spotlight issue #251.
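Where server-side type filtering does not behave as expected, the same result could in principle be approximated on the client. The sketch below is a hypothetical helper, not part of pyspotlight; it assumes each annotation is a dictionary whose 'types' value is a comma-separated string, as in the example output shown in this thread.

```python
# Hypothetical client-side fallback; not part of pyspotlight.
def filter_by_type(annotations, wanted_type):
    """Keep annotations whose comma-separated 'types' field contains wanted_type."""
    kept = []
    for annotation in annotations:
        # 'types' is assumed to look like "DBpedia:Agent,Schema:Person,...".
        types = annotation.get('types', '').split(',')
        if wanted_type in types:
            kept.append(annotation)
    return kept
```

Note that this fetches all annotations first and discards the unwanted ones afterwards, so it does not save any work on the server.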