linkedtv / wp2


THD: Duplicated entities in JSON response #19

Closed by jluisred 10 years ago

jluisred commented 10 years ago

Executing the following call against THD v.2:

curl --insecure -X POST --data-binary @BERLIN-2013-04-04-21-45-53-04042145_Chapter.txt "https://entityclassifier.eu/thd/api/v2/extraction?lang=de&entity_type=all&provenance=thd,dbpedia&types_filter=dbo&knowledge_base=linkedHypernymsDataset&priority_entity_linking=true&apikey=**************************&format=json"

The returned JSON contains many duplicated entities. For example, the first item in the response appears twice:

[
  { "startOffset": 568, "endOffset": 573, "underlyingString": "Polen", "entityType": "named entity", "types": [] },
  { "startOffset": 568, "endOffset": 573, "underlyingString": "Polen", "entityType": "named entity", "types": [] },
  ...
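For anyone who wants to check a response for this problem, here is a minimal sketch. It assumes the response body is a JSON array of entity objects with the fields shown in the excerpt, and it treats two entities as duplicates only when every field matches. The function name `find_duplicate_entities` is hypothetical and not part of THD.

```python
import json
from collections import Counter


def find_duplicate_entities(entities):
    """Return {canonical_json: count} for entities that occur more than once.

    Each entity is serialized with sorted keys so that two objects with the
    same fields and values produce the same string, whatever their key order.
    """
    counts = Counter(json.dumps(entity, sort_keys=True) for entity in entities)
    return {key: n for key, n in counts.items() if n > 1}


# The two items quoted in this issue (the rest of the response is omitted).
sample = json.loads("""
[
  {"startOffset": 568, "endOffset": 573, "underlyingString": "Polen",
   "entityType": "named entity", "types": []},
  {"startOffset": 568, "endOffset": 573, "underlyingString": "Polen",
   "entityType": "named entity", "types": []}
]
""")

duplicates = find_duplicate_entities(sample)
for entity_json, n in duplicates.items():
    print(f"{n} copies of {entity_json}")
```

To run it against a real response, the output of the curl call can be saved to a file and loaded with `json.load` in place of the inline sample. An empty result means no exact duplicates were found.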

The file posted is available here: https://dl.dropboxusercontent.com/u/4909358/BERLIN-2013-04-04-21-45-53-04042145_Chapter.txt

m1ci commented 10 years ago

The bug is fixed: the response no longer contains duplicated entities. Test request:

curl --insecure -X POST --data-binary @BERLIN-2013-04-04-21-45-53-04042145_Chapter.txt "https://entityclassifier.eu/thd/api/v2/extraction?lang=de&entity_type=all&provenance=thd,dbpedia&types_filter=dbo&knowledge_base=linkedHypernymsDataset&priority_entity_linking=true&apikey=xxxxxxxxxxxxxxxx&format=json"