ixa-ehu / ixa-pipe-ned

This repository contains the Named Entity Disambiguation tool based on DBpedia Spotlight. Provided that a DBpedia Spotlight REST server is running, the EHU-ned module will take KAF as input (containing <entities> elements) and perform Named Entity Disambiguation for your language of choice. Developed by IXA NLP Group (ixa.si.ehu.es).

invalid xml character #1

Open vanatteveldt opened 8 years ago

vanatteveldt commented 8 years ago

ixa-pipe-ned raises an exception if the input file contains an 'emoticon' XML character. Strangely enough, it reports a different character (&#56867) than the one in the input (&#x1f623).

$ java -jar $MDIR/ixa-pipe-ned/target/ixa-pipe-ned-1.1.6.jar -p 2060 < /tmp/test.naf > /tmp/test2.naf
 INFO 2016-07-17 15:43:31,131 main [DBpediaSpotlightClient] - Querying API.
[Fatal Error] :2:287: Character reference "&#56867" is an invalid XML character.
Disambiguation failed: 
java.lang.NullPointerException
    at ixa.pipe.ned.Annotate.disambiguate2KAF(Annotate.java:295)
    at ixa.pipe.ned.Annotate.XMLSpot2KAF(Annotate.java:282)
    at ixa.pipe.ned.Annotate.disambiguateNEsToKAF(Annotate.java:70)
    at ixa.pipe.ned.CLI.parseCLI(CLI.java:97)
    at ixa.pipe.ned.CLI.main(CLI.java:28)

Input file: https://gist.github.com/vanatteveldt/11a99358916711a9afa62132d7db5e85. Manually replacing the smileys with "X" solves the issue.
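
A note on the mismatched character numbers: 56867 is 0xDE23, which is the low half of the UTF-16 surrogate pair (0xD83D, 0xDE23) that Java uses internally for U+1F623. That suggests, though the trace above does not confirm it, that somewhere in the pipeline the emoticon is written out one UTF-16 code unit at a time instead of as a single code point. The sketch below only illustrates that arithmetic and the "X" workaround; the class and method names (SurrogateDemo, utf16Units, replaceSupplementary) are hypothetical and are not part of ixa-pipe-ned or DBpedia Spotlight.

```java
public class SurrogateDemo {

    // Returns the UTF-16 code units Java uses for one Unicode code point,
    // as plain decimal numbers (one unit for BMP characters, two otherwise).
    public static int[] utf16Units(int codePoint) {
        char[] units = Character.toChars(codePoint);
        int[] out = new int[units.length];
        for (int i = 0; i < units.length; i++) {
            out[i] = units[i];
        }
        return out;
    }

    // Workaround sketch: replaces every character outside the Basic
    // Multilingual Plane (such as emoticons) with a placeholder string.
    // Iterating by code point keeps each surrogate pair together.
    public static String replaceSupplementary(String text, String placeholder) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append(placeholder);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] units = utf16Units(0x1F623);
        // Prints "55357 56867": the second unit matches the reported error.
        System.out.println(units[0] + " " + units[1]);
        // Prints "ok X!"
        System.out.println(replaceSupplementary("ok \uD83D\uDE23!", "X"));
    }
}
```

Running a filter like replaceSupplementary over the text before it reaches the tool automates the manual "X" replacement, at the cost of losing the original characters.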