ixa-ehu / ixa-pipe-nerc

IXA pipes Named Entity Tagger (http://ixa2.si.ehu.es/ixa-pipes).
Apache License 2.0

nullpointerexception if terms/deps/tree are missing for some sentences #11

Closed by vanatteveldt 8 years ago

vanatteveldt commented 8 years ago

Running the ixa-pipe-nerc on Dutch data I got a NullPointerException, probably caused by not all words having terms or deps:

 $ java -jar $MDIR/ixa-pipe-nerc/target/ixa-pipe-nerc-1.6.0-exec.jar tag -m $TDIR/nerc-models-1.5.4/nl/nl-6-class-clusters-sonar.bin < /tmp/61901.naf 
CLI options: Namespace(lexer=off, model=/data/wva/newsreader_pipe_nl/tools/nerc-models-1.5.4/nl/nl-6-class-clusters-sonar.bin, dictPath=off, outputFormat=naf, dictTag=off, language=null, clearFeatures=no)
ixa-pipe-nerc model loaded in: 4712 miliseconds ... [DONE]
Exception in thread "main" java.lang.NullPointerException
    at java.util.ArrayList.addAll(ArrayList.java:559)
    at ixa.kaflib.KAFDocument.getTermsFromWFs(KAFDocument.java:1508)
    at eus.ixa.ixa.pipe.nerc.Annotate.annotateNEs(Annotate.java:239)
    at eus.ixa.ixa.pipe.nerc.CLI.annotate(CLI.java:239)
    at eus.ixa.ixa.pipe.nerc.CLI.parseCLI(CLI.java:173)
    at eus.ixa.ixa.pipe.nerc.CLI.main(CLI.java:156)

Input file is available at https://gist.github.com/vanatteveldt/cd540b98171feb5af0bfdf5ee83682cf. The only strange thing I can see is that the first sentence wasn't parsed, possibly because it contains a "|" character. I'm cross-posting this to the Dutch morphosyntactic module, but I guess this module could also process what's left rather than throwing a NullPointerException?

ragerri commented 8 years ago

Hi Wouter,

ixa-pipe-nerc reads the WF layer for processing and asks for references to term ids when creating the entities layer. Because some WFs are not in the terms layer, the kaflib library cannot find a term id for the WFs of some entity. Hence the NullPointerException.

I have put in a hack to fix that (i.e., to make sure that the WFs have a term linked to them) and it is available in master. Let me know if you need it in Maven Central.
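
The failure mode described above can be sketched in isolation. This is an illustration only, not the kaflib or ixa-pipe-nerc code: the class name, the map-based index, and the ids (w1, t1, ...) are all hypothetical. It assumes the library keeps some lookup from word-form ids to term ids and that word forms from an unparsed sentence have no entry, so the lookup returns null and ArrayList.addAll(null) throws, which matches the top frame of the stack trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermLookupSketch {

    // Hypothetical index from word-form id to the ids of the terms linked to it.
    // w1 and w2 stand for word forms of a sentence that was never parsed,
    // so they have no entry at all.
    static Map<String, List<String>> buildIndex() {
        Map<String, List<String>> index = new HashMap<>();
        index.put("w3", Arrays.asList("t1"));
        index.put("w4", Arrays.asList("t2"));
        return index;
    }

    // Unguarded lookup: index.get(...) returns null for w1, and
    // ArrayList.addAll(null) throws a NullPointerException.
    static List<String> termsUnguarded(Map<String, List<String>> index, List<String> wfIds) {
        List<String> terms = new ArrayList<>();
        for (String wfId : wfIds) {
            terms.addAll(index.get(wfId));
        }
        return terms;
    }

    // Guarded lookup: word forms without a linked term are skipped,
    // so the rest of the document can still be processed.
    static List<String> termsGuarded(Map<String, List<String>> index, List<String> wfIds) {
        List<String> terms = new ArrayList<>();
        for (String wfId : wfIds) {
            List<String> linked = index.get(wfId);
            if (linked != null) {
                terms.addAll(linked);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = buildIndex();
        List<String> wfIds = Arrays.asList("w1", "w2", "w3", "w4");

        try {
            termsUnguarded(index, wfIds);
        } catch (NullPointerException e) {
            System.out.println("Unguarded lookup failed with a NullPointerException");
        }

        System.out.println("Guarded lookup: " + termsGuarded(index, wfIds));
    }
}
```

Skipping unlinked word forms is only one possible policy; the actual fix in master may instead ensure every WF gets a term, as described above.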

R

Sent by email in reply to Wouter van Atteveldt's report of Sun, Jul 17, 2016, 1:32 PM.

vanatteveldt commented 8 years ago

Thanks :). I have never worked with Maven (my Java days ended more than 10 years ago...), but I'll see if it works when I recompile from git. I think Ruben fixed the parsing so that it no longer ignores sentences with a "|", but it's probably good to prevent the exception in any case.

Thanks for the fix!