reckart / tt4j

TreeTagger for Java
http://reckart.github.io/tt4j/
Apache License 2.0

Null Model #7

Closed reckart closed 9 years ago

reckart commented 9 years ago

Original issue 7 created by reckart on 2011-10-26T08:16:28.000Z:

Hello Richard,

Sorry to bother you once again.

I now use tt4j 1.0.16. It works fine with my "less-than-10-document" test cases. However, it systematically raises the exception below after a certain number of documents. This number varies - around 30 documents processed - but it could depend on the document sizes.

The exception indicates that both the tagger process and the model are null when the tagger process is requested, so the call fails at _model.install().

Do you know why this exception is thrown? Should I set a parameter on the wrapper that I did not set before?

Thanks in advance, Jérôme

Caused by: java.lang.NullPointerException
    at org.annolab.tt4j.TreeTaggerWrapper.getTaggerProcess(TreeTaggerWrapper.java:671)
    at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:518)

reckart commented 9 years ago

Comment #1 originally posted by reckart on 2011-10-26T18:31:13.000Z:

Hi Jérôme,

No worries. If it's a bug in TT4J, I want to know about it. Since I expect TT4J to be used in educational contexts, it should also produce proper error messages when something fails.

It looks like the whole thing starts with a NullPointerException in your handler code at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:223)

{{{
org.apache.uima.analysis_engine.AnalysisEngineProcessException
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper.process(TreeTaggerWrapper.java:195)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:897)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
Caused by: org.annolab.tt4j.TreeTaggerException: java.lang.NullPointerException
    at org.annolab.tt4j.TreeTaggerWrapper.checkThreads(TreeTaggerWrapper.java:590)
    at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:552)
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper.process(TreeTaggerWrapper.java:193)
    ... 14 more
Caused by: java.lang.NullPointerException
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:223)
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:1)
    at org.annolab.tt4j.TreeTaggerWrapper$Reader.run(TreeTaggerWrapper.java:933)
    at java.lang.Thread.run(Thread.java:679)
}}}

I have no idea what's at line 223, since the version committed to your subversion repository does not seem to be the one you used to produce these exceptions (the line numbers make no sense). I suspect that you do not check whether the lemma or POS tag arguments in Handler.token() can be null. That can happen, e.g., if for some reason an XML tag appears in a document. A TT4J trace mode log should yield additional insight.
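A null-tolerant handler could look roughly like the sketch below. It is only an illustration: the `TokenHandler` interface is declared locally as a stand-in that mirrors the shape of the TT4J callback (token, POS tag, lemma) so that the example compiles without the TT4J jar, and the `CollectingHandler` class and the `<unknown>` placeholder are hypothetical names, not part of TT4J or of Jérôme's code.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeHandlerSketch {

    // Local stand-in mirroring the shape of the TT4J token callback.
    // Declared here only so the sketch is self-contained (assumption).
    interface TokenHandler<O> {
        void token(O token, String pos, String lemma);
    }

    // Hypothetical handler that never dereferences a null pos or lemma.
    static class CollectingHandler implements TokenHandler<String> {
        static final String UNKNOWN = "<unknown>";
        final List<String> results = new ArrayList<>();

        @Override
        public void token(String token, String pos, String lemma) {
            // Substitute a placeholder instead of failing on null arguments.
            String safePos = (pos != null) ? pos : UNKNOWN;
            String safeLemma = (lemma != null) ? lemma : UNKNOWN;
            results.add(token + "/" + safePos + "/" + safeLemma);
        }
    }

    public static void main(String[] args) {
        CollectingHandler handler = new CollectingHandler();
        handler.token("house", "NN", "house");
        // Simulates the case described above: no tag and no lemma delivered.
        handler.token("<p>", null, null);
        for (String result : handler.results) {
            System.out.println(result);
        }
    }
}
```

Whether a placeholder, a skipped token, or a logged warning is the right reaction depends on the pipeline; the point is only that the handler itself should not throw.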

An exception in the handler causes TT4J to shut down: it sets the model to null and kills the background process. I should probably throw an additional exception if somebody tries to invoke process() after such a forced shutdown. I want to make sure that when a problem occurs, the developer/user is made aware of it and processing stops hard, because TT4J does not log (in order to keep dependencies minimal) and because in mass processing such log messages could easily get overlooked.

In the DKPro wrapper, we cannot have the problem that the model is set to null, because we call setModel() every time we start processing a CAS. TT4J takes care to switch the model only if the model name changes, so it's safe to call that as often as you wish. So even if TT4J fails on one CAS, on the next CAS it is reinitialized properly - however, we usually fail hard if there is an exception in any annotator.
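The effect of calling setModel() before every document can be illustrated with a toy model of the behaviour described above. This is not TT4J code: `ToyTagger`, its `loads` counter, and the `run` helper are hypothetical stand-ins that only imitate "reset the model on a handler exception" and "reload only when the model name changes".

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class ResetModelSketch {

    // Toy stand-in for a tagger wrapper; NOT the TT4J implementation.
    static class ToyTagger {
        private String model; // null means "no model" (e.g. after a forced shutdown)
        int loads = 0;        // counts how often a model was actually (re)loaded

        // Reloads only if the requested name differs from the current model.
        void setModel(String name) {
            if (!name.equals(model)) {
                model = name;
                loads++;
            }
        }

        void process(List<String> tokens, Consumer<String> handler) {
            if (model == null) {
                throw new IllegalStateException("no model set (tagger was shut down?)");
            }
            try {
                for (String token : tokens) {
                    handler.accept(token);
                }
            } catch (RuntimeException e) {
                model = null; // imitate the forced shutdown after a handler failure
                throw e;
            }
        }
    }

    // Calls setModel() before each document; returns how many documents succeeded.
    static int run(ToyTagger tagger, List<List<String>> docs) {
        int succeeded = 0;
        for (List<String> doc : docs) {
            tagger.setModel("english"); // cheap when the model is unchanged
            try {
                tagger.process(doc, token -> {
                    if (token.isEmpty()) {
                        throw new IllegalArgumentException("empty token");
                    }
                });
                succeeded++;
            } catch (RuntimeException e) {
                // A real annotator would usually fail hard here instead.
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        ToyTagger tagger = new ToyTagger();
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("", "c"), // the handler fails on this document
                Arrays.asList("d"));
        System.out.println("succeeded=" + run(tagger, docs) + ", loads=" + tagger.loads);
    }
}
```

In this toy run the second document fails and resets the model, and the setModel() call at the start of the third document restores it, so only one extra load happens.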

reckart commented 9 years ago

Comment #2 originally posted by reckart on 2011-11-06T11:26:18.000Z:

Hi Jérôme,

Have you been able to resolve this issue?

reckart commented 9 years ago

Comment #3 originally posted by reckart on 2011-11-09T08:48:24.000Z:

Hi Richard,

I solved the problem, which was on my side! You were right: I had to set the wrapper model again, because the process is reset when my token handler throws an exception. Thank you very much for your help.

Actually, I didn't need to set the model with version 1.0.12, which I used before I ran into the Chinese flush sequence issue. Exceptions were handled without resetting the process to null, am I right? That is no longer the case in later versions, so I should have read the change log more carefully. I missed this point.

I put a powered-by link to tt4j in the code project I maintain (see http://code.google.com/p/ttc-project/). I really enjoy using tt4j.

reckart commented 9 years ago

Comment #4 originally posted by reckart on 2011-11-09T08:50:08.000Z:

Exceptions have been resetting the process for a very long time. For the Chinese support, I really only changed a single String in the code.

Thanks for the kudos! ;)