Closed by GoogleCodeExporter 9 years ago
Hi Jérôme,
no worries. If it's a bug in my TT4J, I want to know about it. Since I expect
TT4J to be used in educational contexts, it should also produce proper error
messages when something fails.
It looks like the whole thing starts with a NullPointerException in your
handler code at
fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:223)
{{{
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at fr.univnantes.lina.uima.engines.TreeTaggerWrapper.process(TreeTaggerWrapper.java:195)
at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:897)
at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
Caused by: org.annolab.tt4j.TreeTaggerException: java.lang.NullPointerException
at org.annolab.tt4j.TreeTaggerWrapper.checkThreads(TreeTaggerWrapper.java:590)
at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:552)
at fr.univnantes.lina.uima.engines.TreeTaggerWrapper.process(TreeTaggerWrapper.java:193)
... 14 more
Caused by: java.lang.NullPointerException
at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:223)
at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:1)
at org.annolab.tt4j.TreeTaggerWrapper$Reader.run(TreeTaggerWrapper.java:933)
at java.lang.Thread.run(Thread.java:679)
}}}
I have no idea what's at line 223, since the committed version in your
Subversion repository does not seem to be what you were using to produce these
exceptions (the line numbers do not match).
I suspect that you do not check whether the lemma or postag arguments in
Handler.token() can actually be null. That can happen, e.g., if for some
reason an XML tag appears in a document. A TT4J trace mode log should
yield additional insight.
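A minimal sketch of such a null check, under stated assumptions: `TokenCallback` is a local stand-in for the TT4J handler interface so that the example compiles on its own, `NullSafeHandler` is a hypothetical name, and the fallback values ("UNKNOWN" for a missing tag, the token itself for a missing lemma) are only one possible choice, not something TT4J prescribes.

```java
// Local stand-in for the TT4J handler callback (assumption, declared here
// only so the sketch is self-contained).
interface TokenCallback<O> {
    void token(O token, String pos, String lemma);
}

// Hypothetical handler that tolerates a missing tag or lemma instead of
// dereferencing it and failing with a NullPointerException.
final class NullSafeHandler implements TokenCallback<String> {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void token(String token, String pos, String lemma) {
        // pos or lemma may be null, e.g. when an XML tag appears in the input.
        String safePos = (pos != null) ? pos : "UNKNOWN";
        String safeLemma = (lemma != null) ? lemma : token;
        out.append(token).append('\t')
           .append(safePos).append('\t')
           .append(safeLemma).append('\n');
    }

    // Returns everything collected so far, one tab-separated token per line.
    String output() {
        return out.toString();
    }
}
```

Whether to substitute defaults or to skip such tokens entirely depends on what the downstream annotations need.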
An exception in the handler causes TT4J to shut down, set the model to null,
and kill the background process. I should probably throw an
additional exception if somebody tries to invoke process() after such a forced
shutdown. I want to make sure that when a problem occurs, the developer/user is
made aware and that the processing stops hard, because TT4J does not log (in
order to keep dependencies minimal) and because in mass processing, such log
messages could easily be overlooked.
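The kind of guard meant here could look roughly like the following. This is a sketch only and not TT4J's actual code: `GuardedTagger`, `forceShutdown()` and the returned string format are hypothetical stand-ins used for illustration.

```java
// Hypothetical stand-in for a tagger wrapper; not TT4J's implementation.
final class GuardedTagger {
    private String model;            // null means no model is loaded
    private boolean shutDownByError; // true after a handler exception forced a shutdown

    void setModel(String name) {
        model = name;
        shutDownByError = false;
    }

    // Simulates the forced shutdown: the model is dropped.
    void forceShutdown() {
        model = null;
        shutDownByError = true;
    }

    String process(String token) {
        if (model == null) {
            // Fail hard with a message that tells the caller what happened.
            String reason = shutDownByError
                    ? "tagger was shut down after a handler exception; call setModel() again"
                    : "no model set; call setModel() first";
            throw new IllegalStateException(reason);
        }
        return token + "@" + model; // placeholder for real tagging
    }
}
```

With a guard like this, a second call to process() after a failure reports the forced shutdown explicitly instead of failing in some less obvious way.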
In the DKPro wrapper, we cannot have the problem that the model is set to null,
because we call setModel() every time we start processing a CAS. TT4J takes
care to switch the model only if the model name changes, so it's safe to call
that as often as you wish. So even if TT4J fails on one CAS, on the next CAS it
is reinitialized properly - however, we usually fail hard if there is an
exception in any annotator.
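To illustrate why calling setModel() for every CAS is cheap and also recovers from a forced shutdown, here is a small sketch. `ModelSwitcher`, `forceShutdown()` and `loadCount()` are hypothetical names; the class only mimics the "switch only if the name changes" behaviour described above and is not the TT4J or DKPro code.

```java
// Hypothetical stand-in that reloads a model only when the name changes.
final class ModelSwitcher {
    private String currentModel; // null after a forced shutdown
    private int loads;           // how many times a model was actually (re)loaded

    // name is assumed to be non-null in this sketch.
    void setModel(String name) {
        if (name.equals(currentModel)) {
            return; // same model already active, nothing to do
        }
        currentModel = name;
        loads++;
    }

    // Simulates a failure that sets the model to null.
    void forceShutdown() {
        currentModel = null;
    }

    int loadCount() {
        return loads;
    }
}
```

Calling setModel("french") three times in a row loads the model once; after forceShutdown(), the next setModel("french") loads it again, which corresponds to the reinitialization on the next CAS.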
Original comment by richard.eckart
on 26 Oct 2011 at 6:31
Hi Jérôme,
have you been able to resolve this issue?
Original comment by richard.eckart
on 6 Nov 2011 at 11:26
Hi Richard,
I solved the problem, which was on my side! You were right: I had to set the
wrapper model again, because the process is reset when exceptions are thrown by
my token handler. Thank you very much for your help.
Actually, I didn't need to set the model with version 1.0.12, which I used
before I ran into the Chinese flush sequence issue. Exceptions were handled
without resetting the process to null, am I right? But that was no longer the
case in later versions. So I should have been more careful when reading the
change log. I missed this point.
I put a powered-by link to tt4j in the code project I maintain
(see http://code.google.com/p/ttc-project/). I really enjoy using tt4j.
Original comment by jerome.rocheteau
on 9 Nov 2011 at 8:48
Exceptions have been resetting the process for a very long time. For the Chinese
support, I really only changed a single String in the code.
Thanks for the kudos! ;)
Original comment by richard.eckart
on 9 Nov 2011 at 8:50
Original issue reported on code.google.com by
jerome.rocheteau
on 26 Oct 2011 at 8:16