tema16 / tt4j

Automatically exported from code.google.com/p/tt4j

Null Model #7

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hello Richard,

Sorry to bother you once again. 

I now use tt4j 1.0.16. It works fine with my "less-than-10-document" test 
cases. 
However, it systematically raises the exception below after 
a certain number of documents. This number varies (around 30 documents 
processed), but it could depend on the document sizes. 

The exception points out that the tagger process is null as well as the model 
(then it fails at _model.install()) when the tagger process is called. 

Do you know why this exception is thrown? 
Should I set a parameter I didn't set before to the wrapper?

Thanks in advance,
Jérôme

Caused by: java.lang.NullPointerException
    at org.annolab.tt4j.TreeTaggerWrapper.getTaggerProcess(TreeTaggerWrapper.java:671)
    at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:518)

Original issue reported on code.google.com by jerome.rocheteau on 26 Oct 2011 at 8:16

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Jérôme,

no worries. If it's a bug in my TT4J, I want to know about it. Since I expect 
TT4J to be used in educational contexts, it should also produce proper error 
messages when something fails.

It looks like the whole thing starts with a NullPointerException in your 
handler code at 
fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:223)

{{{
org.apache.uima.analysis_engine.AnalysisEngineProcessException
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper.process(TreeTaggerWrapper.java:195)
    at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:897)
    at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
Caused by: org.annolab.tt4j.TreeTaggerException: java.lang.NullPointerException
    at org.annolab.tt4j.TreeTaggerWrapper.checkThreads(TreeTaggerWrapper.java:590)
    at org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:552)
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper.process(TreeTaggerWrapper.java:193)
    ... 14 more
Caused by: java.lang.NullPointerException
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:223)
    at fr.univnantes.lina.uima.engines.TreeTaggerWrapper$Handler.token(TreeTaggerWrapper.java:1)
    at org.annolab.tt4j.TreeTaggerWrapper$Reader.run(TreeTaggerWrapper.java:933)
    at java.lang.Thread.run(Thread.java:679)
}}}

I have no idea what's at that line 223 since the committed version in your 
subversion repository does not seem to be what you were using to produce these 
exceptions (the line numbers do not match). 
I suspect that you do not check whether the lemma or postag arguments in 
Handler.token() can actually be null. That can happen e.g. if for some 
reason an XML tag appears in a document. Having a TT4J trace mode log should 
yield additional insight.
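
To illustrate the kind of null check meant here, a minimal sketch follows. The `TokenHandler` interface below is a local stand-in that only mirrors the shape of the TT4J callback so the snippet compiles on its own; `NullSafeHandler`, the `"UNKNOWN"` fallback tag, and the choice to fall back to the surface form for a missing lemma are assumptions for illustration, not part of TT4J or of Jérôme's code.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the TT4J callback shape (assumption: declared locally so the
// sketch compiles without the library on the classpath).
interface TokenHandler<O> {
    void token(O token, String pos, String lemma);
}

// Hypothetical handler that tolerates null pos/lemma values instead of
// dereferencing them and throwing a NullPointerException.
class NullSafeHandler implements TokenHandler<String> {
    final List<String> results = new ArrayList<>();

    @Override
    public void token(String token, String pos, String lemma) {
        // Fall back to a placeholder tag when no tag was delivered.
        String safePos = (pos != null) ? pos : "UNKNOWN";
        // Fall back to the surface form when no lemma was delivered.
        String safeLemma = (lemma != null) ? lemma : token;
        results.add(token + "/" + safePos + "/" + safeLemma);
    }
}
```

Whether a placeholder, a skipped token, or a rethrown exception with a clearer message is the right reaction depends on the pipeline; the point is only that the handler decides explicitly rather than failing on a null argument.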

When there is an exception in the handler, that causes TT4J to shut down, set 
the model to null and kill the background process. I should probably throw an 
additional exception if somebody tries to invoke process() after such a forced 
shutdown. I want to make sure that when a problem occurs, the developer/user is 
made aware and that the processing stops hard, because TT4J does not log (in 
order to keep dependencies minimal) and because in mass processing, such log 
messages easily could get overlooked.

In the DKPro wrapper, we cannot have the problem that the model is set to null, 
because we call setModel() every time we start processing a CAS. TT4J takes 
care to switch the model only if the model name changes, so it's safe to call 
that as often as you wish. So even if TT4J fails on one CAS, on the next CAS it 
is reinitialized properly - however, we usually fail hard if there is an 
exception in any annotator.
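
The pattern described above, setting the model before every document, can be sketched as follows. `FakeTagger` is a hypothetical stand-in that models only the behavior described in this thread (a failure clears the model, and `setModel()` reloads only when the name changes); it is not the TT4J implementation, and the model name `"english"` and the class `PerDocumentTagging` are likewise made up for the example.

```java
import java.util.List;

// Hypothetical stand-in for the wrapper, modeling only what this thread
// describes: a failure clears the model; setModel() reloads only on change.
class FakeTagger {
    private String model;   // null after a forced shutdown
    int loads = 0;          // how often a model was actually (re)loaded

    void setModel(String name) {
        if (!name.equals(model)) {  // no-op when this model is already active
            model = name;
            loads++;
        }
    }

    void process(List<String> tokens) {
        if (model == null) {
            throw new IllegalStateException("no model set");
        }
        for (String t : tokens) {
            if (t == null) {        // simulate a failure during processing
                model = null;       // forced shutdown clears the model
                throw new IllegalStateException("handler failed");
            }
        }
    }
}

class PerDocumentTagging {
    // Re-assert the model before every document so that one failure does not
    // leave the tagger without a model for the next document.
    static int tagAll(FakeTagger tagger, List<List<String>> documents) {
        int failed = 0;
        for (List<String> doc : documents) {
            tagger.setModel("english");
            try {
                tagger.process(doc);
            } catch (IllegalStateException e) {
                failed++;           // counted here; rethrow to fail hard instead
            }
        }
        return failed;
    }
}
```

In this sketch the failure is counted and processing continues, which makes the recovery visible. A pipeline that prefers to fail hard would rethrow from the catch block instead.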

Original comment by richard.eckart on 26 Oct 2011 at 6:31

GoogleCodeExporter commented 9 years ago
Hi Jérôme,

have you been able to resolve this issue?

Original comment by richard.eckart on 6 Nov 2011 at 11:26

GoogleCodeExporter commented 9 years ago
Hi Richard,

I solved the problem, which was on my side! You were right: I had to set the 
wrapper model again, because the process is reset when exceptions are thrown 
by my token handler. Thank you very much for your help.

Actually, I didn't need to set the model with version 1.0.12, which I used 
before I faced the Chinese flush sequence issue. Exceptions were handled 
without resetting the process to null, am I right? But that wasn't the case 
in later versions. So I should have been more careful while reading the 
change log. I missed this point. 

I put a powered-by link to tt4j in the code project I maintain 
(see http://code.google.com/p/ttc-project/). I really enjoy using tt4j.

Original comment by jerome.rocheteau on 9 Nov 2011 at 8:48

GoogleCodeExporter commented 9 years ago
Exceptions have been resetting the process for a very long time. For the 
Chinese support I really only changed a single String in the code.

Thanks for the kudos! ;)

Original comment by richard.eckart on 9 Nov 2011 at 8:50