Closed: reckart closed this issue 9 years ago
Comment #1 originally posted by reckart on 2014-11-25T11:30:55.000Z:
<empty>
Comment #2 originally posted by reckart on 2014-11-25T11:32:56.000Z:
I'd be surprised if the GAE allowed you to run native binaries. Are you sure this is allowed?
Comment #3 originally posted by reckart on 2014-11-25T11:48:21.000Z:
When I try the example from the command line (inside GCE), it works: $ echo 'Hello world!' | cmd/tree-tagger-english-utf8 (see: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)
Comment #4 originally posted by reckart on 2014-11-25T11:50:51.000Z:
Can you reproduce the original problem or did you just copy the stackoverflow report here?
If you can reproduce it, what version of TT4J are you using?
Comment #5 originally posted by reckart on 2014-11-25T11:56:21.000Z:
I'm the author of the stackoverflow report =) 1.2.0
Comment #6 originally posted by reckart on 2014-11-25T12:03:58.000Z:
I see :) Great!
The problematic line in 1.2.0 is this one:
boolean isUnicode = "UTF-8".equals(_model.getEncoding().toUpperCase(Locale.US));
I can see potential for an NPE here, but I wonder why it works locally but not on GCE.
Do you provide an encoding for the model? Does GCE have a problem with Locale.US?
Are you using tt4j directly or within another framework, e.g. in DKPro Core? If you are using it directly, you might want to try version 1.2.1, which offers a way of setting a model without using a model resolver.
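To illustrate the suspected NPE: the quoted line calls toUpperCase on whatever getEncoding() returns, so a null encoding fails before the comparison happens. The sketch below is a standalone illustration, not TT4J code. The class and method names (EncodingCheck, isUnicodeUnsafe, isUnicodeSafe) are hypothetical, and the null-tolerant variant is one possible guard, not necessarily how the library handles it.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only (not part of TT4J).
final class EncodingCheck {

    // Same shape as the 1.2.0 line quoted above:
    // throws NullPointerException when encoding is null.
    static boolean isUnicodeUnsafe(String encoding) {
        return "UTF-8".equals(encoding.toUpperCase(Locale.US));
    }

    // Null-tolerant variant: String.equalsIgnoreCase returns false
    // for a null argument instead of throwing.
    static boolean isUnicodeSafe(String encoding) {
        return "UTF-8".equalsIgnoreCase(encoding);
    }

    public static void main(String[] args) {
        System.out.println(isUnicodeSafe("utf-8")); // true
        System.out.println(isUnicodeSafe(null));    // false
        try {
            isUnicodeUnsafe(null);
        } catch (NullPointerException e) {
            System.out.println("NPE for a null encoding");
        }
    }
}
```

If the model carries no encoding at all, only the second variant returns a result; the first one fails in the same way as the reported stack trace.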
Comment #7 originally posted by reckart on 2014-11-25T12:28:07.000Z:
My local machine is Win8; the GCE instance runs Debian (so I use different TreeTagger packages). My setup is nearly the same as the one in your example:
System.setProperty("treetagger.home", "/home/spark/resources/treetagger");
try {
//tt.setModel("c:/treetagger/lib/german-utf8.par"); //local
tt.setModel("/home/spark/resources/treetagger/lib/german-utf8.par"); //gce
tt.setPerformanceMode(true);
tt.setHandler(new TokenHandler
So I use it directly. I just tried v1.2.1 (with no changes to my source code) and it produces the same error. Should I change my setup? How?
Comment #8 originally posted by reckart on 2014-11-25T13:07:00.000Z:
When loading a model, you should specify an encoding. This can be done in two ways:
1)
treetagger.setModel(modelFile.getPath() + ":" + encoding);
2) (works only with 1.2.1+)
DefaultModel model = new DefaultModel(modelFile.getPath() + ":" + encoding, modelFile, encoding, DefaultModel.DEFAULT_FLUSH_SEQUENCE);
treetagger.setModel(model);
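As a minimal sketch of option 1, the "path:encoding" string can be built in one place so that the encoding is never left out. ModelSpec and its of method are hypothetical names introduced here for illustration. The block deliberately avoids depending on TT4J, so the actual setModel call appears only as a comment.

```java
import java.io.File;

// Hypothetical helper, for illustration only (not part of TT4J).
final class ModelSpec {

    // Builds the "path:encoding" string described in option 1 and
    // rejects a missing encoding early instead of failing later.
    static String of(File modelFile, String encoding) {
        if (encoding == null || encoding.isEmpty()) {
            throw new IllegalArgumentException("an encoding is required");
        }
        return modelFile.getPath() + ":" + encoding;
    }

    public static void main(String[] args) {
        File modelFile = new File("/home/spark/resources/treetagger/lib/german-utf8.par");
        String spec = of(modelFile, "UTF-8");
        System.out.println(spec);
        // With TT4J on the classpath, the string would then be passed on as in option 1:
        // treetagger.setModel(spec);
    }
}
```

Applied to the setup from comment #7, this would presumably mean appending an encoding to the path passed to tt.setModel, for example ":UTF-8" for the german-utf8.par model.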
Comment #9 originally posted by reckart on 2014-11-25T14:47:12.000Z:
I just found out something that I should have tested much earlier: when I run my app on my local machine, I set an option to run it only on that one machine. When I run it on GCE, I set an option for a "parallel run", meaning the task is submitted to multiple worker instances so that it can be processed in parallel. Now I set the option for "local run" on GCE, and it succeeded!
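The thread does not establish why the parallel run failed. One thing that could be ruled out on each worker (an assumption, not a diagnosis) is that the TreeTagger home directory or the model file is missing on that machine. The sketch below is a hypothetical preflight check using only the Java standard library. TreeTaggerPreflight and check are illustrative names, not part of TT4J or Spark.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical preflight check, for illustration only.
final class TreeTaggerPreflight {

    // Returns a list of problems found on the current machine;
    // an empty list means both paths look usable.
    static List<String> check(String home, String modelPath) {
        List<String> problems = new ArrayList<>();
        if (home == null || !Files.isDirectory(Paths.get(home))) {
            problems.add("treetagger.home is not a directory: " + home);
        }
        if (modelPath == null || !Files.isReadable(Paths.get(modelPath))) {
            problems.add("model file is not readable: " + modelPath);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Paths taken from the setup in comment #7.
        List<String> problems = check(
                "/home/spark/resources/treetagger",
                "/home/spark/resources/treetagger/lib/german-utf8.par");
        for (String problem : problems) {
            System.err.println(problem);
        }
    }
}
```

Running such a check inside the code that executes on the workers, rather than only on the driver, would show whether every instance sees the same files.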
Comment #10 originally posted by reckart on 2014-11-25T15:54:36.000Z:
OK, sounds like this issue can be closed then :)
Original issue 20 created by reckart on 2014-11-25T11:24:59.000Z:
What steps will reproduce the problem?
Output:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 8, spark-worker-1a.c.****.internal): java.lang.NullPointerException:
org.annolab.tt4j.TreeTaggerWrapper.removeProblematicTokens(TreeTaggerWrapper.java:684)
org.annolab.tt4j.TreeTaggerWrapper.process(TreeTaggerWrapper.java:557)
When I run my app on my local machine, it works fine. I only get this error message when I run it on Google Compute Engine using Apache Spark. I already enabled performance mode (treetagger.setPerformanceMode(true)) but still get the same error message.
Duplicate: http://stackoverflow.com/questions/27123826/treetaggerwrapper-fails-in-google-compute-engine-with-apache-spark?noredirect=1