Open berndmoos opened 2 years ago
Is there anything I can do in the configuration to fix this? I am using OpenJDK 14. Any help would be much appreciated: the feature relying on TT4J is used a lot in EXMARaLDA, and at the moment we can only refer people to the older TreeTagger version.
What happens if you enter the command that TT4J generates directly on the command line?
Have you asked the TreeTagger maintainer about changes in the way that TT handles stdin, stdout, and stderr?
> What happens if you enter the command that TT4J generates directly on the command line?
With the command only, it hangs, too, because there is no input. When I pass it a text file as input, I get:
C:\Users\thomas.schmidt>D:\Dropbox\TreeTagger\bin\tree-tagger.exe -token -lemma -sgml -no-unknown -prob -threshold 0.999999000000 C:\TreeTagger\lib\italian.par C:\Users\thomas.schmidt\Desktop\123.txt
reading parameters ...
tagging ...
une NOM une 0.736251
due ADJ due 0.999291
tre ADJ tre 0.981432
finished.
> Have you asked the TreeTagger maintainer about changes in the way that TT handles stdin, stdout, and stderr?
No, but I will now ;-)
When you try on the command line, try adding the <This-is-the-start-of-the-text /> and <This-is-the-end-of-the-text /> pseudo-elements, and after the end pseudo-element try entering something like "This is a sentence ." with one token per line and in the language of the model. Do not close the stdin stream. Does it generate output?
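To make this manual test repeatable, the input can be prepared programmatically. The following is a minimal sketch that only uses the two marker strings quoted above; `FramedInput` and `frame` are hypothetical names chosen for illustration and are not part of TT4J. The output can be pasted into the console where TreeTagger is waiting for input.

```java
import java.util.List;

// Hypothetical helper, not part of TT4J: prepares text for a manual stdin test.
final class FramedInput {
    // Marker strings as quoted in this thread.
    static final String START = "<This-is-the-start-of-the-text />";
    static final String END = "<This-is-the-end-of-the-text />";

    // Emits the start marker, the tokens, the end marker, and then some
    // trailing tokens, each on its own line.
    static String frame(List<String> tokens, List<String> trailing) {
        StringBuilder sb = new StringBuilder();
        sb.append(START).append('\n');
        for (String token : tokens) {
            sb.append(token).append('\n');
        }
        sb.append(END).append('\n');
        for (String token : trailing) {
            sb.append(token).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example tokens only; use tokens in the language of the model.
        System.out.print(frame(List.of("due", "tre"),
                List.of("This", "is", "a", "sentence", ".")));
    }
}
```

If TreeTagger prints tags for the first block only after the trailing tokens have been entered, that would suggest it buffers input until it has seen enough data.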
TT may only generate output when the input stream is closed or after it has seen a certain amount of data. TT4J uses a model-specific "flush sequence" to convince TT to generate its output. Maybe something in relation to that has changed and TT4J is no longer able to convince TT to start generating.
You could also experiment in the code with a longer flush sequence.
You could try to add some printf debugging to the Reader and Writer inner classes in TreeTaggerWrapper to see in more detail how much data, and which data, TT4J sends to and receives from TT.
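One way to get this kind of trace without scattering print statements is to wrap the stream that feeds the TreeTagger process. The sketch below is a generic debugging aid built on the standard `java.io.FilterOutputStream`; the class name `LoggingOutputStream` is hypothetical, and where exactly it would be plugged into the Writer inner class depends on the TT4J source and is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical debugging aid, not part of TT4J: forwards every write to the
// wrapped stream and logs what was sent.
final class LoggingOutputStream extends FilterOutputStream {
    private final PrintStream log;
    private final String label;
    private long count;

    LoggingOutputStream(OutputStream target, PrintStream log, String label) {
        super(target);
        this.log = log;
        this.label = label;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        out.write(buf, off, len);
        count += len;
        // Newlines are escaped so that each chunk stays on one log line.
        String text = new String(buf, off, len, StandardCharsets.UTF_8).replace("\n", "\\n");
        log.printf("[%s] %d bytes (total %d): %s%n", label, len, count, text);
    }

    long bytesWritten() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the process's stdin; a real test would wrap that stream.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        LoggingOutputStream toTagger = new LoggingOutputStream(sink, System.err, "to-TT");
        toTagger.write("due\ntre\n".getBytes(StandardCharsets.UTF_8));
        toTagger.flush();
    }
}
```

A matching wrapper around the process's output stream would show whether TreeTagger returns anything at all before the hang.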
Thanks for the explanation.
Not sure I get the instructions right, but when I prepend the pseudo tags inside the input text file, I still get the expected output on the command line. Using a much longer input array in the above Java code doesn't change anything. Let's see if the TT developer has a hint...
@berndmoos Have you been able to fix your situation?
> @berndmoos Have you been able to fix your situation?
Haven't found the time yet, but will ... sometime ... soon ... hopefully before next year.
Just a (non-)update:
I'm afraid I will have to give up on this. I can provide Windows users with an older version of TreeTagger. For Mac users, no older versions seem to be available for download. :-(
For the record: HS suggests using a longer flush sequence. Right now, the flush sequence is defined here:
public static final String DEFAULT_FLUSH_SEQUENCE = "\n.\n.\n.\n.\n.\n(\n)\n.\n.\n.\n.\n";
Will experiment with this if and when I find the time.
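For such an experiment, a longer sequence could be derived from the default one by repetition. This is only a sketch: the constant is copied from the line quoted above, `FlushSequences` and `lengthen` are hypothetical names, and how the result is handed to TT4J depends on the version in use and is not shown here.

```java
// Hypothetical helper, not part of TT4J.
final class FlushSequences {
    // Value of DEFAULT_FLUSH_SEQUENCE as quoted in this thread.
    static final String DEFAULT_FLUSH_SEQUENCE = "\n.\n.\n.\n.\n.\n(\n)\n.\n.\n.\n.\n";

    // Repeats the base sequence; the leading newline of each extra copy is
    // dropped so that no blank lines appear where two copies meet.
    static String lengthen(String base, int times) {
        if (times < 1) {
            throw new IllegalArgumentException("times must be >= 1");
        }
        StringBuilder sb = new StringBuilder(base);
        String tail = base.startsWith("\n") && base.endsWith("\n") ? base.substring(1) : base;
        for (int i = 1; i < times; i++) {
            sb.append(tail);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String longer = lengthen(DEFAULT_FLUSH_SEQUENCE, 3);
        System.out.println("default length: " + DEFAULT_FLUSH_SEQUENCE.length());
        System.out.println("longer length:  " + longer.length());
    }
}
```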
Found the culprit at last:
tt.setProbabilityThreshold(0.999999);
If I do not set the probability threshold, TreeTagger will no longer hang. Setting the probability threshold to a different value (tried 0.9 / 1.0 / 0.5) makes no difference.
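To report this upstream, it may help to compare the two command lines side by side. The sketch below rebuilds the argument list seen in the command quoted earlier in this thread; it assumes that setting a threshold is what adds `-prob -threshold <value>` and that the other flags stay the same without it, which has not been verified against the TT4J source. `TaggerCommand` is a hypothetical name, and the executable and parameter file names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of TT4J: builds a TreeTagger command line
// with or without the probability threshold for manual comparison.
final class TaggerCommand {
    static List<String> build(String executable, String model, Double threshold) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        // Flags as they appear in the command quoted in this thread.
        cmd.addAll(List.of("-token", "-lemma", "-sgml", "-no-unknown"));
        if (threshold != null) {
            // Assumption: these arguments are present only when a threshold is set.
            cmd.add("-prob");
            cmd.add("-threshold");
            cmd.add(String.format(Locale.ROOT, "%.12f", threshold));
        }
        cmd.add(model);
        return cmd;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", build("tree-tagger.exe", "italian.par", 0.999999)));
        System.out.println(String.join(" ", build("tree-tagger.exe", "italian.par", null)));
    }
}
```

Running both variants by hand, with the same input typed into stdin, would show whether the hang follows the `-prob -threshold` arguments independently of TT4J.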
Interesting. The question is whether it is a TT4J bug or a TreeTagger bug... any hunch?
TreeTagger hanging in the context of probabilities is also mentioned in #13. Otherwise, no hunch. The reason I am setting the probability threshold is that I want to process probabilities as described in #13. That, however, is not central to most applications I have. For the time being, I will remove those lines and use the probability threshold only in contexts where I need it and can be sure that the older version of TreeTagger is used.
Wrt. the probabilities parameter: this might be a detail that could be communicated to HS. Maybe it sparks an idea about what the root problem could be.
With otherwise identical code, my tagging process will hang when used with the latest version of TreeTagger (this is on Windows, but there are hints that the problem occurs on Mac OS, too). This is irrespective of the parameter file and the input used.
Here is a test case:
With a newer version of TreeTagger, this will give me:
... and then it hangs. With an older version of TreeTagger, I get: