Open simonmeoni opened 8 years ago
TT4J wraps the actual text in tags like "
The problem is due to these two variables:
private static final String STARTOFTEXT = "<This-is-the-start-of-the-text />";
private static final String ENDOFTEXT = "<This-is-the-end-of-the-text />";
TreeTagger needs to ignore these SGML tags to work correctly with the wrapper. Is it possible not to send these two strings? I think the problem comes from (line 1120 of TreeTaggerWrapper.class):
void run()
{
    try {
        final OutputStream os = _proc.getOutputStream();
        _pw = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(os, _model.getEncoding())));
        send(STARTOFTEXT);
        while (tokenIterator.hasNext()) {
            O token = tokenIterator.next();
            _lastTokenWritten = token;
            _tokensWritten++;
            send(getText(token));
        }
        send(ENDOFTEXT);
        send(_model.getFlushSequence());
    }
    catch (final Throwable e) {
        _exception = e;
    }
}
Thanks in advance, Simon
I have found a solution. I replaced line 969 of TreeTaggerWrapper.class with this:
if (outRecord.contains(STARTOFTEXT)) {
    inText = true;
    if (TRACE) {
        System.err.println("["+TreeTaggerWrapper.this+
                "|TRACE] ("+_tokensRead+") START ["+outRecord+"]");
    }
    continue;
}
if (outRecord.contains(ENDOFTEXT)) {
    if (TRACE) {
        System.err.println("["+TreeTaggerWrapper.this+
                "|TRACE] ("+_tokensRead+") COMPLETE ["+outRecord+"]");
    }
    break;
}
and it works when I don't pass the -sgml option :).
Thanks for testing this. I'll implement a different solution though that doesn't change existing behavior. What I will do is: check if the "-sgml" flag is present (the default). If the flag is present, continue with the present code. If the flag is not present, try checking specifically if the token text is the start/end marker, probably using "startsWith" instead of "contains".
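For readers following along, here is a minimal sketch of the approach described above. It is not the actual TT4J change. The class name and the helpers hasSgmlFlag and isMarker are hypothetical; only the two marker strings come from this thread. It also assumes that the existing code compares the output record to the marker with equals, and that without "-sgml" TreeTagger tags the marker like any other token, so the record starts with the marker and is followed by tab-separated tag and lemma columns (the exact columns shown are made up for illustration).

```java
import java.util.Arrays;
import java.util.List;

public class MarkerMatchSketch {
    // Marker strings as quoted in this thread.
    static final String STARTOFTEXT = "<This-is-the-start-of-the-text />";
    static final String ENDOFTEXT = "<This-is-the-end-of-the-text />";

    // Hypothetical helper: true if "-sgml" is among the TreeTagger arguments.
    static boolean hasSgmlFlag(List<String> treeTaggerArgs) {
        return treeTaggerArgs.contains("-sgml");
    }

    // Hypothetical helper. With "-sgml" the marker is assumed to come back
    // unchanged, so an exact comparison is kept (assumed existing behavior).
    // Without it, the marker is assumed to be tagged like a token, so only
    // the beginning of the record is compared.
    static boolean isMarker(String outRecord, String marker, boolean sgml) {
        return sgml ? outRecord.equals(marker) : outRecord.startsWith(marker);
    }

    public static void main(String[] args) {
        // Example argument lists, chosen for illustration only.
        List<String> withSgml = Arrays.asList("-quiet", "-sgml", "-token", "-lemma");
        List<String> withoutSgml = Arrays.asList("-quiet", "-token", "-lemma");

        String echoed = STARTOFTEXT;                    // marker passed through as-is
        String tagged = STARTOFTEXT + "\tNN\t<unknown>"; // assumed tagged form

        System.out.println(isMarker(echoed, STARTOFTEXT, hasSgmlFlag(withSgml)));    // true
        System.out.println(isMarker(tagged, STARTOFTEXT, hasSgmlFlag(withoutSgml))); // true
        System.out.println(isMarker(tagged, STARTOFTEXT, hasSgmlFlag(withSgml)));    // false
    }
}
```

Using startsWith rather than contains avoids treating a record as a marker when the marker text merely appears somewhere inside a longer line.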
@Alpha34587 could you please check if the changes I made work for you as well?
Yes, the change sounds good to me :) Thanks!
I have a problem when I execute this code. I just removed the -sgml argument, and the problem only occurs when this argument is absent. The process never terminates: the program enters an infinite loop when it executes the process function. The infinite loop is at line 591 of the TreeTaggerWrapper.class file. I tried to debug it, but without success... Do you have any idea what the problem is? Thanks in advance, Simon