Original issue 3 created by reckart on 2011-05-05T08:20:11.000Z:
It seems that TreeTagger (TT) has problems with very long tokens. Consider writing a test that determines the maximum token size, then adding code to tt4j that ignores tokens exceeding it.
Comment #1 originally posted by reckart on 2011-06-03T21:46:11.000Z:
Improved handling of a dying TreeTagger process.
Added a setting to control the maximum token length in bytes (default: 90000).
Empirically determined that, at least on my machine, the maximum token length is 99998. I expect that TreeTagger uses a 100000-byte buffer: 99998 one-byte characters + line break + NUL terminator (end of string in C).
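
A minimal sketch of the length check described above, not the actual tt4j code. The class name, method names, and the choice of UTF-8 are assumptions made for illustration; the real encoding depends on the TreeTagger model in use. Only the 90000-byte default comes from this comment.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the tt4j API.
public class TokenLengthFilter {
    // Default limit taken from the comment above; it stays well below the
    // empirically observed maximum of 99998 bytes.
    public static final int DEFAULT_MAX_TOKEN_BYTES = 90000;

    // Length of the token in bytes, assuming UTF-8 (an assumption; a
    // multi-byte character counts more than once).
    public static int byteLength(String token) {
        return token.getBytes(StandardCharsets.UTF_8).length;
    }

    // Returns only the tokens whose encoded size does not exceed maxBytes.
    public static List<String> dropOverlongTokens(List<String> tokens, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (byteLength(token) <= maxBytes) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Measuring in bytes rather than characters matters because the suspected limit is a fixed-size C buffer: a token of 60000 two-byte characters would pass a character-based check of 90000 yet need 120000 bytes.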