TokenBuilder replaces every \r\n in the input text with \\n instead of \n.
This is a bug. It would also be nice if multiple newlines in a row did not
create an empty sentence. So, I am going to replace the line:
tokensString = tokensString.replaceAll("\r\n", "\\n");
with:
tokensString = tokensString.replaceAll("\\s*\n\\s*", "\n");
This will fix the bug and produce better sentences.
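
A minimal sketch of the difference between the two lines quoted above. The class and method names are hypothetical and are not part of TokenBuilder; only the two replaceAll calls come from this report. It assumes sentences are later split on \n, which is why a run of newlines would otherwise yield empty sentences.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of TokenBuilder.
class NewlineNormalizationSketch {

    // The existing line: the replacement string "\\n" is not a newline
    // character, so the \r\n sequences are not turned into \n.
    static String currentBehavior(String tokensString) {
        return tokensString.replaceAll("\r\n", "\\n");
    }

    // The proposed line: any run of whitespace that contains at least one
    // \n (including \r\n and blank lines) collapses into a single \n.
    static String proposedBehavior(String tokensString) {
        return tokensString.replaceAll("\\s*\n\\s*", "\n");
    }

    public static void main(String[] args) {
        String input = "one two \r\n\r\n three four\n\n\nfive";

        String normalized = proposedBehavior(input);

        // Splitting on \n stands in for sentence splitting (an assumption).
        // prints [one two, three four, five]
        System.out.println(Arrays.toString(normalized.split("\n")));
    }
}
```

Because \s also matches \r and spaces, the proposed pattern trims whitespace on both sides of each line break in addition to handling \r\n, so no separate \r\n replacement should be needed.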
Original issue reported on code.google.com by pvogren@gmail.com on 27 Dec 2010 at 7:38