Closed GoogleCodeExporter closed 9 years ago
I have committed org.cleartk.token.breakit.BreakIteratorAnnotator, which works
with both the word break iterator and the sentence break iterator for a
user-specified locale and annotation type. I was slightly lazy and didn't mess
with the indexes the sentence break iterator produces. It splits the text so
that every character of the input falls inside some sentence; e.g., trailing
whitespace is included in the preceding sentence.
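For anyone curious about the underlying behavior, here is a minimal sketch that calls the JDK's `java.text.BreakIterator` directly. It is not the annotator's source and does not touch the ClearTK/UIMA API; the class name `BreakIteratorSketch` and its helper methods are hypothetical. It shows that consecutive sentence boundaries tile the whole input, so whitespace after a sentence terminator ends up in the preceding sentence. The word helper drops segments with no letter or digit (spaces, punctuation); that filtering is an assumption of this sketch, not necessarily what the annotator does.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSketch {

    /** Returns the [begin, end) offsets of every segment between consecutive boundaries. */
    static List<int[]> spans(BreakIterator it, String text) {
        it.setText(text);
        List<int[]> result = new ArrayList<>();
        int start = it.first();
        // Each pair of adjacent boundaries is one segment; together they cover the whole text.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(new int[] {start, end});
        }
        return result;
    }

    /** Sentence spans for a locale; trailing whitespace stays with the sentence before it. */
    static List<int[]> sentenceSpans(String text, Locale locale) {
        return spans(BreakIterator.getSentenceInstance(locale), text);
    }

    /** Word spans for a locale, skipping segments with no letter or digit (assumption). */
    static List<int[]> wordSpans(String text, Locale locale) {
        List<int[]> words = new ArrayList<>();
        for (int[] s : spans(BreakIterator.getWordInstance(locale), text)) {
            if (text.substring(s[0], s[1]).codePoints().anyMatch(Character::isLetterOrDigit)) {
                words.add(s);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Hello world.  This is a test.  ";
        for (int[] s : sentenceSpans(text, Locale.US)) {
            System.out.println("sentence [" + s[0] + "," + s[1] + ") \"" + text.substring(s[0], s[1]) + "\"");
        }
        for (int[] w : wordSpans(text, Locale.US)) {
            System.out.println("word [" + w[0] + "," + w[1] + ") " + text.substring(w[0], w[1]));
        }
    }
}
```

Running `main` prints each sentence with its spaces still attached. The last sentence ends exactly at `text.length()`, which is the "no text left outside a sentence" behavior described above. If trimmed sentence offsets were wanted, one option would be to shrink each span past trailing `Character.isWhitespace` characters before creating the annotation.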
I didn't bother with the Character or Line break iterators. If you need
support for these, then submit a new issue.
The code is committed to the CleartkProjectReOrg branch in the cleartk-token
project and will appear in trunk after this branch is merged.
Original comment by pvogren@gmail.com on 29 Dec 2010 at 12:13
Original issue reported on code.google.com by pvogren@gmail.com on 28 Dec 2010 at 9:44