Open GoogleCodeExporter opened 8 years ago
If sentence boundary detection is the only goal and you simply want to train a
classifier for it, your training data should be one or more files in
which each sentence delimiter is preceded by <S>. For example, your training
file could contain the following text:
The quick brown fox jumps over the lazy dog <S>. Mr. XYZ went to New York <S>.
Note that each sentence-ending period is preceded by <S>, while the period in
"Mr." is not. You need not put each sentence on its own line.
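As a rough illustration of the format described above, here is a minimal Python sketch that builds such a training string from a list of already-split sentences and reads it back. The helper names (mark_sentence, build_training_text, split_training_text) and the delimiter set ".!?" are assumptions made for this example, not part of the tool discussed in this issue; only the <S> marker and its placement come from the comment above.

```python
import re

MARKER = "<S>"       # marker taken from the comment above
DELIMITERS = ".!?"   # assumption: the characters treated as sentence delimiters


def mark_sentence(sentence: str) -> str:
    """Place MARKER directly before the final delimiter of one sentence."""
    sentence = sentence.strip()
    if not sentence or sentence[-1] not in DELIMITERS:
        raise ValueError(f"sentence does not end with a delimiter: {sentence!r}")
    body, delimiter = sentence[:-1].rstrip(), sentence[-1]
    return f"{body} {MARKER}{delimiter}"


def build_training_text(sentences) -> str:
    """Join marked sentences with spaces; no newline per sentence is needed."""
    return " ".join(mark_sentence(s) for s in sentences)


def split_training_text(text: str):
    """Recover the original sentences from a marked training string."""
    pattern = r"\s*" + re.escape(MARKER) + "([" + re.escape(DELIMITERS) + r"])\s*"
    # re.split keeps the captured delimiter, so parts alternate body, delimiter.
    parts = re.split(pattern, text)
    return [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]


if __name__ == "__main__":
    sentences = [
        "The quick brown fox jumps over the lazy dog.",
        "Mr. XYZ went to New York.",
    ]
    print(build_training_text(sentences))
```

Only the final delimiter of each input sentence is marked, so abbreviations such as "Mr." stay unmarked, matching the example text above.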
Original comment by rohitkel...@gmail.com
on 2 Jun 2013 at 4:23
Original issue reported on code.google.com by
mario.al...@gmail.com
on 22 Jun 2011 at 5:21