Dear Richard,
I have made a small modification to the StanfordSegmenter that allows one to
specify extra punctuation characters that the Stanford Word Segmenter should
add to the sentence boundary detection pattern (my corpus uses em-dashes to
mark sentence boundaries).
Maybe this is of some use to someone else?
Unfortunately, I have no experience contributing to Google Code projects or
working with svn, so I have no idea if there is a more efficient way of sharing
this. I have attached the diff below.
Best,
jta
Source: https://groups.google.com/d/topic/dkpro-core-user/LtPbnUOK5P4/discussion
---
Functionality will be implemented - patch will not be applied directly.
Original issue reported on code.google.com by richard.eckart on 27 Jan 2015 at 11:05