Configurable punctuation and sentence boundary detection

Dear Richard, 

I have made a small modification to the StanfordSegmenter that allows one to 
specify extra punctuation characters that the Stanford Word Segmenter should 
add to the sentence boundary detection pattern (my corpus uses em-dashes to 
mark sentence boundaries).
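The patch itself is not reproduced here. As a rough illustration of the idea, below is a minimal, self-contained Java sketch. It is not the StanfordSegmenter or Stanford CoreNLP API. The class `ConfigurableSentenceSplitter`, its constructor parameter, and the default pattern `\.|[!?]+` are all hypothetical. The sketch quotes each configured extra punctuation character and appends it as another alternative in the boundary regex. It then ends a sentence at every token that matches that regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the StanfordSegmenter or CoreNLP API): splits an
 * already-tokenized text into sentences, ending a sentence at every token
 * that matches the boundary pattern.
 */
public class ConfigurableSentenceSplitter {

    // Illustrative default only: a period, or a run of '!' and/or '?'.
    static final String DEFAULT_BOUNDARY_REGEX = "\\.|[!?]+";

    private final Pattern boundary;

    /** @param extraPunctuation extra characters (e.g. U+2014 EM DASH) that also end a sentence */
    public ConfigurableSentenceSplitter(String extraPunctuation) {
        StringBuilder regex = new StringBuilder(DEFAULT_BOUNDARY_REGEX);
        // Quote each extra character so regex metacharacters such as '|' match literally.
        extraPunctuation.codePoints().forEach(cp ->
            regex.append('|').append(Pattern.quote(new String(Character.toChars(cp)))));
        this.boundary = Pattern.compile(regex.toString());
    }

    public List<List<String>> split(List<String> tokens) {
        List<List<String>> sentences = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            current.add(token);
            // The boundary token is kept as the last token of its sentence.
            if (boundary.matcher(token).matches()) {
                sentences.add(current);
                current = new ArrayList<>();
            }
        }
        // Trailing tokens without a final boundary still form a sentence.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("It", "rained", "\u2014", "we", "stayed", "in", ".");
        System.out.println(new ConfigurableSentenceSplitter("").split(tokens).size());
        System.out.println(new ConfigurableSentenceSplitter("\u2014").split(tokens).size());
    }
}
```

Running `main` prints `1` and then `2`. With the default pattern, only the final period ends a sentence. Adding the em-dash (U+2014) splits the same tokens into two sentences. Quoting with `Pattern.quote` lets users pass characters such as `|` or `.` without them being read as regex syntax.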

Maybe this is of some use to someone else? 

Unfortunately, I have no experience contributing to Google Code projects or
working with SVN, so I have no idea if there is a more efficient way of sharing
this. I'm attaching the diff below.

Best,
jta

Source: https://groups.google.com/d/topic/dkpro-core-user/LtPbnUOK5P4/discussion

---

The functionality will be implemented, but the patch will not be applied directly.

Original issue reported on code.google.com by richard.eckart on 27 Jan 2015 at 11:05


Configurable punctuation and sentence boundary detection #584