opencog / relex

English Dependency Relationship Extractor
http://wiki.opencog.org/w/RelEx
Apache License 2.0

Remove OpenNLP dependency. #247

Closed linas closed 8 years ago

linas commented 8 years ago

The OpenNLP toolkit is used in only one place: sentence splitting (detecting the boundaries of sentences). It is slightly more accurate than the default java.text.BreakIterator, but perhaps not by enough to be worth the extra effort?

It's a hassle to install OpenNLP, and since the usage is so marginal, I'm thinking it's just not worth the effort.

linas commented 8 years ago

The default Java class is claimed to work correctly; e.g. https://docs.oracle.com/javase/7/docs/api/java/text/BreakIterator.html says:

Sentence boundary analysis allows selection with correct interpretation of periods within numbers and abbreviations, and trailing punctuation marks such as quotation marks and parentheses.

linas commented 8 years ago

Never mind. Oracle is lying about Java. The sentence "Dr. Smith is late." is split into two sentences by Java; it thinks that "Dr." is a valid sentence.
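
For anyone who wants to check this locally, here is a minimal sketch of how the split can be observed with java.text.BreakIterator. The class name SentenceSplitDemo and the helper split() are made up for illustration and are not part of RelEx; the locale is assumed to be Locale.US, and the exact boundaries reported may depend on the JDK version.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of RelEx.
class SentenceSplitDemo {

    // Returns the pieces that BreakIterator considers to be sentences.
    // Each piece keeps its trailing whitespace, so joining the pieces
    // gives back the original text.
    static List<String> split(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);

        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Print each detected sentence in brackets so the boundaries are visible.
        for (String s : split("Dr. Smith is late.")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

If the behaviour described above holds on your JDK, more than one bracketed piece is printed for the single input sentence, with "Dr." on its own.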