varnamproject / libvarnam

Deprecated. See https://github.com/varnamproject/govarnam
http://www.varnamproject.com
Mozilla Public License 2.0

Add ability to provide punctuation marks for each scheme #124

Closed (navaneeth closed this issue 9 years ago)

navaneeth commented 9 years ago

Currently there is no way to specify what counts as a full stop or a half stop in each scheme. Client libraries treat ".", "," and ";" as word-breaking characters for all languages. This does not work well for Hindi, where DEVANAGARI DANDA ("।", U+0964) is used as the full stop and DEVANAGARI DOUBLE DANDA ("॥", U+0965) is used for the semicolon.

Proposal:

punctuation_marks ["।"]

When learning a word, all of the scheme's punctuation marks should be removed from it first.
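
A rough sketch of how a client might use per-scheme punctuation marks, both for breaking text into words and for cleaning a word before it is learned. This is illustrative Python, not libvarnam code: the `SCHEME_MARKS` table, the scheme id `"hi"`, and the function names are all assumptions, and whether a scheme's marks extend or replace the default ".", "," and ";" is left open by the proposal (here they extend it).

```python
import re

# Hypothetical data: defaults that clients already treat as word breakers.
DEFAULT_MARKS = [".", ",", ";"]

# Hypothetical per-scheme table, standing in for what a scheme file's
# punctuation_marks declaration might provide.
SCHEME_MARKS = {
    "hi": DEFAULT_MARKS + ["\u0964", "\u0965"],  # danda, double danda
}


def punctuation_for(scheme):
    """Return the punctuation marks for a scheme, falling back to defaults."""
    return SCHEME_MARKS.get(scheme, DEFAULT_MARKS)


def split_words(text, scheme):
    """Break text into words on whitespace and the scheme's punctuation."""
    marks = "".join(punctuation_for(scheme))
    pattern = "[" + re.escape(marks) + r"\s]+"
    return [word for word in re.split(pattern, text) if word]


def strip_punctuation(word, scheme):
    """Remove every punctuation mark of the scheme from a word."""
    table = {ord(mark): None for mark in punctuation_for(scheme)}
    return word.translate(table)
```

With a table like this, a word followed by a danda would be learned without the danda under the Hindi scheme, while schemes that do not declare any marks keep the current behaviour.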

#121 and PrathamBooks/spp#472 depend on this issue.