Closed — takahi-i closed this issue 9 years ago
Support tokenizers for various input languages. The tokenizer runs during document generation, and the results are stored in Sentence objects.
Basically, Lucene / Solr tokenizers can be applied.
I will take this issue. I plan to use the Kuromoji tokenizer for Japanese and a plain whitespace tokenizer for English text.
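For the English side, a whitespace tokenizer is straightforward. Below is a minimal sketch of what such a tokenizer might look like; the class and method names are hypothetical illustrations, not the project's actual API (the Japanese side would instead delegate to Lucene's Kuromoji tokenizer).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a plain whitespace tokenizer for English text.
// Token lists like these would then be stored in Sentence objects.
public class WhitespaceTokenizer {

    // Split a sentence on runs of whitespace; blank input yields no tokens.
    public static List<String> tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return Arrays.asList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a pen."));
    }
}
```

A whitespace split is enough for English, but it would not work for Japanese, which has no spaces between words; that is exactly why a morphological analyzer such as Kuromoji is needed there.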
Fixed with #273.