Closed — takahi-i closed this issue 9 years ago
Support tokenizers for various input languages. The tokenizer runs during document generation, and the results are stored in Sentence objects.
Basically, Lucene / Solr tokenizers can be applied.
I will take this issue. I plan to use the Kuromoji tokenizer for Japanese and a plain whitespace tokenizer for English text.
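For the English side, a whitespace tokenizer is straightforward. Below is a minimal sketch of what such a tokenizer might look like; the class and method names are hypothetical illustrations, not the project's actual API (the Japanese side would instead delegate to Lucene's Kuromoji tokenizer).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a plain whitespace tokenizer for English text.
// Token lists like these would then be stored in Sentence objects.
public class WhitespaceTokenizer {

    // Split a sentence on runs of whitespace; blank input yields no tokens.
    public static List<String> tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return Arrays.asList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a pen."));
    }
}
```

A whitespace split is enough for English, but it would not work for Japanese, which has no spaces between words; that is exactly why a morphological analyzer such as Kuromoji is needed there.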
Fixed with #273.