rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

Add unicode tokenizer #17

Closed: rth closed this 5 years ago

rth commented 5 years ago

Add a Unicode tokenizer that follows Unicode® Standard Annex #29 (Unicode Text Segmentation).

This mostly provides a thin wrapper around the unicode-segmentation crate.