rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

Add tokenizer trait #48

Closed rth closed 5 years ago

rth commented 5 years ago

This makes it possible to use any object implementing the Tokenizer trait in the vectorizers, for example:

    let tokenizer = VTextTokenizer::new("en");
    let vectorizer = CountVectorizer::new(&tokenizer);
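
To make the pattern above concrete, here is a minimal, self-contained sketch of how a vectorizer can borrow anything that implements a tokenizer trait. The trait signature, `WhitespaceTokenizer`, and `SimpleCountVectorizer` are assumptions made for illustration; they are not the actual vtext definitions, which may differ.

```rust
use std::collections::HashMap;

/// Hypothetical trait: the real vtext `Tokenizer` signature may differ.
pub trait Tokenizer {
    /// Split `text` into tokens that borrow from the input string.
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str>;
}

/// Illustrative tokenizer (not part of vtext) that splits on whitespace.
pub struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }
}

/// Illustrative vectorizer that borrows any tokenizer for lifetime `'b`.
pub struct SimpleCountVectorizer<'b, T: Tokenizer + ?Sized> {
    tokenizer: &'b T,
}

impl<'b, T: Tokenizer + ?Sized> SimpleCountVectorizer<'b, T> {
    pub fn new(tokenizer: &'b T) -> Self {
        SimpleCountVectorizer { tokenizer }
    }

    /// Count how often each token occurs in `text`.
    pub fn count(&self, text: &str) -> HashMap<String, usize> {
        let mut counts = HashMap::new();
        for token in self.tokenizer.tokenize(text) {
            *counts.entry(token.to_string()).or_insert(0) += 1;
        }
        counts
    }
}
```

With this sketch, `SimpleCountVectorizer::new(&WhitespaceTokenizer)` accepts the tokenizer by reference, and any other type implementing the trait can be swapped in without changing the vectorizer.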

rth commented 5 years ago

This also appears to need pyo3 0.7 (not yet released) to compile the Python wrapper. With the currently released version, I get the following errors:

    error: #[pymethods] can not ve used with lifetime parameters or generics
      --> src/lib.rs:68:5
       |
    68 | impl<'b> _HashingVectorizerWrapper<'b> {
       |     ^^^^

    error: #[pymethods] can not ve used with lifetime parameters or generics
      --> src/lib.rs:93:5
       |
    93 | impl<'b> _CountVectorizerWrapper<'b> {
       |     ^^^^

    error[E0658]: The attribute `new` is currently unknown to the compiler and may have meaning added to it in the future (see issue #29642)
      --> src/lib.rs:69:7
       |
    69 |     #[new]
       |       ^^^
       |
       = help: add #![feature(custom_attribute)] to the crate attributes to enable
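
The first two errors come from the wrapper types carrying a lifetime parameter (`<'b>`), which they need in order to hold a borrowed tokenizer. One possible way around this, shown below as a plain-Rust sketch without pyo3, is for the wrapper to own its tokenizer so that no lifetime parameter is required. All names here (`OwnedVectorizerWrapper`, `WhitespaceTokenizer`, the trait signature) are hypothetical, and this thread does not say whether the follow-up PRs took this approach.

```rust
/// Hypothetical stand-in for the Tokenizer trait; the real signature may differ.
pub trait Tokenizer {
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str>;
}

/// Illustrative tokenizer (not part of vtext) that splits on whitespace.
pub struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }
}

/// Wrapper that owns a boxed tokenizer instead of borrowing one,
/// so the struct and its impl block need no lifetime parameter.
pub struct OwnedVectorizerWrapper {
    tokenizer: Box<dyn Tokenizer + Send>,
}

impl OwnedVectorizerWrapper {
    pub fn new(tokenizer: Box<dyn Tokenizer + Send>) -> Self {
        OwnedVectorizerWrapper { tokenizer }
    }

    /// Number of tokens the owned tokenizer produces for `text`.
    pub fn n_tokens(&self, text: &str) -> usize {
        self.tokenizer.tokenize(text).len()
    }
}
```

The trade-off in this sketch is dynamic dispatch through `Box<dyn Tokenizer + Send>` in exchange for a wrapper type with no generic or lifetime parameters.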

rth commented 5 years ago

Merging. This currently contains only the Tokenizer trait; the changes to the vectorizers will be added in follow-up PRs due to issues with pyo3.