rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

Better support of configuration parameters in vectorizers #50

Closed: rth closed this issue 4 years ago

rth commented 5 years ago

Currently, CountVectorizer and HashingVectorizer mostly perform bag-of-words (BOW) token counting, with no way to change the tokenizer or any other parameter.

While we intentionally won't support all the parameters that the scikit-learn versions do (these meta-estimators do too much), some additional parametrization would be preferable.
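As an illustration of what "changing the tokenizer" could mean, here is a minimal sketch of a BOW counter that takes the tokenizer as a generic parameter, plus one extra option (`lowercase`). All names here (`Tokenizer`, `WhitespaceTokenizer`, `AlphanumericTokenizer`, `TokenCounter`) are hypothetical and are not the vtext API; the point is only that the tokenizer becomes a pluggable component instead of being hard-coded.

```rust
use std::collections::HashMap;

/// Hypothetical tokenizer trait: anything that splits a document into tokens.
pub trait Tokenizer {
    fn tokenize(&self, text: &str) -> Vec<String>;
}

/// Hypothetical tokenizer that splits on Unicode whitespace only.
pub struct WhitespaceTokenizer;

impl Tokenizer for WhitespaceTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split_whitespace().map(str::to_string).collect()
    }
}

/// Hypothetical tokenizer that splits on every non-alphanumeric character
/// and drops the empty pieces this produces.
pub struct AlphanumericTokenizer;

impl Tokenizer for AlphanumericTokenizer {
    fn tokenize(&self, text: &str) -> Vec<String> {
        text.split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(str::to_string)
            .collect()
    }
}

/// Hypothetical bag-of-words counter: the tokenizer is a type parameter,
/// and `lowercase` is an example of an additional configuration option.
pub struct TokenCounter<T: Tokenizer> {
    tokenizer: T,
    lowercase: bool,
}

impl<T: Tokenizer> TokenCounter<T> {
    pub fn new(tokenizer: T, lowercase: bool) -> Self {
        TokenCounter { tokenizer, lowercase }
    }

    /// Counts how many times each token occurs in a single document.
    pub fn count(&self, document: &str) -> HashMap<String, usize> {
        let mut counts = HashMap::new();
        for token in self.tokenizer.tokenize(document) {
            let token = if self.lowercase { token.to_lowercase() } else { token };
            *counts.entry(token).or_insert(0) += 1;
        }
        counts
    }
}
```

With `TokenCounter::new(AlphanumericTokenizer, true)`, the text "The cat, the hat." is counted as `the: 2, cat: 1, hat: 1`. Swapping in `WhitespaceTokenizer` keeps punctuation attached ("cat,", "hat."), which is exactly the kind of behavior change a configurable tokenizer makes possible.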

rth commented 5 years ago

Different ways of defining default and optional parameters (for vectorizers and for estimators in general) are also discussed in https://github.com/rust-ml/discussion/issues/2
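For reference, two common Rust idioms for default and optional parameters are a `Default` implementation combined with builder-style setters, and struct update syntax (`..Default::default()`). The sketch below shows both on a hypothetical `VectorizerParams` struct; the field names and default values are illustrative assumptions, not what vtext or the linked discussion settled on.

```rust
/// Hypothetical parameter set for a vectorizer; field names and defaults are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorizerParams {
    pub lowercase: bool,
    pub n_features: usize,
    pub ngram_range: (usize, usize),
}

impl Default for VectorizerParams {
    fn default() -> Self {
        VectorizerParams {
            lowercase: true,
            n_features: 1 << 20,
            ngram_range: (1, 1),
        }
    }
}

impl VectorizerParams {
    /// Builder-style setters: each consumes `self` and returns the updated value,
    /// so callers only spell out the parameters that differ from the defaults.
    pub fn lowercase(mut self, value: bool) -> Self {
        self.lowercase = value;
        self
    }

    pub fn n_features(mut self, value: usize) -> Self {
        self.n_features = value;
        self
    }

    /// Validates the range before storing it; panics on an invalid range.
    pub fn ngram_range(mut self, min_n: usize, max_n: usize) -> Self {
        assert!(min_n >= 1 && min_n <= max_n, "invalid n-gram range");
        self.ngram_range = (min_n, max_n);
        self
    }
}
```

Usage: `VectorizerParams::default().n_features(1024).ngram_range(1, 2)` overrides two parameters via setters, while `VectorizerParams { lowercase: false, ..Default::default() }` achieves a similar effect with struct update syntax. Setters have the advantage of allowing validation (as in `ngram_range`) and of not requiring public fields.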

rth commented 4 years ago

Resolved in https://github.com/rth/vtext/pull/57