koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.25k stars, 117 forks

[FEATURE] OOVCountVectorizer #368

Closed: koaning closed this issue 4 years ago

koaning commented 4 years ago

Rasa adds an extra property to its count vectorizer for out-of-vocabulary (OOV) words. This can be very useful for outlier detection on text, but scikit-learn's CountVectorizer does not support it.

Might be worth adding one here.
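
To make the idea concrete, here is a minimal, dependency-free sketch of what such a vectorizer could do: count known tokens as usual and add one extra column holding the number of tokens never seen during fit. The class name `OOVCountSketch`, the helper `tokenize`, and the choice to put the OOV count in the last column are all assumptions made for illustration; this is neither Rasa's implementation nor an existing scikit-learn or scikit-lego API. The tokenizer lowercases the text and keeps tokens of two or more word characters, which mirrors CountVectorizer's default behaviour.

```python
import re
from collections import Counter

# Tokens of two or more word characters, similar to CountVectorizer's default.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens (hypothetical helper)."""
    return TOKEN_PATTERN.findall(text.lower())


class OOVCountSketch:
    """Hypothetical bag-of-words counter with one extra out-of-vocabulary column."""

    def fit(self, texts):
        # Learn a sorted vocabulary that maps each seen token to a column index.
        tokens = sorted({tok for text in texts for tok in tokenize(text)})
        self.vocabulary_ = {tok: idx for idx, tok in enumerate(tokens)}
        return self

    def transform(self, texts):
        # Assumption: the OOV count lives in the last column of every row.
        oov_index = len(self.vocabulary_)
        rows = []
        for text in texts:
            row = [0] * (oov_index + 1)
            for tok, count in Counter(tokenize(text)).items():
                # Unknown tokens fall through to the OOV column.
                row[self.vocabulary_.get(tok, oov_index)] += count
            rows.append(row)
        return rows


vectorizer = OOVCountSketch().fit(["the cat sat", "the dog sat"])
counts = vectorizer.transform(["the cat barked loudly", "the dog sat"])
print(counts)  # [[1, 0, 0, 1, 2], [0, 1, 1, 1, 0]]
```

The last value in each row is the OOV count: "barked" and "loudly" were not in the training texts, so the first row ends in 2. A high OOV count relative to the text length is the kind of signal that could feed an outlier detector.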

koaning commented 4 years ago

Out of scope.