rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

ENH Improve CountVectorizer performance #49

Closed rth closed 5 years ago

rth commented 5 years ago

This improves CountVectorizer performance (making it around 30% faster) by avoiding the conversion of tokens from &str to String when checking whether a token is already in the vocabulary.
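As a rough illustration of the idea, and not the actual vtext code, the sketch below counts tokens against a `HashMap<String, usize>` vocabulary. The function name `count_tokens` and the return shape are assumptions made for this example. The point it shows is that `HashMap::get` accepts a `&str` key directly (because `String: Borrow<str>`), so a `String` only needs to be allocated when a token is seen for the first time.

```rust
use std::collections::HashMap;

/// Hypothetical helper for illustration only; not vtext's implementation.
/// Returns the vocabulary (token -> column index) and per-index counts.
fn count_tokens<'a>(
    tokens: impl Iterator<Item = &'a str>,
) -> (HashMap<String, usize>, Vec<usize>) {
    let mut vocabulary: HashMap<String, usize> = HashMap::new();
    let mut counts: Vec<usize> = Vec::new();

    for token in tokens {
        // Look up with the borrowed &str; no String is allocated here.
        let found = vocabulary.get(token).copied();
        match found {
            Some(idx) => counts[idx] += 1,
            None => {
                // Allocate an owned String only for a new vocabulary entry.
                let idx = vocabulary.len();
                vocabulary.insert(token.to_string(), idx);
                counts.push(1);
            }
        }
    }
    (vocabulary, counts)
}
```

For example, calling `count_tokens("the cat sat on the mat".split_whitespace())` allocates five strings (one per distinct token) rather than six (one per token). The slower pattern this avoids would be something like `vocabulary.get(&token.to_string())`, which allocates on every lookup.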