rth / vtext

Simple NLP in Rust with Python bindings
Apache License 2.0

Optimize summing of duplicate tokens #12

Closed rth closed 5 years ago

rth commented 5 years ago

This results in a ~20% performance improvement for HashingVectorizer (and a smaller relative improvement for CountVectorizer).
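For readers unfamiliar with the operation named in the title, here is a minimal sketch of what "summing duplicate tokens" can look like for a single document: repeated token indices are collapsed into one (index, count) pair each. The function name `sum_duplicates` and the `u32` index type are assumptions made for illustration; this is not the code changed in this PR, and vtext's actual implementation may differ.

```rust
/// Collapse repeated token indices from one document into sorted unique
/// indices and their counts.
/// Hypothetical helper for illustration only, not vtext's actual code.
fn sum_duplicates(mut indices: Vec<u32>) -> (Vec<u32>, Vec<u32>) {
    // Sorting places equal indices next to each other, so duplicates
    // can be merged in a single linear pass.
    indices.sort_unstable();

    let mut unique: Vec<u32> = Vec::with_capacity(indices.len());
    let mut counts: Vec<u32> = Vec::with_capacity(indices.len());

    for idx in indices {
        if unique.last() == Some(&idx) {
            // Same index as the previous entry: increment its count.
            if let Some(count) = counts.last_mut() {
                *count += 1;
            }
        } else {
            // First occurrence of this index: start a new entry.
            unique.push(idx);
            counts.push(1);
        }
    }

    (unique, counts)
}
```

Calling `sum_duplicates(vec![3, 1, 3, 2, 1, 3])` yields the indices `[1, 2, 3]` with counts `[2, 1, 3]`. Sorting first keeps the merge pass free of hash lookups, which is one common way to make this step cheap; whether the PR takes that route is not stated here.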