Closed by rth 5 years ago
This adds a Python wrapper for the rust_stemmers crate, which implements the Snowball stemming algorithms.
The result is around 15-20x faster than NLTK:
    $ python3.7 ../benchmarks/bench_stemmers.py
    # stemming 1000 documents
    nltk.stem.PorterStemmer(): 7.18s [0.05 M tokens/s]
    nltk.stem.SnowballStemmer('english'): 5.31s [0.07 M tokens/s]
    nltk.stem.SnowballStemmer('french'): 10.68s [0.04 M tokens/s]
    pytext_vectorize.stem.SnowballStemmer('english'): 0.37s [1.05 M tokens/s]
    pytext_vectorize.stem.SnowballStemmer('french'): 0.48s [0.82 M tokens/s]
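For readers who want to reproduce this kind of measurement, here is a minimal sketch of how a timing and tokens-per-second figure like the ones above could be computed. It is not the actual `bench_stemmers.py`: the `bench_stemmer` helper, the `toy_stem` placeholder, and the sample documents are all hypothetical, chosen so the snippet runs with only the standard library. Any callable that maps a word to its stem could be passed in place of `toy_stem`.

```python
import time
from typing import Callable, Iterable, List, Tuple


def bench_stemmer(
    stem: Callable[[str], str], documents: Iterable[List[str]]
) -> Tuple[float, float]:
    """Return (elapsed seconds, million tokens per second) for one stemmer."""
    n_tokens = 0
    t0 = time.perf_counter()
    for tokens in documents:
        for token in tokens:
            stem(token)
        n_tokens += len(tokens)
    elapsed = time.perf_counter() - t0
    # Guard against a zero reading on very coarse clocks
    rate = n_tokens / elapsed / 1e6 if elapsed > 0 else float("inf")
    return elapsed, rate


def toy_stem(word: str) -> str:
    # Hypothetical stand-in (NOT a real stemmer): strips a trailing "s"
    # so the sketch runs without NLTK or the wrapper installed.
    return word[:-1] if word.endswith("s") else word


# Assumed input shape: 1000 pre-tokenized documents
docs = [["stemming", "documents", "quickly"]] * 1000
elapsed, rate = bench_stemmer(toy_stem, docs)
print(f"toy_stem: {elapsed:.2f}s [{rate:.2f} M tokens/s]")
```

Passing each stemmer's `stem` method to the same helper keeps tokenization out of the timed region, so the comparison reflects stemming cost only.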