rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] NLTK Porter Stemmer #3108

Closed: VibhuJawa closed this issue 4 years ago

VibhuJawa commented 5 years ago

As a user, I would like a Porter stemmer in nvstrings, since stemming is an important NLP preprocessing step.

Based on an initial reading of the algorithm, I feel that implementing the measure function at the C++ level would be a good start, as we may be able to implement the other functions at the Python level.
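In Porter's algorithm, a word is written as [C](VC)^m[V], where C and V are runs of consonants and vowels, and the measure m is the number of VC pairs. A C++ port would need a reference for this, so here is a minimal pure-Python sketch that mirrors the logic of the `_is_consonant` and `_measure` methods in the linked NLTK module. The function names are illustrative, and the sketch assumes lowercase ASCII input.

```python
VOWELS = frozenset("aeiou")


def is_consonant(word: str, i: int) -> bool:
    """Porter's consonant test (assumes lowercase input)."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' is a consonant at the start of a word or after a vowel,
        # and a vowel when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m for a stem of the form [C](VC)^m[V]."""
    # Encode each letter as 'c' or 'v'; every vowel run followed by a
    # consonant run contributes exactly one "vc" boundary.
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return pattern.count("vc")


if __name__ == "__main__":
    for w in ["tree", "by", "trouble", "ivy", "troubles", "private"]:
        print(w, measure(w))
```

Porter's original description gives "tree" and "by" as m=0, "trouble" and "ivy" as m=1, and "troubles" and "private" as m=2; the sketch reproduces those values. Because the function only scans the string once and inspects the previous letter, it should map naturally onto a per-string C++ loop.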

Link to stemming logic: https://www.nltk.org/_modules/nltk/stem/porter.html

davidwendt commented 5 years ago

We are assuming that each string in the nvstrings instance is an individual word to measure:
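If each string holds a single word, the measure can be computed element-wise, and the remaining Porter rules can then be written at the Python level on top of it. The sketch below works on a plain Python list rather than an nvstrings instance; `measure_column` and `step1b_eed` are hypothetical helper names (not nvstrings or cuDF API), and rejecting entries that contain whitespace is an assumed way of enforcing the one-word-per-string assumption.

```python
from typing import List

VOWELS = frozenset("aeiou")


def cv_pattern(word: str) -> str:
    """Map each letter to 'c' or 'v'; 'y' is a vowel only after a consonant."""
    out: List[str] = []
    for i, ch in enumerate(word):
        if ch in VOWELS:
            out.append("v")
        elif ch == "y" and i > 0 and out[i - 1] == "c":
            out.append("v")
        else:
            out.append("c")
    return "".join(out)


def measure(stem: str) -> int:
    """Porter measure m: the number of vowel-run -> consonant-run boundaries."""
    return cv_pattern(stem).count("vc")


def measure_column(words: List[str]) -> List[int]:
    """Hypothetical batch helper: one measure per entry.

    Assumes every entry is a single lowercase word (no whitespace)."""
    for w in words:
        if any(ch.isspace() for ch in w):
            raise ValueError(f"expected a single word, got {w!r}")
    return [measure(w) for w in words]


def step1b_eed(word: str) -> str:
    """One Porter rule built on the measure: (m>0) EED -> EE."""
    if word.endswith("eed"):
        stem = word[:-3]
        return word[:-1] if measure(stem) > 0 else word
    return word


if __name__ == "__main__":
    print(measure_column(["tree", "trouble", "private"]))
    print(step1b_eed("feed"), step1b_eed("agreed"))
```

Here `measure_column(["tree", "trouble", "private"])` gives `[0, 1, 2]`, and the rule leaves "feed" unchanged (its stem "f" has m=0) while reducing "agreed" to "agree". This layering is what the original proposal suggests: only the measure needs a fast per-string implementation, and suffix rules like this one can stay in Python.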

VibhuJawa commented 4 years ago

This can be closed, as measure has been implemented in nvstrings.