ropensci / tokenizers

Fast, Consistent Tokenization of Natural Language Text
https://docs.ropensci.org/tokenizers

Fix #76 #77

Closed: kbenoit closed this pull request 4 years ago

kbenoit commented 4 years ago

Changes stopword removal in tokenize_tweets() to happen before removing punctuation, so that stopwords that contain punctuation (such as "i'm") are handled correctly.

Fixes #76.
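
As a minimal sketch of why the ordering matters (the example string, stopword list, and shown output are illustrative assumptions, not taken from the PR):

```r
library(tokenizers)

# "i'm" is a stopword that contains an apostrophe. If punctuation were
# stripped first, it would be reduced to "im" and would no longer match
# the stopword list. Removing stopwords before punctuation (this PR's
# change) lets it match as intended. Note tokenize_tweets() lowercases
# by default, so "I'm" is compared against the stopword "i'm".
tokenize_tweets("I'm tweeting about #rstats",
                stopwords = c("i'm", "about"),
                strip_punct = TRUE)
# With the fix, "i'm" and "about" are dropped, leaving (assumed output):
# [[1]]
# [1] "tweeting" "#rstats"
```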