syumet closed this issue 4 years ago.
Fixed in a PR (still pending), thanks.
By the way, the quanteda package has a much-upgraded default tokenizer in v2 that handles social media tags even better and faster than tokenize_tweets(), without the problems you noticed.
Thanks for the fix, @kbenoit.
@syumet: You should be able to install the development version with the fix via the remotes package.
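As a quick sketch, the installation would look something like the following; note that the `ropensci/tokenizers` repository path is an assumption on my part, so adjust it if the package lives elsewhere:

```r
# Hypothetical sketch: install the in-development version (with the fix)
# straight from GitHub using the remotes package.
install.packages("remotes")                      # only if remotes is missing
remotes::install_github("ropensci/tokenizers")   # repo path assumed, adjust as needed
```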
From my observation, tokenize_tweets() removes punctuation before cleaning stopwords; that's probably the cause of the problem.
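To make the ordering issue concrete, here is a minimal sketch (in Python, not the actual tokenizers implementation; the stopword list and regexes are purely illustrative). If punctuation is stripped first, a stopword like "don't" is split into "don" and "t" and no longer matches the stopword list:

```python
import re

STOPWORDS = {"don't", "the", "a"}  # illustrative stopword list

def tokenize_buggy(text):
    # Punctuation is removed BEFORE stopword filtering, so "don't"
    # becomes "don t" and never matches the stopword entry "don't".
    no_punct = re.sub(r"[^\w\s#@]", " ", text)
    tokens = no_punct.lower().split()
    return [t for t in tokens if t not in STOPWORDS]

def tokenize_fixed(text):
    # Filter stopwords FIRST, then strip punctuation from the survivors.
    tokens = text.lower().split()
    kept = [t for t in tokens if t not in STOPWORDS]
    cleaned = [re.sub(r"[^\w#@]", "", t) for t in kept]
    return [t for t in cleaned if t]

print(tokenize_buggy("Don't miss the #rstats talk"))
# -> ['don', 't', 'miss', '#rstats', 'talk']  (stopword "don't" survives, split in two)
print(tokenize_fixed("Don't miss the #rstats talk"))
# -> ['miss', '#rstats', 'talk']
```

The sketch only shows why the order of the two cleaning steps matters, not how tokenize_tweets() is actually written.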