lmullen closed this issue 7 years ago
Some tokenizers have options to strip or retain punctuation and numbers. These options should also be present in the word and n-gram tokenizers.
tokenize_words()
tokenize_ngrams()
tokenize_skip_ngrams()
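To make the request concrete, here is a minimal base R sketch of what a strip-or-retain punctuation switch could look like for a word tokenizer. The function `sketch_tokenize_words()` and its `strip_punct` argument are hypothetical names chosen for illustration. This is not the package's implementation, and it makes no claim about the argument names the package ends up using.

```r
# Hypothetical helper for illustration only; not part of the tokenizers package.
# Splits each string on whitespace and either drops punctuation or keeps
# each punctuation mark as its own token.
sketch_tokenize_words <- function(x, strip_punct = TRUE) {
  x <- tolower(x)
  if (strip_punct) {
    # Replace runs of punctuation with a space so they vanish on splitting.
    x <- gsub("[[:punct:]]+", " ", x)
  } else {
    # Pad every punctuation mark with spaces so it survives as a token.
    x <- gsub("([[:punct:]])", " \\1 ", x)
  }
  # Returns a list with one character vector of tokens per input string.
  strsplit(trimws(x), "[[:space:]]+")
}

stripped <- sketch_tokenize_words("Hello, world!")
retained <- sketch_tokenize_words("Hello, world!", strip_punct = FALSE)
```

With `strip_punct = TRUE` the tokens are the words alone. With `strip_punct = FALSE` the comma and the exclamation mark come back as separate tokens. Note that this naive sketch also splits on apostrophes and hyphens inside words, which a real tokenizer would probably want to handle more carefully.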
The punct-options branch now has options for preserving punctuation in tokenize_words().
@kbenoit Does this do what you expect it to do?
Not going to do this for the n-gram tokenizers. It works for the word tokenizer.