Closed: koheiw closed this 4 weeks ago
It is possible to keep unigrams in tokens_ngrams() by setting n = 1, but there is no way to do so in tokens_compound(). Keeping unigrams makes it easier to apply pattern matching downstream. How about adding an option like keep_unigram?
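To make the difference concrete, here is a minimal sketch, assuming the quanteda package is installed. The example sentence and the object names (toks, ngr, comp) are made up for illustration. Including 1 in n keeps the single tokens next to the bigrams, while compounding replaces the matched tokens.

```r
library(quanteda)

# Illustrative input; any tokens object would do.
toks <- tokens("United States of America")

# Including 1 in n keeps the unigrams alongside the bigrams.
ngr <- tokens_ngrams(toks, n = 1:2)
print(ngr)

# tokens_compound() replaces the matched sequence with the joined token,
# so "United" and "States" are no longer available as separate tokens.
comp <- tokens_compound(toks, pattern = phrase("United States"))
print(comp)

# The option requested in this thread (keep_unigram / keep_unigrams) is a
# proposal here, so it is deliberately not called in this sketch.
```

With the unigrams retained, a later pattern such as "States" would still match. After compounding, only "United_States" is left to match against.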
OK, good idea. The argument could be keep_unigrams = FALSE (plural), or action = c("replace", "add"), but action would only make sense if there were a possible third option, and I cannot think of any.