quanteda / quanteda

An R package for the Quantitative Analysis of Textual Data
https://quanteda.io
GNU General Public License v3.0

Keep original unigrams in tokens_compound() #2399

Closed koheiw closed 4 weeks ago

koheiw commented 1 month ago

It is possible to keep unigrams in tokens_ngrams() by including 1 in n (e.g. n = 1:2), but there is no equivalent in tokens_compound(). Keeping the unigrams makes it easy to apply pattern matching downstream. How about adding an option like keep_unigram?
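A minimal sketch of the current behaviour described above, assuming quanteda is installed. The example sentence and the variable names are made up for illustration.

```r
library(quanteda)

# Made-up example text
toks <- tokens("The United States is a country")

# tokens_ngrams(): including 1 in n keeps the unigrams next to the bigrams
ngrams <- tokens_ngrams(toks, n = 1:2)

# tokens_compound(): the matched unigrams are replaced by the compound,
# so "United" and "States" no longer appear as separate tokens
compounded <- tokens_compound(toks, pattern = phrase("United States"))

print(ngrams)
print(compounded)
```

After compounding, a downstream pattern such as "United" no longer matches anything, which is the motivation for an option that keeps the original unigrams.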

kbenoit commented 1 month ago

OK, good idea. Either keep_unigrams = FALSE (plural), or

action = c("replace", "add")

but an action argument would only make sense if there were a possible third option, and I cannot think of one.
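To make the trade-off between the two interface styles concrete, here is a rough base R sketch. Both functions are hypothetical stand-ins and are not part of quanteda; they only return the name of the behaviour that would be selected.

```r
# Hypothetical helpers for illustration only; neither exists in quanteda.

# Style 1: a logical flag, sufficient when there are exactly two behaviours
compound_flag <- function(keep_unigrams = FALSE) {
  stopifnot(is.logical(keep_unigrams), length(keep_unigrams) == 1)
  if (keep_unigrams) "add" else "replace"
}

# Style 2: an enumerated argument, which leaves room for a third behaviour
compound_action <- function(action = c("replace", "add")) {
  match.arg(action)  # falls back to the first choice when not supplied
}

compound_flag()          # "replace"
compound_flag(TRUE)      # "add"
compound_action()        # "replace"
compound_action("add")   # "add"
```

With only two behaviours, both styles select the same thing, so the flag is the simpler choice unless a third option turns up.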