KyleHaynes closed this issue 6 years ago.
Hi,
I'm wondering whether the behavior below is expected, or whether I'm doing or interpreting something incorrectly.
When a token has no match in `tokens_lookup()`, shouldn't the `nomatch` value "NA" be returned (rather than "CA") in the example below?
```r
# text
txt <- c("12032 Musgrave rd red hill", "13 rad street windermore park queensland",
         "130 right road", "130 rtn road")

# tokenise txt
toks <- quanteda::tokens(txt)

# create named list
dic <- list(CR = c("rd", "red"), CB = c("street", "feet"), CA = c("parl", "dark"))

# create dictionary
dict <- quanteda::dictionary(dic)

# apply tokens_lookup
quanteda::tokens_lookup(toks, dict, levels = 1, exclusive = TRUE, nomatch = "NA")
```

Output:

```
tokens from 4 documents.
text1 :
[1] "CA" "CA" "CR" "CR" "CA"

text2 :
[1] "CA" "CA" "CB" "CA" "CA" "CA"

text3 :
[1] "CA" "CA" "CA"

text4 :
[1] "CA" "CA" "CA"
```
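For comparison, here is a minimal base-R sketch of what an exclusive lookup with `nomatch = "NA"` would be expected to return for the same input. It is not quanteda's implementation: `lookup_exclusive()` is a hypothetical helper, it splits on single spaces instead of using `quanteda::tokens()`, it matches case-sensitively, and it assigns each token only to the first matching key.

```r
# Hypothetical helper, NOT part of quanteda: a simplified exclusive lookup
# that replaces every token with its dictionary key, or with `nomatch` if no
# key contains it. Matching is exact and case-sensitive; first key wins.
lookup_exclusive <- function(tokens, dict, nomatch = NA_character_) {
  vapply(tokens, function(tok) {
    # Logical vector: which dictionary keys list this token as a value?
    hit <- names(dict)[vapply(dict, function(vals) tok %in% vals, logical(1))]
    if (length(hit) > 0) hit[1] else nomatch
  }, character(1), USE.NAMES = FALSE)
}

dic <- list(CR = c("rd", "red"), CB = c("street", "feet"), CA = c("parl", "dark"))
txt <- c("12032 Musgrave rd red hill", "13 rad street windermore park queensland",
         "130 right road", "130 rtn road")

# Whitespace tokenisation stands in for quanteda::tokens() here
toks <- strsplit(txt, " ", fixed = TRUE)
res <- lapply(toks, lookup_exclusive, dict = dic, nomatch = "NA")
print(res)
```

Under these assumptions, text1 comes out as "NA" "NA" "CR" "CR" "NA": unmatched tokens get the `nomatch` value, and "CA" only appears for tokens such as "parl" or "dark", none of which occur in this input.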
I'm currently using quanteda_1.2.0 from CRAN with R 3.5.0.
Thanks Kyle
@KyleHaynes Thanks, it's a bug. I have written a patch to fix this.
Patch seems to work. Thanks!