quanteda / quanteda.textstats

Textual statistics for quanteda
GNU General Public License v3.0

Does groups work with NSE? #33

koheiw closed this issue 3 years ago

koheiw commented 3 years ago

While updating the tutorial for v3.0, I found that NSE (non-standard evaluation) for groups is not really working. This example is from the test.

library("quanteda.textstats")

txt <- c("a a b b c d", "a d d d", "a a a")
grp1 <- c("one", "two", "one")
corp1 <- quanteda::corpus(txt, docvars = data.frame(grp2 = grp1))
dfmat1 <- quanteda::dfm(quanteda::tokens(corp1))

# by a vector: works (tested)
textstat_frequency(dfmat1, groups = grp1, ties_method = "max")

# by docvar, referred to by its unquoted name: error (not tested)
textstat_frequency(dfmat1, groups = grp2, ties_method = "max")

kbenoit commented 3 years ago

That’s right, it’s not yet implemented for quanteda.textstats. I will do this very soon, and also for the quanteda.textmodels functions that take y.
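
Until NSE is supported here, one possible workaround is to extract the docvar and pass its values as an ordinary vector, which is the form reported as working above. This is only a sketch: it assumes quanteda and quanteda.textstats are installed, and the object names (corp, dfmat, freq_by_docvar) are illustrative.

```r
library("quanteda")
library("quanteda.textstats")

txt <- c("a a b b c d", "a d d d", "a a a")
corp <- corpus(txt, docvars = data.frame(grp2 = c("one", "two", "one")))
dfmat <- dfm(tokens(corp))

# Assumed workaround: look up the docvar explicitly and pass its values,
# instead of referring to grp2 by its unquoted name.
freq_by_docvar <- textstat_frequency(
  dfmat,
  groups = docvars(dfmat, "grp2"),
  ties_method = "max"
)
print(freq_by_docvar)
```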

koheiw commented 3 years ago

I thought you had done it already, because the documentation for groups says:

grouping variable for sampling, equal in length to the number of documents. This will be evaluated in the docvars data.frame, so that docvars may be referred to by name without quoting. This also changes previous behaviours for groups. See news(Version >= "3.0", package = "quanteda") for details.
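
The behaviour described in that documentation, evaluating groups inside the docvars data.frame, can be sketched in base R with substitute() and eval(). This is not quanteda's actual implementation; resolve_groups() and its arguments are hypothetical names used only for illustration.

```r
# Hypothetical helper: evaluate `groups` first in the docvars data.frame,
# then fall back to the caller's environment.
resolve_groups <- function(groups, docvars_df) {
  expr <- substitute(groups)  # capture the unevaluated expression
  eval(expr, envir = docvars_df, enclos = parent.frame())
}

docv <- data.frame(grp2 = c("one", "two", "one"), stringsAsFactors = FALSE)
grp1 <- c("one", "two", "one")

by_docvar <- resolve_groups(grp2, docv)  # grp2 is found as a column of docv
by_vector <- resolve_groups(grp1, docv)  # grp1 is found in the calling environment
```

With this pattern both calls from the original report would resolve: a docvar name is looked up as a column first, and an ordinary vector is still found in the enclosing environment.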

kbenoit commented 3 years ago

Ah right, that’s because of @inheritParams quanteda::group.

We have to resubmit the package anyway for v3, so I will fix that today.
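
For readers unfamiliar with roxygen2, the mismatch above arises because @inheritParams copies only the parameter description, not the behaviour. A minimal sketch with two hypothetical functions (source_fun and target_fun are not part of quanteda):

```r
#' Resolve a grouping variable (hypothetical source function)
#'
#' @param data a data.frame
#' @param groups grouping variable, evaluated in `data` so that columns
#'   may be referred to by name without quoting
source_fun <- function(data, groups) {
  eval(substitute(groups), data, parent.frame())
}

#' Count rows per group (hypothetical target function)
#'
#' The description of `groups` is inherited from source_fun(), but this
#' function evaluates `groups` in the standard way, so the inherited
#' text promises NSE that the code does not provide.
#'
#' @inheritParams source_fun
target_fun <- function(data, groups) {
  table(groups)
}

df <- data.frame(grp_col = c("a", "b", "a"), stringsAsFactors = FALSE)
resolved <- source_fun(df, grp_col)                    # works via NSE
attempt <- try(target_fun(df, grp_col), silent = TRUE) # fails: no NSE here
```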