Closed scottdallman closed 4 years ago
Reproducible example:
library("quanteda.textmodels")
## Loading required package: quanteda
## Package version: 2.0.0
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##
## View
txt <- c(
d1 = "Chinese Beijing Chinese",
d2 = "Chinese Chinese Shanghai",
d3 = "Chinese Macao",
d4 = "Tokyo Japan Chinese",
d5 = "Chinese Chinese Chinese Tokyo Japan"
)
trset <- dfm(txt, tolower = FALSE)
trclass <- factor(c("Y", "Y", "Y", "N", NA), ordered = TRUE)
tmod1 <-
textmodel_nb(trset, y = trclass, prior = "docfreq")
tmod2 <-
textmodel_nb(dfm_tfidf(trset), y = trclass, prior = "docfreq")
## Error: will not group a weighted dfm; use force = TRUE to override
Thank you for quickly looking into this. Could you please provide a little more detail on your comment about applying dfm_tfidf() for weighting before fitting the Naive Bayes classifier in quanteda?
I'm still a little confused about which weights dfm() applies initially, before the dfm_tfidf() call adds its own weighting on top. Are these just the term frequency weights? (Example: https://quanteda.io/reference/dfm_tfidf.html)
If it's questionable to weight by tf-idf before fitting the Naive Bayes model, could you provide a minimal working example of how one would estimate Naive Bayes using dfm_tfidf()?
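To make the weighting question above concrete, here is a small worked sketch. It assumes the defaults described on the linked dfm_tfidf() reference page (raw counts as the term frequency, inverse document frequency with a base-10 logarithm) and assumes that dfm() itself stores unweighted counts. The symbols tf, df, and N are notation introduced here, not taken from the thread, and the numbers are worked by hand from the five example documents rather than copied from package output.

```latex
% Assumed default weighting: count tf, inverse df, log base 10.
% N = number of documents, df_t = number of documents containing term t.
\[
  \text{tfidf}_{t,d} = \text{tf}_{t,d} \cdot \log_{10}\!\left(\frac{N}{\text{df}_t}\right)
\]

% Example corpus: N = 5 (d1 to d5).
% "Chinese" occurs in all 5 documents, so its weight is zero everywhere:
\[
  \text{tfidf}_{\text{Chinese},\,d1} = 2 \cdot \log_{10}\!\left(\tfrac{5}{5}\right) = 0
\]

% "Beijing" occurs only in d1; "Tokyo" occurs in d4 and d5:
\[
  \text{tfidf}_{\text{Beijing},\,d1} = 1 \cdot \log_{10}\!\left(\tfrac{5}{1}\right) \approx 0.699,
  \qquad
  \text{tfidf}_{\text{Tokyo},\,d4} = 1 \cdot \log_{10}\!\left(\tfrac{5}{2}\right) \approx 0.398
\]
```

Under these assumptions the weighted values are no longer integer counts, which is presumably why an operation that sums counts within groups refuses a weighted dfm unless forced.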
Describe the bug
Attempting to use a dfm with the tf-idf weighting scheme from dfm_tfidf() within textmodel_nb(), but I receive the following error: `Error: will not group a weighted dfm; use force = TRUE to override`
Reproducible code
Please paste minimal code that reproduces the bug. If possible, please upload the data file as .rds.
Expected behavior
Would like textmodel_nb() to accept dfm_tfidf() object and return
System information
Please run sessionInfo() and paste the output.
Additional info