Closed trinker closed 10 years ago
Not the case as this produces the same error:
z <- as.Corpus(dtm)
meta(z, "labels") <- names(meta(z, "labels"))
as.DocumentTermMatrix(z)
This does work, and it is what the idea is based on:
library(tm)
data("crude")
dtm <- DocumentTermMatrix(crude,
    control = list(weighting =
        function(x)
            weightTfIdf(x, normalize = FALSE),
        stopwords = TRUE))
## Convert dtm to a list of text
dtm2list <- apply(dtm, 1, function(x) {
    paste(rep(names(x), x), collapse=" ")
})
## convert to a Corpus
myCorp <- VCorpus(VectorSource(dtm2list))
inspect(myCorp)
## Rebuild the DocumentTermMatrix from the Corpus
DocumentTermMatrix(myCorp)
The problem is actually that there is no Corpus method for as.DocumentTermMatrix/as.dtm and as.TermDocumentMatrix/as.tdm. So the following fails as well (using the prior example):
as.DocumentTermMatrix(myCorp)
with the same error message:
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
dims [product 20] do not match the length of object [3]
However, using just DocumentTermMatrix(myCorp) worked. So there was no method to convert a Corpus to either of the two term matrix forms, and as.dtm was falling back to as.dtm.default:
as.dtm.default <-
function(text.var, grouping.var = NULL, vowel.check = TRUE, ...) {
    tm::as.DocumentTermMatrix(x = text.var, ...)
}
And since tm has no coercion for Corpus using as.DocumentTermMatrix, the error happened:
> methods(as.DocumentTermMatrix)
[1] as.DocumentTermMatrix.default*
[2] as.DocumentTermMatrix.DocumentTermMatrix*
[3] as.DocumentTermMatrix.term_frequency*
[4] as.DocumentTermMatrix.TermDocumentMatrix*
[5] as.DocumentTermMatrix.textcnt*
Non-visible functions are asterisked
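The fallback behaviour described here can be reproduced without tm. Below is a minimal sketch in base R, assuming a hypothetical generic as_dtm_toy and a hypothetical class toy_corpus (neither exists in tm or qdap). It only illustrates that an S3 generic uses its default method when no method is defined for the object's class, and that adding a class-specific method changes which function is called.

```r
# Hypothetical generic standing in for as.DocumentTermMatrix / as.dtm
as_dtm_toy <- function(x, ...) UseMethod("as_dtm_toy")

# Default method: the catch-all used when no class-specific method exists
as_dtm_toy.default <- function(x, ...) "default"

# An object with a made-up class that has no dedicated method yet
corp <- structure(list("some text"), class = "toy_corpus")

# No as_dtm_toy.toy_corpus is defined, so dispatch falls through to the default
before <- as_dtm_toy(corp)

# Defining a class-specific method changes which function is called
as_dtm_toy.toy_corpus <- function(x, ...) "toy_corpus method"
after <- as_dtm_toy(corp)

print(c(before = before, after = after))
```

In the real case the default method is not harmless: it tries to coerce the Corpus as if it were a plain vector or matrix, which is where the dimension error comes from.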
So the fix is to add as.tdm.Corpus and as.dtm.Corpus methods as follows:
as.tdm.Corpus <-
function(text.var, grouping.var = NULL, vowel.check = TRUE, ...) {
    tm::TermDocumentMatrix(x = text.var, ...)
}
as.dtm.Corpus <-
function(text.var, grouping.var = NULL, vowel.check = TRUE, ...) {
    tm::DocumentTermMatrix(x = text.var, ...)
}
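A quick way to check the idea in isolation is sketched below. It assumes tm is installed and uses a hypothetical generic, as_dtm_sketch, as a stand-in for qdap's as.dtm (so it does not touch the real package). The Corpus method mirrors the proposed as.dtm.Corpus by calling the constructor rather than the coercion function.

```r
# Sketch only: as_dtm_sketch is a hypothetical stand-in for qdap's as.dtm generic
if (requireNamespace("tm", quietly = TRUE)) {
    as_dtm_sketch <- function(text.var, ...) UseMethod("as_dtm_sketch")

    # Mirrors the proposed as.dtm.Corpus: build the matrix with the constructor
    as_dtm_sketch.Corpus <- function(text.var, ...) {
        tm::DocumentTermMatrix(x = text.var, ...)
    }

    # Example corpus shipped with tm
    data("crude", package = "tm")

    # Dispatches on the "Corpus" class and returns a DocumentTermMatrix
    dtm_check <- as_dtm_sketch(crude)
    print(inherits(dtm_check, "DocumentTermMatrix"))
    print(dim(dtm_check))
}
```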
I suspect this is due to not providing meta labels in as.Corpus, which must be fixed.