Closed erdnaxel closed 3 years ago
Hi @erdnaxel
In the original GibbsLDA++, topics of unseen documents are inferred in another round of Gibbs sampling. I haven't implemented this function, because I didn't think many people separate the fitting and prediction steps with LDA.
With the current version, you can still predict topics of unseen documents using the topic-word distribution (phi). Here, x should be a fitted LDA object, and newdata is a DFM.
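library(quanteda)  # provides dfm_match()

predict <- function(x, newdata = NULL) {
  # Use newdata when supplied; otherwise fall back to the data the model was fitted on
  if (!is.null(newdata)) {
    data <- newdata
  } else {
    data <- x$data
  }
  # Align the features of the DFM with the words (columns) of phi
  data <- dfm_match(data, colnames(x$phi))
  # Score each document against each topic (documents x topics)
  temp <- data %*% t(x$phi)
  # Assign each document its highest-scoring topic
  result <- factor(max.col(temp), labels = rownames(x$phi),
                   levels = seq_len(nrow(x$phi)))
  # Documents with no matching words cannot be classified
  result[rowSums(data) == 0] <- NA
  return(result)
}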
Please be aware that the result of predict() can differ from that of topics(), because the two rely on different algorithms: predict() scores documents against phi directly, without another round of Gibbs sampling.
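To make the scoring step concrete, here is a minimal sketch in base R of the same idea: multiply document counts by the transposed phi and take the best topic per row. The phi values, word names and documents below are invented for illustration and do not come from any fitted model. The count matrix is built with columns already aligned to phi, which stands in for the dfm_match() step. Ties are broken with ties.method = "first" only to keep the toy example deterministic.

```r
# Toy topic-word matrix (topics x words); values are invented for illustration.
phi <- matrix(
  c(0.6, 0.3, 0.1, 0.0,
    0.0, 0.1, 0.3, 0.6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"),
                  c("tax", "budget", "match", "goal"))
)

# Toy counts for three unseen documents, with columns aligned to phi.
newdata <- matrix(
  c(2, 1, 0, 0,
    0, 0, 1, 3,
    0, 0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), colnames(phi))
)

# Score each document against each topic: (docs x words) %*% (words x topics).
score <- newdata %*% t(phi)

# Pick the highest-scoring topic per document.
pred <- factor(max.col(score, ties.method = "first"),
               levels = seq_len(nrow(phi)), labels = rownames(phi))

# A document with no known words carries no evidence, so mark it as NA.
pred[rowSums(newdata) == 0] <- NA
names(pred) <- rownames(newdata)
print(pred)
```

In this toy case doc1 is assigned topic1, doc2 is assigned topic2, and the empty doc3 is NA.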
Came here with the same question as @erdnaxel. I think implementing the predict function would be much appreciated.
Great work!
thank you, i really appreciate the response! i will try it out as soon as i can.
Guys, I created predict() in the issue-9 branch. Please give it a try.
I am closing this as the branch has been merged, so please open a new issue if there are problems.
hello:
love the package!!
i’m wondering how to apply the model to new data?