ko-ichi-h opened this issue 3 years ago (status: open)
Since you're getting results with some datasets and metrics but not others, I suspect your data may contain NAs, NaNs, NULLs, or other non-numeric values that are causing this error. If you can confirm the data aren't the issue, please post the traceback so we can pinpoint where the error occurs.
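A minimal sketch of the data check suggested above. It assumes the dtm can be converted with as.matrix() (true for a base matrix, and for a tm DocumentTermMatrix when tm is loaded; note that densifying a very large dtm can use a lot of memory). check_dtm_values() is a hypothetical helper, not part of ldatuning. If the data come back clean, calling traceback() immediately after the failure prints the call stack.

```r
# Hypothetical helper (not part of ldatuning): list problems in a
# document-term matrix before calling FindTopicsNumber().
# Assumes `m` can be coerced with as.matrix().
check_dtm_values <- function(m) {
  m <- as.matrix(m)
  problems <- character(0)
  if (!is.numeric(m)) {
    # e.g. a character or factor-derived matrix
    problems <- c(problems, "matrix is not numeric")
  } else {
    # is.na() is TRUE for both NA and NaN
    if (any(is.na(m)))       problems <- c(problems, "contains NA/NaN")
    if (any(is.infinite(m))) problems <- c(problems, "contains Inf")
  }
  problems
}

ok  <- matrix(c(1, 0, 2, 3), nrow = 2)
bad <- matrix(c(1, NA, 2, NaN), nrow = 2)
check_dtm_values(ok)   # character(0): no problems found
check_dtm_values(bad)  # "contains NA/NaN"
```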
Just a note: if memory serves, the original author wrote this package as a grad school project, and I took over as maintainer while working toward my own graduate degree. I'm out of school now, so it's been a while since I've actively worked on the project (hence the delayed response), and there is no active development going on. If you're interested in contributing, I'm happy to add you to the repo.
Thanks!
Hello, and thank you for your reply.
I believe the data are not the issue because (1) only "Arun2010" gives me the error while the other metrics return results, and (2) with some "topics" settings, "Arun2010" also returns results normally. The following call gives me the error, but if I remove ", 80" from the "topics" argument, it runs normally.
result_tps <- FindTopicsNumber(
  dtm,
  topics = c(seq(2, 35, by = 3), 40, 45, 50, 60, 70, 80),
  metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
  method = "Gibbs",
  control = list(seed = 1234567, burnin = 1000),
  verbose = T
)
Anyway, here is the traceback() output:
9: FUN(X[[i]], ...)
8: lapply(X = X, FUN = FUN, ...)
7: sapply(models, FUN = function(model) {
       m1 <- exp(model@beta)
       m1.svd <- svd(m1)
       cm1 <- as.matrix(m1.svd$d)
       m2 <- model@gamma
       cm2 <- len %*% m2
       norm <- norm(as.matrix(len), type = "m")
       cm2 <- as.vector(cm2/norm)
       divergence <- sum(cm1 * log(cm1/cm2)) + sum(cm2 * log(cm2/cm1))
       return(divergence)
   })
6: Arun2010(models, dtm)
5: FindTopicsNumber(dtm, topics = c(seq(2, 35, by = 3), 40, 45,
       50, 60, 70, 80), metrics = c("Griffiths2004", "CaoJuan2009",
       "Arun2010", "Deveaud2014"), method = "Gibbs", control = list(seed = 1234567,
       burnin = 1000), verbose = T) at ldatuning_error.r#1230
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("C:\\Users\\KO-ichi\\Desktop\\ldatuning_error.r")
Any help would be highly appreciated.
Thank you.
What are the dimensions of your input dtm? It looks like the number of columns might be 71, which would correspond to the number of terms. Perhaps you can't generate more topics than you have terms using the Arun method.
If that's the case, there should be a check to confirm that the number of topics specified in FindTopicsNumber() doesn't exceed the number of terms in the dtm.
Yes, you are absolutely right. The dtm has 71 columns, and svd() returns only 71 singular values, which causes the error.
And yes again, that check should be performed, and a more human-readable error message would be nice.
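A minimal reproduction of the mismatch, using random toy matrices in place of a fitted model (assumptions: 71 terms and 80 topics as in this thread, 5 documents chosen arbitrarily). It mirrors the Arun2010 code from the traceback: the singular values of the k x V topic-term matrix number min(k, V) = 71, while the length-weighted topic vector has one entry per topic, 80. R refuses to divide a 71 x 1 matrix by a longer vector.

```r
# Toy stand-ins, not a real LDA fit: V terms, k topics, D documents.
set.seed(1)
V <- 71
k <- 80
D <- 5

m1  <- matrix(runif(k * V), nrow = k)              # stand-in for exp(model@beta), k x V
m2  <- matrix(runif(D * k), nrow = D)              # stand-in for model@gamma, D x k
len <- rowSums(matrix(rpois(D * V, 2), nrow = D))  # document lengths

cm1 <- as.matrix(svd(m1)$d)                        # min(k, V) singular values
cm2 <- as.vector((len %*% m2) / norm(as.matrix(len), type = "m"))  # one per topic

length(cm1)  # 71
length(cm2)  # 80

# Same expression as in Arun2010; capture the error message instead of stopping.
res <- tryCatch(sum(cm1 * log(cm1 / cm2)),
                error = function(e) conditionMessage(e))
print(res)   # dims [product 71] do not match the length of object [80]
```

With k <= V the two vectors have the same length and the expression evaluates normally, which matches the observation that dropping 80 from "topics" avoids the error.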
Ok, glad we were able to identify the issue. I tagged this as something that needs work.
I question whether it ever makes sense to have more topics than terms. My suggestion would be for the check to throw an error if topics > terms, regardless of which algorithm is selected, unless someone can give a good example of why you'd want to have more topics than terms.
The error should occur before actual processing begins; it wouldn't be fun for your processing to run for a few days only to get an error at the end.
Hmm, it may be possible for term A to form topic Alpha, term B to form topic Beta, and terms A and B together to form topic Gamma. So 2 terms and 3 topics may be possible, I think.
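A toy version of the Alpha/Beta/Gamma example, written directly as a topic-term matrix (each row is a topic, i.e. a probability distribution over the two terms). It is only an illustration and says nothing about whether LDA would actually recover such topics. The three topics are distinct and valid, yet the SVD that Arun2010 relies on still yields at most as many singular values as there are terms.

```r
# Three topics over a two-term vocabulary (terms A and B).
topic_terms <- rbind(
  Alpha = c(A = 1.0, B = 0.0),  # topic made only of term A
  Beta  = c(A = 0.0, B = 1.0),  # topic made only of term B
  Gamma = c(A = 0.5, B = 0.5)   # topic mixing A and B
)

rowSums(topic_terms)         # each topic sums to 1
nrow(unique(topic_terms))    # 3 distinct topics
length(svd(topic_terms)$d)   # only min(3, 2) = 2 singular values
```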
So it would be fine to raise an error only when users specify "Arun2010".
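A sketch of the fail-fast check discussed above, raising an error only when "Arun2010" is requested and run before any model is fitted. check_arun_topics() is a hypothetical helper, not part of ldatuning's API; in practice n_terms would be ncol(dtm), and the call would go right before FindTopicsNumber().

```r
# Hypothetical pre-flight check (not part of ldatuning):
# stop early if Arun2010 is requested with more topics than terms.
check_arun_topics <- function(n_terms, topics, metrics) {
  if ("Arun2010" %in% metrics && max(topics) > n_terms) {
    too_many <- topics[topics > n_terms]
    stop(sprintf(
      "Arun2010 needs topics <= number of terms (%d); got: %s",
      n_terms, paste(too_many, collapse = ", ")
    ), call. = FALSE)
  }
  invisible(TRUE)
}

topics  <- c(seq(2, 35, by = 3), 40, 45, 50, 60, 70, 80)
metrics <- c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014")

# With 71 terms (as in this thread) the check stops immediately.
msg <- tryCatch(check_arun_topics(71, topics, metrics),
                error = function(e) conditionMessage(e))
print(msg)

# Workaround until such a check exists: drop topic counts above the term count.
topics_ok <- topics[topics <= 71]
check_arun_topics(71, topics_ok, metrics)
```

Checking max(topics) up front costs nothing compared with fitting dozens of Gibbs models, so the user gets the message in seconds rather than after the run.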
Hello,
Thank you for developing such useful software!
When I run FindTopicsNumber(), I get results normally for some data, but for other data I get the following error.
And here is the R script file that gave me the above error: ldatuning_error.zip
If I exclude "Arun2010" from "metrics" option, I get results normally without any errors.
My sessionInfo():
I also get the same error with R 3.x.
Best.