Open carlosparadis opened 7 years ago
The problem lies upstream, in the LDAvis package itself. See the opened issue on the project.
The problem can be circumvented by defining a replacement for jsPCA, the function passed as the mds.method parameter of createJSON. The LDAvis implementation of jsPCA is:
jsPCA <- function(phi) {
  # first, we compute a pairwise distance between topic distributions
  # using a symmetric version of KL-divergence
  # http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence
  jensenShannon <- function(x, y) {
    m <- 0.5*(x + y)
    0.5*sum(x*log(x/m)) + 0.5*sum(y*log(y/m))
  }
  dist.mat <- proxy::dist(x = phi, method = jensenShannon)
  # then, we reduce the K by K proximity matrix down to K by 2 using PCA
  pca.fit <- stats::cmdscale(dist.mat, k = 2)
  data.frame(x = pca.fit[,1], y = pca.fit[,2])
}
When executing createJSON, the following error will be thrown:
Error in stats::cmdscale(dist.mat, k = 2) : NA values not allowed in 'd'
I traced it down to:
Reproducible dataset
x <- c(0.2,0.3,0.3)
y <- c(0.2,0.3,0.4)
b <- c(0.2,0.3,0)
Using the LDAvis implementation shown at the start of this issue:
> jensenShannon(x=x,y=y)
[1] 0.003583677
> jensenShannon(x=x,y=b)
[1] NaN
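The NaN comes from the zero entry in b: R evaluates 0 * log(0) as 0 * -Inf, which is NaN, and sum() propagates it. Below is a minimal sketch of a zero-tolerant variant, assuming the usual convention that an entry with zero probability contributes 0 to the divergence. The names jensenShannonSafe and klTerm are hypothetical and not part of LDAvis.

```r
# Hypothetical zero-tolerant variant of the jensenShannon helper above.
# Assumes the convention 0 * log(0) = 0, so zero entries contribute nothing.
jensenShannonSafe <- function(x, y) {
  m <- 0.5 * (x + y)
  # Sum p * log(p / m) over the entries where p > 0 only.
  klTerm <- function(p) {
    keep <- p > 0
    sum(p[keep] * log(p[keep] / m[keep]))
  }
  0.5 * klTerm(x) + 0.5 * klTerm(y)
}

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

0 * log(0)               # NaN, the source of the failure
jensenShannonSafe(x, y)  # matches jensenShannon(x, y) when no entry is zero
jensenShannonSafe(x, b)  # finite, because the zero entry is skipped
```

Skipping the zero entries leaves the result unchanged for strictly positive inputs, so this would only alter the cases that currently fail.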
The same test, using the cosine function from the lsa package:
> cosine(x=x,y=y)
          [,1]
[1,] 0.9897595
> cosine(x=x,y=b)
          [,1]
[1,] 0.7687061
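Cosine similarity involves no logarithm, so a zero entry simply drops out of the dot product. As a sanity check, the two values can be reproduced in base R without lsa. The helper name cosineSim is hypothetical.

```r
# Hypothetical base-R helper: dot product divided by the product of the norms.
cosineSim <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

cosineSim(x, y)  # approximately 0.9897595
cosineSim(x, b)  # approximately 0.7687061, despite the zero in b
```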
For usage, plotLDAVis(models[["Jan"]],as.gist=FALSE) now accepts a new parameter, topicSimilarityMethod, which takes a variant of the default function accepted by createJSON:
plotLDAVis(models[["Jan"]],as.gist=FALSE,topicSimilarityMethod = CalculateTopicCosineSimilarity)
With the new parameter and the new function passed in, it will use the cosine function from the lsa package, which is also the one used to compare topics between different months.
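The body of CalculateTopicCosineSimilarity is not shown in this thread, so the following is only a sketch of what a cosine-based replacement for jsPCA could look like. It assumes the same contract as jsPCA (a K by W matrix phi in, a data frame with columns x and y out). It computes the cosine similarity in base R rather than with lsa::cosine so that it runs without extra packages. The name cosineMDS is hypothetical.

```r
# Hypothetical cosine-based stand-in for jsPCA (not the project's actual function).
cosineMDS <- function(phi) {
  # Cosine similarity between every pair of rows (topics) of phi.
  norms <- sqrt(rowSums(phi^2))
  sim <- (phi %*% t(phi)) / (norms %o% norms)
  # Chord distance between unit-length rows: a monotone function of cosine
  # similarity that is a true Euclidean distance. pmax() clamps tiny
  # negative values caused by floating-point rounding.
  dist.mat <- stats::as.dist(sqrt(2 * pmax(1 - sim, 0)))
  # Reduce the K by K distance matrix down to K by 2, as jsPCA does.
  fit <- stats::cmdscale(dist.mat, k = 2)
  data.frame(x = fit[, 1], y = fit[, 2])
}

# Three toy topics over three terms; the third contains a zero entry.
phi <- rbind(c(0.2, 0.3, 0.5),
             c(0.3, 0.3, 0.4),
             c(0.5, 0.5, 0.0))
coords <- cosineMDS(phi)
coords
```

A function of this shape could be passed to createJSON as mds.method = cosineMDS. Zero entries in phi are harmless here because no logarithm is taken.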
The issue was fixed in the original code. Should test locally.
The following error is displayed and no visualization is generated:
Verified to occur in both old and new crawler, on year 2013, months Feb, Apr, Dec.