Closed shmuhammadd closed 3 years ago
You can probably replicate it after updating your version of quanteda.
If you want a pairwise data.frame, just use this:
> as.data.frame(sim)
   document1 document2 correlation
1         a1        b1 -0.22203788
2         a1        a2  0.80492203
3         b1        a2 -0.23090513
4         a1        b2 -0.23427416
5         b1        b2  0.90082239
6         a2        b2 -0.28140219
7         a1        a3  0.81167608
8         b1        a3 -0.09065452
9         a2        a3  0.92242890
10        b2        a3 -0.12530944
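For anyone wondering where the `10, 25` in the error comes from: five documents give 10 unordered pairs, while coercing a full 5 x 5 similarity matrix to numeric gives all 25 cells. Below is a minimal base R sketch of that mismatch and one way around it. It assumes `sim` behaves like a full symmetric matrix when coerced, which the error message suggests but which I have not checked against every quanteda version; `toy` is a made-up stand-in, not quanteda output.

```r
# Hypothetical stand-in for `sim`: a symmetric 5 x 5 correlation matrix
# built from random numbers (the values are illustrative only).
docs <- c("a1", "b1", "a2", "b2", "a3")
set.seed(1)
toy <- cor(matrix(rnorm(50), nrow = 10, dimnames = list(NULL, docs)))

length(as.numeric(toy))          # 25: every cell, including the diagonal
pair_names <- t(combn(docs, 2))
nrow(pair_names)                 # 10: unique unordered pairs

# Keep one value per pair. The lower triangle, read column by column,
# follows the same order as the rows of t(combn(docs, 2)).
sim_pairs <- data.frame(pair_names,
                        sim = toy[lower.tri(toy)],
                        stringsAsFactors = FALSE)
```

The row order here follows `combn()`, so it differs from the order `as.data.frame(sim)` prints, but each pair carries the same kind of value.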
Thank you very much, @kbenoit.
This is a solution on Stack Overflow provided by @kbenoit here. I tried replicating the answer, but it gives an error:
Error in data.frame(sim_pair_names, sim = as.numeric(sim), stringsAsFactors = FALSE) : arguments imply differing number of rows: 10, 25
library("quanteda")

mydocs <- c(a1 = "a a a a a b b c d w g j t",
            b1 = "l y y h x x x x x y y y y",
            a2 = "a a a a a b c s k w i r f",
            b2 = "p q w e d x x x x y y y y",
            a3 = "a a a a a b b x k w i r f")
mydfm <- dfm(mydocs)
(sim <- textstat_simil(mydfm))

sim_pair_names <- t(combn(docnames(mydfm), 2))
sim_pairs <- data.frame(sim_pair_names, sim = as.numeric(sim), stringsAsFactors = FALSE)
Could you please explain how to resolve the issue?

Best,
Shamsudden