quanteda / quanteda.textstats

Textual statistics for quanteda
GNU General Public License v3.0

Error in data frame : arguments imply differing number of rows. #40

Closed: shmuhammadd closed this issue 3 years ago

shmuhammadd commented 3 years ago

This is a solution on Stack Overflow provided by @kbenoit here. I tried replicating the answer, but it gives an error:

Error in data.frame(sim_pair_names, sim = as.numeric(sim), stringsAsFactors = FALSE) : arguments imply differing number of rows: 10, 25

```r
library("quanteda")

mydocs <- c(a1 = "a a a a a b b c d w g j t",
            b1 = "l y y h x x x x x y y y y",
            a2 = "a a a a a b c s k w i r f",
            b2 = "p q w e d x x x x y y y y",
            a3 = "a a a a a b b x k w i r f")

mydfm <- dfm(mydocs)

(sim <- textstat_simil(mydfm))

sim_pair_names <- t(combn(docnames(mydfm), 2))
sim_pairs <- data.frame(sim_pair_names, sim = as.numeric(sim), stringsAsFactors = FALSE)
```

Could you please explain how to resolve the issue?

Best,
Shamsudden

kbenoit commented 3 years ago

You can replicate it by updating your version of quanteda, possibly.

If you want a pairwise data.frame, just use this:

> as.data.frame(sim)
   document1 document2 correlation
1         a1        b1 -0.22203788
2         a1        a2  0.80492203
3         b1        a2 -0.23090513
4         a1        b2 -0.23427416
5         b1        b2  0.90082239
6         a2        b2 -0.28140219
7         a1        a3  0.81167608
8         b1        a3 -0.09065452
9         a2        a3  0.92242890
10        b2        a3 -0.12530944
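
For readers wondering where the `10, 25` in the error comes from: five documents give `choose(5, 2) = 10` unordered pairs, while flattening a full 5 x 5 similarity matrix gives 25 values. The base R sketch below illustrates that mismatch and one way to line the values up with the `combn()` pairs by taking only the lower triangle. Note that `m` here is a hypothetical stand-in built with `cor()` on random data, not actual quanteda output; with a real `textstat_simil` result one would presumably start from `as.matrix(sim)`, which is an assumption and not something confirmed in this thread.

```r
# Hypothetical stand-in for a 5 x 5 document similarity matrix.
# (Assumption: a real textstat_simil result converts to this shape.)
docs <- c("a1", "b1", "a2", "b2", "a3")
set.seed(1)
x <- matrix(rnorm(50), nrow = 10, dimnames = list(NULL, docs))
m <- cor(x)  # symmetric, with 1s on the diagonal

# Flattening the whole matrix yields every cell, including the
# diagonal and both halves of the symmetric matrix.
length(as.numeric(m))  # 25

# combn() yields each unordered pair once.
pair_names <- t(combn(docs, 2))
nrow(pair_names)       # 10

# The lower triangle, read column by column, follows the same pair
# order as combn(), so the lengths and the pairing both agree.
sim_pairs <- data.frame(pair_names,
                        sim = m[lower.tri(m)],
                        stringsAsFactors = FALSE)
```

In practice `as.data.frame(sim)` shown above is the simpler route, since it returns the pair labels and values together without any manual indexing.
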

shmuhammadd commented 3 years ago

Thank you very much, @kbenoit.