statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

levels2 error in nodemix when levels are named #563

Open CarterButts opened 3 months ago

CarterButts commented 3 months ago

The levels2 argument to nodemix seems to be doing some odd things. Here's a cleaned up example from a case my students encountered in an exam:

Data file (sorry about the zip archive, github won't take Rdata files): example.zip

library(ergm)
load("example.Rdata")

mixingmatrix(fgf,"gender")                                       #Get the mixing matrix by gender
summary(fgf ~ nodemix("gender", levels2 = c(2:3)))               #This is correct
summary(fgf ~ nodemix("gender", levels2 = c("1.0","0.1")))       #This is not

summary(ergm(fgf~nodemix("gender", levels2 = c(2:3))))           #This one makes some sense
summary(ergm(fgf~nodemix("gender", levels2 = c("1.0","0.1"))))   #This one doesn't

summary(ergm(fgf~edges+nodemix("gender", levels2 = c(2:3))))           #Works as it should
summary(ergm(fgf~edges+nodemix("gender", levels2 = c("1.0","0.1"))))   #Odd collinearity

The mixing matrix we get is like so:

     To
From    0   1 Sum
  0    28   2  30
  1     7  66  73
  Sum  35  68 103

and this is consistent with the first use of levels2:

mix.gender.1.0 mix.gender.0.1 
             7              2 

but the second case gives us odd things:

mix.gender.0.1 mix.gender.1.0 
            73             30 

As one would expect, this spills over into ergm, to the point where an edges term becomes redundant with the mixing terms. Puzzlingly, levels2 isn't just grabbing the wrong entry from the mixing matrix; it appears to be returning the row sums of the matrix instead (30 and 73). That shouldn't be possible for nodemix, but there it is. It wouldn't be so bad if the results were labeled as row sums, but they are labeled with the original arguments.

The levels2 argument is very powerful, but I confess that I myself get confused in some cases about how it is supposed to work! Still, this can't be intended behavior. I presume that there's a parsing step that's gone wrong somewhere....

mbojan commented 3 months ago

The documentation could be improved, and perhaps supplemented with examples.

load("~/Downloads/example.Rdata")

(mm <- mixingmatrix(fgf, "gender"))
#>      To
#> From    0   1 Sum
#>   0    28   2  30
#>   1     7  66  73
#>   Sum  35  68 103

A numeric vector is interpreted as linear indices into the cells of the mixing matrix, in R's usual column-major order:

summary(fgf ~ nodemix("gender", levels2=1)) # (1,1)
#> mix.gender.0.0 
#>             28
summary(fgf ~ nodemix("gender", levels2=3)) # (1,2), i.e. row 1, column 2
#> mix.gender.0.1 
#>              2
summary(fgf ~ nodemix("gender", levels2=2:3)) # (2,1) and (1,2)
#> mix.gender.1.0 mix.gender.0.1 
#>              7              2
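
The column-major indexing described above can be checked in base R without ergm. This is only a sketch: the matrix `mm` below is typed in by hand from the mixing matrix shown earlier (without the Sum margins), and the "from.to" labels are built here for illustration, not taken from ergm's internals.

```r
# Mixing matrix from the thread, entered by hand (rows = From, columns = To).
mm <- matrix(c(28, 7, 2, 66), nrow = 2,
             dimnames = list(From = c("0", "1"), To = c("0", "1")))

# R stores matrices column by column, so linear indices run down column 1 first.
idx <- 2:3
cells <- arrayInd(idx, dim(mm))  # (row, column) position of each linear index
cells                            # index 2 is (2,1); index 3 is (1,2)

# Label each selected cell as "from.to", mirroring the mix.gender.* names.
labels <- paste(rownames(mm)[cells[, 1]], colnames(mm)[cells[, 2]], sep = ".")
picked <- setNames(mm[idx], labels)
picked
```

Here `picked` holds 7 under the name "1.0" and 2 under the name "0.1", matching the `levels2=2:3` output above.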

A character vector is interpreted as a matrix of labels with the same shape as the mixing matrix; cells assigned the same label are collapsed (summed) into one statistic:

summary(fgf ~ nodemix("gender", levels2=c("foo", "foo", "bar", "bar")))
#> mix.gender.bar mix.gender.foo 
#>             68             35
colSums(mm)
#>  0  1 
#> 35 68

because

matrix(c("foo", "foo", "bar", "bar"), 2, 2)
#>      [,1]  [,2] 
#> [1,] "foo" "bar"
#> [2,] "foo" "bar"

and this is interpreted as matrix("foobar", 2, 2), so it sums the whole matrix:

summary(fgf ~ nodemix("gender", levels2="foobar"))
#> mix.gender.foobar 
#>               103
sum(mm)
#> [1] 103
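
The same recycling would also account for the numbers in the original report. The base-R sketch below is not ergm's implementation; `collapse_cells` is a hypothetical helper that assumes the labels are recycled into a matrix shaped like the mixing matrix and that cells sharing a label are summed. `mm` is again typed in by hand from the output above.

```r
# Mixing matrix from the thread, entered by hand (rows = From, columns = To).
mm <- matrix(c(28, 7, 2, 66), nrow = 2,
             dimnames = list(From = c("0", "1"), To = c("0", "1")))

# Hypothetical helper: recycle the labels into a matrix shaped like mm,
# then sum the cells that share a label.
collapse_cells <- function(mm, labels) {
  lab <- matrix(labels, nrow(mm), ncol(mm))
  tapply(as.vector(mm), as.vector(lab), sum)
}

collapse_cells(mm, c("foo", "foo", "bar", "bar"))  # column sums: bar 68, foo 35
collapse_cells(mm, "foobar")                       # grand total: 103

# The call from the original report: two labels recycle down each column,
# so every row gets a single label and the result is the row sums.
reported <- collapse_cells(mm, c("1.0", "0.1"))
reported
```

Under these assumptions `reported` is 73 for "0.1" and 30 for "1.0", the same values as in the original report, which suggests the strings were treated as arbitrary labels and not parsed as level pairs.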

I know nothing about parsing character vectors of the form “x.y”.

krivit commented 3 months ago

The documentation could, indeed, be much better. If you want to select level pairs explicitly, you need to pass something like

summary(fgf ~ nodemix("gender", levels2 = I(list(list(row=1,col=0),list(row=0,col=1)))))
mix.gender.1.0 mix.gender.0.1 
             7              2 

I'll try to reply in more detail later.

krivit commented 4 days ago

That'll do for the upcoming release, but we really should augment the vignette with diagrams, mixing matrices, etc.