While sorting out statnet/ergm#202, I found that even after updating ergm.ego's vertex attribute extraction and term defaults for consistency with ergm, COLLAPSE_SMALLEST() can still produce results that do not match those from the full network, in particular when the frequencies of categories differ between egos and alters.
For example,
set.seed(0)
library(ergm.ego)
#> ergm: version 3.11.0-6010, created on 2021-01-30
#> ergm.ego: version 0.6.0-569, created on 2021-01-30
library(ergm)
library(magrittr)
n <- 100
e <- 150
ds <- c(10,15,5,20)
y <- network.initialize(n, directed=FALSE)
y %v% "a" <- sample(1:3+6,n,replace=TRUE)
aM <- matrix(FALSE, 3, 3)
aM[1,1] <- aM[1,3] <- TRUE
y %v% "b" <- sample(letters[1:4],n,replace=TRUE)
y %v% "c" <- sample(runif(10),n,replace=TRUE)
y %v% "d" <- runif(n)
y <- san(y~edges+degree(0:3), target.stats=c(e,ds))
y.e <- as.egodata(y)
f <- ~ nodefactor(COLLAPSE_SMALLEST("b",2, "x")) + mm(a~(~b) %>% COLLAPSE_SMALLEST(2,"x"), levels2=TRUE)
f.y <- statnet.common::nonsimp_update.formula(f, y~.)
environment(f.y) <- globalenv()
f.y.e <- statnet.common::nonsimp_update.formula(f, y.e~.)
environment(f.y.e) <- globalenv()
(f.y.s <- summary(f.y))
#> nodefactor.b.d nodefactor.b.x mm[a=7,b=a] mm[a=8,b=a] mm[a=9,b=a]
#> 67 163 20 25 25
#> mm[a=7,b=d] mm[a=8,b=d] mm[a=9,b=d] mm[a=7,b=x] mm[a=8,b=x]
#> 21 24 22 48 68
#> mm[a=9,b=x]
#> 47
(f.y.e.s <- summary(f.y.e))
#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.
#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.
#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.
#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.
#> nodefactor.b.b nodefactor.b.c nodefactor.b.d nodefactor.b.x mm[a=7,b=a]
#> 44.0 37.5 33.5 150.0 10.0
#> mm[a=8,b=a] mm[a=9,b=a] mm[a=7,b=b] mm[a=8,b=b] mm[a=9,b=b]
#> 12.5 12.5 12.0 19.5 12.5
#> mm[a=7,b=c] mm[a=8,b=c] mm[a=9,b=c] mm[a=7,b=d] mm[a=8,b=d]
#> 12.0 14.5 11.0 10.5 12.0
#> mm[a=9,b=d] mm[a=7,b=x] mm[a=8,b=x] mm[a=9,b=x]
#> 11.0 44.5 58.5 47.0
stopifnot(all.equal(f.y.s,f.y.e.s))
#> Error: f.y.s and f.y.e.s are not equal:
#> Names: 11 string mismatches
#> Numeric: lengths (11, 19) differ
Created on 2021-01-30 by the reprex package (v1.0.0)
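This happens because the most frequent levels of factor "b" among the egos differ from those among the alters. Each group ends up with its own set of retained levels, and those sets then get pooled, so the egocentric summary has more levels (and more statistics) than the full-network one.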
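The effect can be reproduced without ergm at all. The sketch below uses a hypothetical helper, `collapse_smallest()`, written in base R as a stand-in for the collapsing step; it is not ergm's `COLLAPSE_SMALLEST()`, and the ego and alter vectors are made-up toy data chosen so that the two groups have different least frequent levels.

```r
# Hypothetical stand-in for the collapsing step (NOT ergm's COLLAPSE_SMALLEST()):
# relabel the n least frequent values of x as `into`.
collapse_smallest <- function(x, n, into) {
  freq <- sort(table(x))                 # levels in ascending order of frequency
  smallest <- names(freq)[seq_len(n)]    # the n least frequent levels
  ifelse(x %in% smallest, into, x)
}

# Toy data (assumed for illustration): same categories, different frequencies.
ego_b   <- c("a", "a", "a", "b", "b", "b", "c", "c", "d")   # rarest: d, c
alter_b <- c("a", "b", "c", "c", "c", "d", "d", "d", "d")   # rarest: a, b

ego_x   <- collapse_smallest(ego_b,   2, "x")
alter_x <- collapse_smallest(alter_b, 2, "x")

sort(unique(ego_x))               # "a" "b" "x"
sort(unique(alter_x))             # "c" "d" "x"

# Pooling the two sets of retained levels keeps every original level plus "x".
sort(unique(c(ego_x, alter_x)))   # "a" "b" "c" "d" "x"
```

With four original levels and two collapsed, one would expect three levels after collapsing; the pooled result has five, which mirrors the extra `b` and `c` statistics in the egocentric summary above.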