Figure out how to get COLLAPSE_SMALLEST() to work consistently for egocentric data.

While sorting out statnet/ergm#202, it turned out that even after updating ergm.ego's vertex attribute extraction and term defaults for consistency with ergm, COLLAPSE_SMALLEST() can still produce statistics that disagree with those computed on the full network, in particular when the frequencies of categories differ between egos and alters.

For example,

set.seed(0)
library(ergm.ego)
#> ergm: version 3.11.0-6010, created on 2021-01-30
#> ergm.ego: version 0.6.0-569, created on 2021-01-30
library(ergm)
library(magrittr)

n <- 100
e <- 150
ds <- c(10,15,5,20)

y <- network.initialize(n, directed=FALSE)
y %v% "a" <- sample(1:3+6,n,replace=TRUE)
aM <- matrix(FALSE, 3, 3)
aM[1,1] <- aM[1,3] <- TRUE
y %v% "b" <- sample(letters[1:4],n,replace=TRUE)
y %v% "c" <- sample(runif(10),n,replace=TRUE)
y %v% "d" <- runif(n)
y <- san(y~edges+degree(0:3), target.stats=c(e,ds))
y.e <- as.egodata(y)

f <- ~ nodefactor(COLLAPSE_SMALLEST("b",2, "x")) + mm(a~(~b) %>% COLLAPSE_SMALLEST(2,"x"), levels2=TRUE)

f.y <- statnet.common::nonsimp_update.formula(f, y~.)
environment(f.y) <- globalenv()
f.y.e <- statnet.common::nonsimp_update.formula(f, y.e~.)
environment(f.y.e) <- globalenv()

(f.y.s <- summary(f.y))
#> nodefactor.b.d nodefactor.b.x    mm[a=7,b=a]    mm[a=8,b=a]    mm[a=9,b=a] 
#>             67            163             20             25             25 
#>    mm[a=7,b=d]    mm[a=8,b=d]    mm[a=9,b=d]    mm[a=7,b=x]    mm[a=8,b=x] 
#>             21             24             22             48             68 
#>    mm[a=9,b=x] 
#>             47
(f.y.e.s <- summary(f.y.e))
#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.

#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.

#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.

#> Warning: In unknown function: 'COLLAPSE_SMALLEST()' may behave unpredictably
#> with egocentric data and is not recommended at this time.
#> nodefactor.b.b nodefactor.b.c nodefactor.b.d nodefactor.b.x    mm[a=7,b=a] 
#>           44.0           37.5           33.5          150.0           10.0 
#>    mm[a=8,b=a]    mm[a=9,b=a]    mm[a=7,b=b]    mm[a=8,b=b]    mm[a=9,b=b] 
#>           12.5           12.5           12.0           19.5           12.5 
#>    mm[a=7,b=c]    mm[a=8,b=c]    mm[a=9,b=c]    mm[a=7,b=d]    mm[a=8,b=d] 
#>           12.0           14.5           11.0           10.5           12.0 
#>    mm[a=9,b=d]    mm[a=7,b=x]    mm[a=8,b=x]    mm[a=9,b=x] 
#>           11.0           44.5           58.5           47.0
stopifnot(all.equal(f.y.s,f.y.e.s))
#> Error: f.y.s and f.y.e.s are not equal:
#>   Names: 11 string mismatches
#>   Numeric: lengths (11, 19) differ

Created on 2021-01-30 by the reprex package (v1.0.0)

This happens because the most frequent levels of factor "b" among the egos differ from those among the alters, and the two sets get pooled.
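The effect can be illustrated in base R without ergm at all. The sketch below is only one reading of the explanation above: `collapse_smallest()` is a hypothetical helper, not ergm's implementation, and the ego and alter frequencies are made up so that the two sides disagree about which levels are smallest.

```r
# Hypothetical helper, not ergm's implementation: merge the n least
# frequent levels of x into a single level named `into`.
collapse_smallest <- function(x, n, into) {
  freq <- sort(table(x))               # level counts, ascending
  smallest <- names(freq)[seq_len(n)]  # the n rarest levels
  ifelse(x %in% smallest, into, x)
}

# Made-up attribute values: the same four levels, but with opposite
# frequency orderings among egos and among alters.
ego_b   <- rep(c("a", "b", "c", "d"), times = c(40, 30, 20, 10))
alter_b <- rep(c("a", "b", "c", "d"), times = c(10, 20, 30, 40))

# Collapsing each side on its own keeps different levels on each side.
ego_levels   <- sort(unique(collapse_smallest(ego_b, 2, "x")))
alter_levels <- sort(unique(collapse_smallest(alter_b, 2, "x")))
ego_levels     # "a" "b" "x"
alter_levels   # "c" "d" "x"

# Pooling the two collapsed sets brings back every original level plus "x".
pooled_levels <- sort(unique(c(ego_levels, alter_levels)))
pooled_levels  # "a" "b" "c" "d" "x"
```

Under these assumed frequencies, the pooled level set has five entries instead of three, which mirrors the extra nodefactor and mm statistics in the egocentric summary above.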
