statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

Unexpected behavior of COLLAPSE_SMALLEST() #544

Closed: benrosche closed this issue 6 months ago

benrosche commented 6 months ago

Dear statnet team,

I hope this is a bug and not a misunderstanding on my part of how COLLAPSE_SMALLEST() behaves. I noticed that COLLAPSE_SMALLEST() does not always collapse the next smallest group:

library(ergm)
library(dplyr)

data(faux.mesa.high)

all <- summary(faux.mesa.high ~ nodefactor("Race", levels=TRUE)) %>% sort()
# nodefactor.Race.Other nodefactor.Race.Black nodefactor.Race.White nodefactor.Race.NatAm  nodefactor.Race.Hisp
#                     1                    26                    45                   156                   178

collapsed3 <- summary(faux.mesa.high ~ nodefactor((~Race) %>% COLLAPSE_SMALLEST(3, "group"), levels=TRUE)) %>% sort()
# nodefactor.Race.group nodefactor.Race.NatAm  nodefactor.Race.Hisp
#                    72                   156                   178

collapsed4 <- summary(faux.mesa.high ~ nodefactor((~Race) %>% COLLAPSE_SMALLEST(4, "group"), levels=TRUE)) %>% sort()
# nodefactor.Race.NatAm nodefactor.Race.group
#                   156                   250

# Expected: with n = 4, the next smallest group (NatAm) joins "group" and Hisp stays separate.
# Observed: Hisp was collapsed instead (72 + 178 = 250) and NatAm was left on its own.
# NatAm is smaller than Hisp both in terms of summary statistic (nodefactor.Race) and node attribute (Race):

faux.mesa.high %v% "Race" %>% table() %>% sort()
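
For comparison, the behavior expected above can be sketched in base R without ergm. This is only an illustration under stated assumptions: `collapse_smallest_sketch()` is a hypothetical helper, not part of ergm and not how COLLAPSE_SMALLEST() is implemented. The toy `race` vector uses made-up counts, categories are ranked by node count, and ties are broken by whatever order `sort()` returns.

```r
# Hypothetical helper (NOT part of ergm): collapse the n least frequent
# categories of a vector into a single new label `into`.
collapse_smallest_sketch <- function(x, n, into) {
  x <- as.character(x)
  counts <- sort(table(x))                       # ascending by frequency
  smallest <- names(counts)[seq_len(min(n, length(counts)))]
  x[x %in% smallest] <- into                     # relabel the n smallest
  x
}

# Toy attribute with made-up counts, chosen only for illustration.
race <- rep(c("A", "B", "C", "D", "E"), times = c(1, 2, 3, 10, 20))

table(collapse_smallest_sketch(race, 3, "group"))  # A, B, C merged; D and E kept
table(collapse_smallest_sketch(race, 4, "group"))  # D merged as well; only E kept
```

With n = 4 in this sketch, the fourth-smallest category (D) is the one that joins the collapsed group while the largest (E) stays separate, which is the analogue of expecting NatAm, not Hisp, to be collapsed.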

Best wishes, Ben

krivit commented 6 months ago

I think it's a bug. Thanks for flagging it! I know how to fix it, but I want to fix another issue while I'm at it.