calculation errors of the `dgwesp` term

phuang5 commented 2 years ago

I notice that the calculation of dgwesp term is lower than what it should be for both "OTP" (outgoing two-paths) and "ITP" (incoming two-paths). Interestingly, the gwesp term, when applied to digraph, calculates the dgwesp(OTP) correctly. The problem only occurs when the network size is large enough, and it always under-counts the statistics in my experiment, so I suspect there might be some overflow issues that only happen to the dgwesp implementation.

Please see below a toy example that demonstrates the problem. Basically the script obtains the census of desp (edgewise shared-partners), and calculates the dgwesp statistics by hand to compare it with results from dgwesp and gwesp in ergm. Thanks!

library(network)
library(sna)
library(ergm)

#generate a random network----
nw <- network(rgraph(100))
as.sociomatrix(nw)

#calculate directed edgewise shared partners using esp census-----
#For OTP: outgoing two-paths
desp.otp <- summary(nw~desp(0:(network.size(nw)-2), type="OTP"))
desp.otp

#a gw function
gwesp <- function(alpha, i){
  exp(alpha)*(1-(1-exp(-alpha))^i)
}

#calculate dgwesp (OTP) from desp, decay=0.75
x <- 0
for (i in 1:length(desp.otp)) {
  x <- x+gwesp(alpha=0.75,i=i-1)*desp.otp[i] #i-1: start from 0
}
x

#calculate dgwesp (OTP) using ergm's dgwesp and gwesp------
summary(nw~dgwesp(0.75, fixed=T, type="OTP")+gwesp(0.75, fixed=T))

#replicate this for ITP: incoming two-paths-----
desp.itp <- summary(nw~desp(0:(network.size(nw)-2), type="ITP"))
desp.itp
y <- 0
for (i in 1:length(desp.itp)) {
  y <- y+gwesp(alpha=0.75,i=i-1)*desp.itp[i] #i-1: start from 0
}
y
summary(nw~dgwesp(0.75, fixed=T, type="ITP"))

krivit commented 2 years ago

Thanks for the detailed report! The problem is that whereas the gwesp() implementation computes the quantity of interest directly, the dgwesp implementation first computes the desp() frequencies, then takes their weighted sum. Since most problems in practice involve sparse networks, the SP quantities for which it keeps track is capped by gw.cutoff term option (see options?ergm or help for any of these terms). This defaults to 30, and if a network distribution has higher shared partner counts than that, it will result in incorrect values.

You can set this cutoff globally. For example running options(ergm.term=list(gw.cutoff=100)) before running all of your code will result in matching answers.

krivit commented 1 year ago

All variants of gw*sp(fix=TRUE) now use the full SP distribution regardless of cutoff. dg*sp(fix=FALSE) and gwb*sp(fix=FALSE) will immediately stop with an error if they ever encounter any configurations past the cutoff.

statnet / ergm

calculation errors of the `dgwesp` term #484