statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks
Other
98 stars 37 forks source link

ergm fits with low concurrent target.stats fail #52

Closed smjenness closed 5 years ago

smjenness commented 5 years ago

Using both the CRAN version and dev version of ergm, this fit:

nw <- network.initialize(100, directed = FALSE)
fit <- ergm(nw ~ edges + concurrent,
            target.stats = c(50, 0))

errors after 9 iterations with:

Iteration 9 of at most 20:
Error in rbind(if (!is.null(x2)) t(gamma * t(x2crs) + (1 - gamma) * c(m1crs)),  : 
  object 'm2crs' not found

Also, a fit with a low but non-zero target stat for concurrent does not mix:

fit <- ergm(nw ~ edges + concurrent,
            target.stats = c(50, 5))
...
The log-likelihood improved by 0.002221.
Iteration 18 of at most 20:
Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL),  : 
  Unconstrained MCMC sampling did not mix at all. Optimization cannot continue.

We have fit these models all the time in the past without error. They are a primary feature of our modeling classes (NME) to demonstrate the impact of concurrency on epidemic dynamics. This would be a critical issue to fix.

cc: @martinamorris, @sgoodreau

krivit commented 5 years ago

I've fixed the error, in the first case, but I sometimes get premature convergence. For some reason, -Inf doesn't prohibit concurrent ties completely. I'll investigate.

krivit commented 5 years ago

I think that what's happening is that [50,0] is in a corner of the convex hull---having the most edges possible while still having no concurrency---and while ergm can detect that for concurrency, it can't detect it for edge count, so it needs to estimate it, but it doesn't take much to get a step length of 1 twice in a row, so the Hummel convergence criterion tells it to stop.

krivit commented 5 years ago

The first one. I am still getting to the bottom of that case.

krivit commented 5 years ago

That said, this limitation might be inherent in the Hummel stopping criterion.

krivit commented 5 years ago

The [50,5] case is an artefact of MPLE. Here's the degree distribution of a network that san() generates:

degree0 degree1 degree4 degree5 degree8 
     20      75       3       1       1 

Notice that there are no nodes with degree 2. This means that no single toggle can reduce concurrency on this network but some can increase it. This, in turn, leads the MPLE's logistic regression to be separable, and it returns an initial parameter estimate for -20 or so for concurrency. MCMLE is not able to recover from that.

krivit commented 5 years ago

In fact, I wonder if we end up with a bimodal degree distribution, which makes it very difficult to sample from.

martinamorris commented 5 years ago

The [50,5] case is an artefact of MPLE. Here's the degree distribution of a network that san() generates:

degree0 degree1 degree4 degree5 degree8 
     20      75       3       1       1 

Notice that there are no nodes with degree 2. This means that no single toggle can reduce concurrency on this network but some can increase it. This, in turn, leads the MPLE's logistic regression to be separable, and it returns an initial parameter estimate for -20 or so for concurrency. MCMLE is not able to recover from that.

Very interesting. It raises a separate issue from Sam's point, that the estimation doesn't behave the way it used to. And that point is still important. Stability, esp. for these really basic models, is something we need to strive for.

But the other thing this highlights is one of the odd properties of this specification. For this term, Deg(3) is a threshold, and once you're over it, the term no longer has any effect. While there is some social/psychological logic to this behavior -- once you've crossed over to the world of concurrency, the monogamy norms no longer constrain your behavior -- even that's not really consistent, as the norms still operate on Deg(2).

If we want a different, more logical behavior, I think we'll need to define a different term. But that belongs in a different issue...

krivit commented 5 years ago

Interestingly, even when I get a reasonable-looking starting value with CD, it still doesn't converge. (What it does do is bounce around never quite getting an error.

krivit commented 5 years ago

I spy with my little eye... a phase transition!

library(ergm) # either dev or master; maybe CRAN
nw <- network.initialize(100, directed = FALSE)
mleish <- c( -1.631266,  -5.502178 ) # This is as close as I've been able to get to what I had thought was the MLE.
set.seed(0)
sim <- simulate(nw~edges+concurrent, coef=mleish, nsim=100000, simplify=FALSE, output="stats")
plot(sim)

n100_edges50_concurrent5 phase_change At first, it's spot-on. A little low, in fact:

summary(window(sim, end=4.75e7))
             Mean    SD Naive SE Time-series SE
edges      49.843 6.526  0.03031         0.1051
concurrent  4.952 2.698  0.01253         0.0430

But then:

summary(window(sim, start=4.8e7))
             Mean       SD  Naive SE Time-series SE
edges      810.21 26.01260 0.1128415       0.205183
concurrent  99.99  0.08654 0.0003754       0.001857

I did not expect that from this model---since models with bounded change statistics tend to be well-behaved---and I still wonder if there is some other region in the parameter space that might behave better while having the correct expectation. It is also possible, in theory, that there is some deep bug. But, provisionally, at least, I am going to have to ask @smjenness to double-check whether this model and target combination ever fit well.

smjenness commented 5 years ago

Yes, perhaps this is not a great example in general to be exploring concurrency with the mean degree of 1. I ran a bunch of combinations with a mean degree of 0.8 (edges = 40) instead and there was no problem fitting these. I think we can close this issue, and we'll just make sure to update any tutorials we have that might use the n=100/edges=50 combo to demonstrate concurrency. Thanks @krivit!

handcock commented 5 years ago

In addition to Ian's notes about LOLOG models, it is worth noting this his "Tapered ERGM" models work on this example. They produce fits with a narrow, but credible variation. Given the extreme nature of (50,5) this does seem ok. Explicitly:

library(ergm.tapered)
nw <- network.initialize(100, directed = FALSE)
fit <- ergm.tapered(nw ~ edges + concurrent, target.stats=c(50,5),
  control=control.ergm(MCMC.interval=100000))
summary(fit)
mcmc.diagnostics(fit)
#
summary(fit$newnetwork ~ edges + concurrent)
#
sim <- simulate(fit, nsim=2000, simplify=FALSE, output="stats",
  control=control.simulate.ergm(MCMC.interval=20000, MCMC.burnin=100000))
plot(sim)
head(sim)
apply(as.matrix(sim),2,median)
apply(as.matrix(sim),2,mean)
apply(as.matrix(sim),2,var)

produces the attached output

condeg1.Rout.txt

Rplots.pdf

krivit commented 5 years ago

I think that this is the best we can do in ergm for the moment. I've added the non-phase-transitioning test case to statnet/ergm.bench, so that we can keep working it as a part of general computational improvements.