statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

ERGM estimated with version 3.8.0 no longer converges with 4.0.1 #346

Open leifeld opened 3 years ago

leifeld commented 3 years ago

Hi there. A few years ago I estimated an ERGM with version 3.8.0. I am now trying to estimate it again with version 4.0.1, but the model no longer converges. I wonder if you might be willing to take a look at this problem.

The analysis was published in this paper:

Leifeld, Philip (2018): Polarization in the Social Sciences: Assortative Mixing in Social Science Collaboration Networks is Resilient to Interventions. Physica A: Statistical Mechanics and its Applications 507: 510-523.

In the paper, I fit two identically specified ERGMs to the complete political science co-authorship networks in Switzerland (< 200 nodes) and in Germany (1,322 nodes). The Swiss model converges in no time. For the German case, I increase the MCMC interval and sample size to 10,000 each because of the larger size. With version 3.8.0, it takes about 20-30 minutes to converge. With version 4.0.1, it runs the whole night and occasionally throws error messages I have never seen before.

I will attach a minimal replication archive, which contains the data and code to run the ERGM and the .Rout log for ergm 3.8.0. I am currently re-running it with version 4.0.1 and will post the .Rout file for that run here once it completes tomorrow. I am not sure yet whether I will get those errors again, but even if not, the much longer estimation time already seems odd (it has been running for hours).

To install ergm 3.8.0 in a local directory to re-run the old analysis, I did the following (with intermediate R restart as prompted):

renv::init(bare = TRUE)  # create a new library path in current working directory
install.packages("devtools")
library("devtools")  # use devtools to install prior ergm version
install_version("ergm", version = "3.8.0", repos = "http://cran.us.r-project.org")
renv::snapshot()

Thanks.

chad-klumb commented 3 years ago

May be due to changes in default MCMC behavior. Adding the control arguments

MCMLE.termination = "Hummel"

and

MCMLE.effectiveSize = NULL

in the second ergm call to obtain behavior more like 3.8 produces a fit for me in about 30 minutes.
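For readers unsure where these settings go: both are arguments to control.ergm(), and the resulting object is passed to ergm() through its control argument. The following is a minimal sketch, assuming ergm 4.x. The helper name fit_legacy is hypothetical, and the model from the replication archive is not reproduced here.

```r
library(ergm)

# Controls suggested above to obtain behavior closer to ergm 3.8 under ergm 4.x.
legacy_ctrl <- control.ergm(
  MCMLE.termination = "Hummel",  # use the Hummel stopping rule instead of the 4.x default
  MCMLE.effectiveSize = NULL     # NULL turns off the adaptive effective-size target
)

# Hypothetical helper: fit any ERGM formula with the controls above.
fit_legacy <- function(formula, ...) {
  ergm(formula, control = legacy_ctrl, ...)
}

# Example call (commented out because the fit can take a while):
# data("faux.magnolia.high")
# fit <- fit_legacy(faux.magnolia.high ~ edges + gwesp(0.25, fixed = TRUE))
```

Other sampler settings, such as the MCMC interval and sample size of 10,000 mentioned in the original report, would go into the same control.ergm() call as MCMC.interval and MCMC.samplesize.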

leifeld commented 3 years ago

Thanks a lot! This solves it for me. I would have never been able to figure out these settings on my own, especially the MCMLE.effectiveSize part!

Here is the output I promised yesterday (ergm 4.0.1 without your adjustments) for completeness: Rout 4.0.1.zip. The error I encountered the first time did not show up again, but it took 12 hours to complete. Is this the expectation with larger networks? Are you going to document somewhere when to use which criterion?

chad-klumb commented 3 years ago

I think the hope was that the new defaults would be at least as reasonable as the old ones (across the board), but we've now seen several cases where the new defaults slow down or prevent convergence for models that used to work well.

I believe @krivit implemented both the adaptive MCMC algorithm and the confidence termination criterion, so he could probably speak to specifics better than I could.

krivit commented 3 years ago

Thanks for the report. It looks like https://github.com/statnet/ergm/issues/303 has reared its ugly head. I've prototyped some code to see if some other kind of detection works better.

krivit commented 3 years ago

@leifeld, the new version works for me.

krivit commented 3 years ago

Can you confirm?

leifeld commented 3 years ago

Thanks, @krivit. This is faster but still seems to be taking several hours. I can report the duration later once it has finished running.

leifeld commented 3 years ago

Actually it doesn't look like it's going any faster. I installed ergm 4.0-6598 (2021-07-13) and re-ran the script I submitted above, but it's been running for six hours already (now doing iteration 8).

krivit commented 3 years ago

It converged for me eventually, though I had left it running overnight, so I can't tell you when. The adaptive algorithms are faster most of the time, but sometimes convergence detection can be a bit too strict.

I keep meaning to put together a benchmark "gallery" to make sure fitting times don't regress too much, but I haven't had the time.

CarterButts commented 3 years ago

A reasonable starting point for test cases would be the faux* data sets. E.g.,

library(ergm)
data("faux.mesa.high")
fit <- ergm(faux.mesa.high~ edges + nodefactor("Grade") + nodefactor("Race") +
     nodefactor("Sex") + nodematch("Grade",diff=TRUE) +
     nodematch("Race",diff=TRUE) + nodematch("Sex",diff=FALSE) +
     gwdegree(1.0,fixed=TRUE) + gwesp(1.0,fixed=TRUE) + gwdsp(1.0,fixed=TRUE))

This was the original data generating model, and IIRC it historically fit the data fairly readily. (Certainly, simplifications of it did.) On 4.0.1, it seems to be stuck on the first iteration (presumably it would finish eventually, but it takes longer than I am willing to wait). I will note that if I fit this using SA, though, it does a lousy job (so it is possible that the model is not a good one). Dropping gwdsp and fitting with SA works OK so long as the MCMC.interval is reasonably large (better if we let the decay parameters float), but does not converge with G-T. (It fails immediately if the decay parameters float, and fails with MCMLE estimation stuck if both are set to 1.)
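The SA variant described above can be sketched as follows. This assumes that "SA" refers to ergm's stochastic-approximation estimator, selected with main.method in control.ergm(). The MCMC.interval value and the function name fit_mesa_sa are illustrative assumptions, not settings taken from this thread.

```r
library(ergm)
data("faux.mesa.high")

# Assumed settings: stochastic approximation with a larger MCMC interval.
sa_ctrl <- control.ergm(
  main.method = "Stochastic-Approximation",
  MCMC.interval = 5000  # illustrative; the comment only says "reasonably large"
)

# Hypothetical wrapper so that sourcing this snippet does not start a long run.
# The model is the one above with gwdsp dropped and decay parameters fixed at 1.
fit_mesa_sa <- function() {
  ergm(
    faux.mesa.high ~ edges + nodefactor("Grade") + nodefactor("Race") +
      nodefactor("Sex") + nodematch("Grade", diff = TRUE) +
      nodematch("Race", diff = TRUE) + nodematch("Sex", diff = FALSE) +
      gwdegree(1.0, fixed = TRUE) + gwesp(1.0, fixed = TRUE),
    control = sa_ctrl
  )
}

# fit_sa <- fit_mesa_sa()  # uncomment to run
```

To let the decay parameters float, the fixed = TRUE arguments would be changed to fixed = FALSE, in which case the supplied value serves as the starting point.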

Another good example would be

data("faux.magnolia.high")
# Use the data set loaded above directly; `magnolia` was never defined.
fit <- ergm(faux.magnolia.high ~ edges + gwesp(0.25, fixed = TRUE) +
              nodematch("Grade") + nodematch("Race") + nodematch("Sex"))

which is one of our tutorial classics. This one does fit for me, though it is quite slow.
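As a rough illustration of the benchmark "gallery" idea raised earlier in the thread, the two models above could be wrapped in a small timing harness. This is only a sketch: the names gallery and time_fits are hypothetical, and nothing here is part of ergm itself.

```r
library(ergm)
data("faux.mesa.high")
data("faux.magnolia.high")

# Hypothetical gallery: each entry is a zero-argument function that runs one fit.
gallery <- list(
  mesa_full = function() ergm(
    faux.mesa.high ~ edges + nodefactor("Grade") + nodefactor("Race") +
      nodefactor("Sex") + nodematch("Grade", diff = TRUE) +
      nodematch("Race", diff = TRUE) + nodematch("Sex", diff = FALSE) +
      gwdegree(1.0, fixed = TRUE) + gwesp(1.0, fixed = TRUE) +
      gwdsp(1.0, fixed = TRUE)
  ),
  magnolia_gwesp = function() ergm(
    faux.magnolia.high ~ edges + gwesp(0.25, fixed = TRUE) +
      nodematch("Grade") + nodematch("Race") + nodematch("Sex")
  )
)

# Run every entry once and record wall-clock time in seconds.
time_fits <- function(fitters) {
  elapsed <- vapply(
    fitters,
    function(f) unname(system.time(f())["elapsed"]),
    numeric(1)
  )
  data.frame(model = names(fitters), seconds = unname(elapsed),
             row.names = NULL, stringsAsFactors = FALSE)
}

# timings <- time_fits(gallery)  # uncomment to run; may take a long time
```

Saving the resulting data frame per ergm version would make it possible to compare fitting times across releases. Because the fits are stochastic, a single run per model gives only a rough indication.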

leifeld commented 3 years ago

> It converged for me eventually, though I had left it running overnight, so I can't tell you when. The adaptive algorithms are faster most of the time, but sometimes convergence detection can be a bit too strict.
>
> I keep meaning to put together a benchmark "gallery" to make sure fitting times don't regress too much, but I haven't had the time.

This time it took 19 hours instead of the previous 12, so it actually got worse rather than better. But I am not sure whether that is stochastic variation or caused by the changes you made. For completeness, here is the output: physa.Rout.zip

Feel free to use my example (script and data) in any sort of gallery or for other purposes. But it's probably a better idea to debug with a smaller dataset, like the one Carter suggested.

martinamorris commented 3 years ago

Hey @leifeld, good to hear from you.

As you found, there have been some major updates with ergm 4.0. We're a bit behind on documentation, but yes, the plan is to have a vignette on the controls.

It's good to have some user feedback (esp. from folks like you who push the research use case beyond the trivial) to hear how the current version is performing. That may lead us to change some of the defaults.