statnet / tergmLite

A fast, simplified version of TERGM-based simulation for epidemic modeling

Interim decision: adapting tergmLite to new ergm-term API #24

Closed martinamorris closed 5 years ago

martinamorris commented 5 years ago

This is related to #23

The recent ergm-terms API change led to an effort to establish a consistent set of ergm-term calculations -- so that ergm.ego and tergmLite would no longer have a separate set of terms.

Chad identified a way to change tergmLite (and add a couple of generics to the network package) so that it could use the standard ergm-term set. He committed changes to the "tergmLiteterms" branch of the repo. Steve has taken these for a trial run and did not find any problems.

The question now is should these changes be pulled into tergmLite?

The only reason this is a question is that in the process Pavel, Chad, and I also recognized that we would like to generalize the tergmLite idea, first by implementing a similar stripped-down networkLite object, and then by allowing for a range of network data formats to be read and processed by ergm-terms. That may mean the changes Chad implemented now will be modified again. Of course this will take a bit of time (probably months, not years), as Chad has a list of things he's working on now.

@chad-klumb please add or correct this summary as needed.

My suggestion: either integrate Chad's changes into master now, or create a dev branch of tergmLite that will serve as an interim working version while the rest of this gets worked out. @smjenness this is ultimately your call.

smjenness commented 5 years ago

Hi @martinamorris : thanks for this update. I have tested the changes that @chad-klumb made and added some comments on #23 for next steps. At this point, I'm fine with the master branch of tergmLite remaining the current (i.e., older, using ergm 3.9) approach, while updates are made on the development tergmLiteterms branch.

However, for our MIDAS work the top priority will be to get this current redesign of the tergmLite, network, and EpiModel packages working for both the toy models (e.g., NME materials) and research-level models (e.g., the test script for EpiModelHIV that I added just now) before we undertake a broader expansion of the networkLite idea.

martinamorris commented 5 years ago

However, for our MIDAS work the top priority will be to get this current redesign of the tergmLite, network, and EpiModel packages working for both the toy models (e.g., NME materials) and research-level models (e.g., the test script for EpiModelHIV that I added just now) before we undertake a broader expansion of the networkLite idea.

Of course. On other fronts Chad has

smjenness commented 5 years ago

@martinamorris wait, the estimation time for the 3-network MSM models with the current ergm is taking 7 hours?!? It takes me ~1 minute for all three models (main, casl, one-time).

martinamorris commented 5 years ago

@smjenness i'd love to see that. what versions of ergm/tergm are you using? these models have never taken ~1 min for us. do you have some code that takes this long for you that we can test against current versions?

to be complete, though, the 40 min includes a summary (to get the suffstats from ergm.ego) and mcmc.diagnostics for each model.

smjenness commented 5 years ago

For example, this model: https://github.com/EpiModel/CombPrev/blob/master/estimation/estimation.R

Using these package versions (these are the pre-"levels API" versions of ergm/tergm): https://github.com/EpiModel/CombPrev/blob/master/InstallStack.R

Takes about a minute to run.

chad-klumb commented 5 years ago

I installed the commits referred to above, and ran the main model in that script on nori. It spent ~ 20 minutes in the MPLE, another ~ 15 minutes in MCMC, and failed to converge within the allotted 500 MCMC iterations.

As a point of reference, the 50k ppop cohab model took ~ 10 minutes in the MPLE with dev. The network in the script above has 100k nodes, so I'm not surprised to see > 10 minutes in the MPLE. (Granted, the ergm versions and models are different.)

To the best of my knowledge the networkLite changes should have had absolutely zero effect on estimation (they should actually not even be referred to by the estimation code), so if there is a big change in estimation time, I don't think networkLite is the underlying cause.

Here is the sessionInfo (ARTnet would not install from GitHub, even with an auth token, so I downloaded the relevant commit and installed it from the command line; that's why it says local instead of the commit ID):

> sessioninfo::package_info(pkgs = c("network", "networkDynamic", "statnet.common",
+                                    "ergm", "tergm", "EpiModel", "EpiModelHPC",
+                                    "tergmLite", "EpiABC", "EpiModelHIV",
+                                    "ARTnetData", "ARTnet"), dependencies = FALSE)
 package        * version     date       lib source                                 
 ARTnet           1.0.0       2019-09-04 [1] local                                  
 ARTnetData       1.0         2019-09-04 [1] Github (EpiModel/ARTnetData@1d8ec6e)   
 EpiABC           1.0         2019-09-04 [1] Github (EpiModel/EpiABC@c32ecb6)       
 EpiModel         1.7.3       2019-09-04 [1] Github (statnet/EpiModel@2c131f0)      
 EpiModelHIV      1.5.0       2019-09-04 [1] Github (EpiModel/EpiModelHIV-p@554186b)
 EpiModelHPC      2.0.2       2019-09-04 [1] Github (statnet/EpiModelHPC@a64dbf2)   
 ergm             3.10.0-4851 2019-09-04 [1] Github (statnet/ergm@8b30e92)          
 network          1.14-377    2019-09-04 [1] Github (statnet/network@deff2a0)       
 networkDynamic   0.10        2019-09-04 [1] Github (statnet/networkDynamic@14182bf)
 statnet.common   4.3.0-230   2019-09-04 [1] Github (statnet/statnet.common@3307a8c)
 tergm            3.6.0-1659  2019-09-04 [1] Github (statnet/tergm@d3af135)         
 tergmLite        1.2.0       2019-09-04 [1] Github (statnet/tergmLite@73d2a2d)     

[1] /homes/cklumb/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/lib/R/site-library
[3] /usr/lib/R/library

here are the target stats:

> cbind(netstats_main)
                      netstats_main
edges                    19806.7633
nodematch_age.grp1        2986.2927
nodematch_age.grp2        3266.4111
nodematch_age.grp3        2568.2408
nodematch_age.grp4        1680.1638
nodematch_age.grp5         919.4133
nodefactor_age.grp1       9363.6128
nodefactor_age.grp2       8896.0324
nodefactor_age.grp3       7514.7929
nodefactor_age.grp4       5678.8833
nodematch_race           15156.4797
nodefactor_race1          1944.9159
nodefactor_race2         18082.2534
nodefactor_deg.casl1      7820.4605
nodefactor_deg.casl2      3259.0264
nodefactor_deg.casl3       650.7738
concurrent                 966.1836
degrange                     0.0000
nodematch_role.class1        0.0000
nodematch_role.class2        0.0000

here is some of the output from netest:

> # Fit model
> fit_main <- netest(nw_main,
+                    formation = model_main,
+                    target.stats = netstats_main,
+                    coef.diss = netstats$main$diss.byage,
+                    set.control.ergm = control.ergm(MCMLE.maxit = 500,
+                                                    SAN.maxit = 3,
+                                                    SAN.nsteps.times = 3),
+                    verbose = FALSE)
Observed statistic(s) deg3+, nodematch.role.class.0, and nodematch.role.class.1 are at their smallest attainable values. Their coefficients will be fixed at -Inf.
Unable to match target stats. Using MCMLE estimation.
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
...
(many MCMC iterations)
...
Iteration 500 of at most 500:
Optimizing with step length 0.346667219096277.
The log-likelihood improved by 1.697.
MCMLE estimation did not converge after 500 iterations. The estimated coefficients may not be accurate. Estimation may be resumed by passing the coefficients as initial values; see 'init' under ?control.ergm for details.
Finished MCMLE.
This model was fit using MCMC.  To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

and here is the summary of the fit:

> summary(fit_main)

==========================
Summary of model fit
==========================

Formula:   TARGET_STATS ~ edges + nodematch("age.grp", diff = TRUE) + nodefactor("age.grp", 
    base = 1) + nodematch("race", diff = FALSE) + nodefactor("race", 
    base = 1) + nodefactor("deg.casl", base = 1) + concurrent + 
    degrange(from = 3) + nodematch("role.class", diff = TRUE, 
    keep = 1:2)
<environment: 0x5585a2a59570>

Iterations:  500 out of 500 

Monte Carlo MLE Results:
                        Estimate Std. Error MCMC %  z value Pr(>|z|)    
edges                  -13.92193    0.07900      1 -176.219   <1e-04 ***
nodematch.age.grp.1      3.02365    0.06990      2   43.259   <1e-04 ***
nodematch.age.grp.2      2.50429    0.06588      2   38.010   <1e-04 ***
nodematch.age.grp.3      1.57949    0.09088      2   17.381   <1e-04 ***
nodematch.age.grp.4      0.78264    0.08419      2    9.296   <1e-04 ***
nodematch.age.grp.5      0.42833    0.07429      2    5.765   <1e-04 ***
nodefactor.age.grp.2     0.40064    0.05039      2    7.951   <1e-04 ***
nodefactor.age.grp.3     0.74938    0.04669      2   16.049   <1e-04 ***
nodefactor.age.grp.4     0.79595    0.05356      1   14.860   <1e-04 ***
nodefactor.age.grp.5     0.53950    0.04904      2   11.001   <1e-04 ***
nodematch.race           1.51278    0.03066      2   49.342   <1e-04 ***
nodefactor.race.2        0.81763    0.06166      1   13.259   <1e-04 ***
nodefactor.race.3        0.15544    0.01653      2    9.404   <1e-04 ***
nodefactor.deg.casl.1   -0.34729    0.02627      2  -13.218   <1e-04 ***
nodefactor.deg.casl.2   -0.65870    0.03911      1  -16.842   <1e-04 ***
nodefactor.deg.casl.3   -1.03679    0.07519      1  -13.790   <1e-04 ***
concurrent              -2.62436    0.04873      1  -53.853   <1e-04 ***
deg3+                       -Inf    0.00000      0     -Inf   <1e-04 ***
nodematch.role.class.0      -Inf    0.00000      0     -Inf   <1e-04 ***
nodematch.role.class.1      -Inf    0.00000      0     -Inf   <1e-04 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Log-likelihood was not estimated for this fit.
To get deviances, AIC, and/or BIC from fit `object$fit` run 
  > object$fit<-logLik(object$fit, add=TRUE)
to add it to the object or rerun this function with eval.loglik=TRUE.

 Warning: The following terms have infinite coefficient estimates:
  deg3+ nodematch.role.class.0 nodematch.role.class.1 

Dissolution Coefficients
=======================
Dissolution Model: ~offset(edges) + offset(nodematch("age.grp", diff = TRUE))
Target Statistics: 219.8742 71.89133 251.3025 530.0986 640.4099 907.8107
Crude Coefficient: 5.388497 -1.127349 0.1341729 0.8826775 1.072048 1.421437
Mortality/Exit Rate: 0.000478213
Adjusted Coefficient: 5.624527 -1.29216 0.1729676 1.353648 1.783692 3.21071

martinamorris commented 5 years ago

1. Have you tried running the model above, swapping out to use the dev branch of ergm?

2. When you say:

    To the best of my knowledge the networkLite changes should have had absolutely zero effect on estimation (they should actually not even be referred to by the estimation code), so if there is a big change in estimation time, I don't think networkLite is the underlying cause.

did you mean tergmLite?

martinamorris commented 5 years ago

@chad-klumb Also, I think we can close this issue and commit the changes you made to a new dev branch of tergmLite. This issue was just meant to address the question of what to do with your tergmLite modifications in the interim. @smjenness made that decision.

Going forward, discussion of the performance of the new dev branch should be started in one or more new issues, with appropriate titles.

chad-klumb commented 5 years ago

So it sounds like the present location (tergmLiteTerms branch) is fine and we should try to get it merged into the main branch as soon as we are all satisfied it's performing appropriately?

martinamorris commented 5 years ago

So it sounds like the present location (tergmLiteTerms branch) is fine and we should try to get it merged into the main branch as soon as we are all satisfied it's performing appropriately?

@smjenness would you like to start a dev branch for tergmLite from this tergmLiteTerms branch while we test performance, and use that dev branch going forward for prototyping updates? or are you satisfied with Chad's test https://github.com/statnet/tergmLite/issues/25#issuecomment-528155493 that the modifications can be pulled into master?

either way, it may be good to start a dev branch for prototyping changes.
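As a rough sketch of the branching workflow being proposed here (branch names come from this thread; the remote name "origin" and default branch "master" are assumptions, not something stated in the discussion):

```shell
# Rough sketch, not a prescribed workflow: spin the tergmLiteterms work
# into a long-lived dev branch of tergmLite and publish it for prototyping.
git fetch origin
git checkout tergmLiteterms     # branch carrying Chad's ergm-term changes
git checkout -b dev             # new dev branch starting from that point
git push -u origin dev          # publish dev for ongoing prototyping work

# later, once everyone is satisfied with performance:
# git checkout master
# git merge dev
```

This keeps master on the released (ergm 3.9) code path while the new term API is exercised on dev.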

smjenness commented 5 years ago

Now that I have performance tested Chad's work (and everything looks great), I would like to suggest the following for tergmLite branching:

@martinamorris: I think Step 1 (merging network branches) will be key for code clarity and organization, but I understand if this will take some time to work through with Carter.

chad-klumb commented 5 years ago

Based on the last attempted updates to network (PR statnet/network#12), we will need to run revdeps for these changes to make sure we don't break anyone else's packages.

The holdup with statnet/network#12 is that there were a few issues with running revdeps on statnet2:

For this (and other changes going forward), we will need an environment where

We will also need some protocols for what to do when a revdep cannot be tested, which could be due to several reasons:

There's also a question of how hard we should work to test someone else's package if it doesn't just work for the current R version (since the onus is kind of on them to keep it up-to-date). Do we inform them we're updating our package and couldn't test theirs because it's out of date? Or do we spend lots of time trying to get the tests to run with an older version of R?

martinamorris commented 5 years ago

thx Chad for the thorough discussion. it looks like RSiena "suggests" network -- less important than a true dependency. But the folks behind this package are friends, so we should probably reach out via personal channels.

@krivit @CarterButts we should discuss this point next week in Zurich:

There's also a question of how hard we should work to test someone else's package if it doesn't just work for the current R version (since the onus is kind of on them to keep it up-to-date). Do we inform them we're updating our package and couldn't test theirs because it's out of date? Or do we spend lots of time trying to get the tests to run with an older version of R?

smjenness commented 5 years ago

Hi @chad-klumb and @martinamorris : I would like to check in on the status of the checklist above.

https://github.com/statnet/tergmLite/issues/24#issuecomment-528483629

It looks like we are still stuck on the first item.

martinamorris commented 5 years ago

@chad-klumb you can go ahead and update network (I've changed your permissions).

Before we release this next version of network to CRAN I need to contact the folks at RSiena. I did chat with Pavel and Carter briefly at EUSN last month. We agreed it's important to keep the RSiena folks in the loop, but we can't hijack our development process to ensure backwards compatibility with RSiena.

I'll write to Tom S and Christian S today. That should give them plenty of time before we release network to CRAN. Sorry for the delay @smjenness

chad-klumb commented 5 years ago

Thanks for the permissions; I'll get the changes up shortly. (We should probably chat briefly about this at today's NMG meeting.)

martinamorris commented 5 years ago

Update: @chad-klumb will issue a pull request to network with a 24 hr deadline.

it looks like the blocking issue (revdeps) might be related to the statnet server we test things on, rather than RSiena itself. the server is old and outdated, and will be retired in December, so we'll be talking with @krivit about setting up a new test environment.

chad-klumb commented 5 years ago

Working on getting a direct comparison of revdeps for current network master and the proposed changes. (Even if we can't test every last one, it would be good to see that there are no new failures among those that we can test, which is most of them.) I was hoping to have that done today, but the process hung, so I won't have it until tomorrow (at which point I'll make the PR). Sorry for the delay.
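The "no new failures" comparison can be mechanized once each revdep run produces a plain list of failing packages. A minimal sketch, assuming one failing package name per line in hypothetical files `failures_master.txt` and `failures_proposed.txt` (the file names are illustrative, not from the thread):

```shell
#!/bin/sh
# Minimal sketch: compare reverse-dependency check results for current
# network master vs. the proposed changes, printing only the failures
# that are new under the proposed changes. Input file names are
# assumptions: one failing package per line from each R CMD check run.
sort -u failures_master.txt   > master.sorted
sort -u failures_proposed.txt > proposed.sorted
# comm -13 suppresses lines unique to file 1 and lines common to both,
# leaving lines unique to file 2: i.e., newly failing packages
comm -13 master.sorted proposed.sorted
```

Packages that were already failing against master are filtered out, so only regressions attributable to the proposed changes surface.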

smjenness commented 5 years ago

Hi @chad-klumb : I completed my checkboxes here (https://github.com/statnet/tergmLite/issues/24#issuecomment-528483629) and deleted the old development branch in the process. I'm closing this issue now, as I assume you will create a new dev branch off of current master when you need to continue working here. Thanks for all your work on this. I also added you to the author list in the DESCRIPTION file.