statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks
Other
96 stars 37 forks source link

Problem with MCMC% calculation in a curved ERGM example? #221

Open drh20drh20 opened 3 years ago

drh20drh20 commented 3 years ago

Runs of the same code using different random seed values produce MCMC% of zero despite variability suggesting MCMC error is larger than reported relative to the Std. Error estimates. Problem or not?

data(sampson)
summary(ergm(samplike~edges+gwesp(0.25, fix=TRUE), control=control.ergm(seed=12345)))
## ...
## Monte Carlo Maximum Likelihood Results:
## 
##                  Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges             -1.6340     0.4626      0  -3.532 0.000412 ***
## gwesp.fixed.0.25   0.4084     0.2685      0   1.521 0.128312    
## 

summary(ergm(samplike~edges+gwesp(0.25, fix=TRUE), control=control.ergm(seed=23456)))
## ...
## Monte Carlo Maximum Likelihood Results:
## 
##                  Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges             -1.5891     0.4502      0   -3.53 0.000415 ***
## gwesp.fixed.0.25   0.3648     0.2516      0    1.45 0.147123   
martinamorris commented 3 years ago

Indeed, this is the problem I remember from a couple years back. From your multi-seed simulations, it seems clear that there should be a value > 0 in this case...

krivit commented 3 years ago

Here's what I get under dev. (master is taking too long; I may post the benchmarks later, when/if it finishes.)

library(parallel)
library(magrittr)
cl <- makeCluster(8)
coef_vcov<- parSapply(cl, seq_len(1024), function(i){
  set.seed(i)
  library(ergm)
  data(sampson)
  fit <- suppressMessages(ergm(samplike~edges+gwesp(0.25, fix=TRUE), eval.loglik=FALSE))
  list(coef=coef(fit), vcov=vcov(fit, source="estimation"))
})
stopCluster(cl)

# Simulated ("true") variance
(cov(do.call(rbind,coef_vcov["coef",])) -> vcov.sim)
#>                          edges gwesp.fixed.0.25
#> edges             0.0009917137    -0.0005323630
#> gwesp.fixed.0.25 -0.0005323630     0.0003160315
# Mean calculated variance
(Reduce(`+`,coef_vcov["vcov",])/ncol(coef_vcov) -> vcov.mean)
#>                          edges gwesp.fixed.0.25
#> edges             0.0009652453    -0.0005232738
#> gwesp.fixed.0.25 -0.0005232738     0.0003058615
# Median calculated variance
(do.call(abind::abind, list(coef_vcov["vcov",], along=3)) %>% apply(1:2, median) -> vcov.median)
#>                          edges gwesp.fixed.0.25
#> edges             0.0009677611    -0.0005245718
#> gwesp.fixed.0.25 -0.0005245718     0.0003079973
# Proportion of calculated variance underestimating the true variance
(Reduce(`+`,lapply(coef_vcov["vcov",], `<`, vcov.sim))/ncol(coef_vcov) -> vcov.punder)
#>                      edges gwesp.fixed.0.25
#> edges            0.5371094        0.4804688
#> gwesp.fixed.0.25 0.4804688        0.5361328

# Ratios and sqrt(Ratios)
# Mean vs. True
vcov.mean/vcov.sim 
#>                      edges gwesp.fixed.0.25
#> edges            0.9733104        0.9829267
#> gwesp.fixed.0.25 0.9829267        0.9678196
sqrt(vcov.mean/vcov.sim)
#>                      edges gwesp.fixed.0.25
#> edges            0.9865649        0.9914266
#> gwesp.fixed.0.25 0.9914266        0.9837782
# Median vs. True
vcov.median/vcov.sim 
#>                      edges gwesp.fixed.0.25
#> edges            0.9758472        0.9853648
#> gwesp.fixed.0.25 0.9853648        0.9745777
sqrt(vcov.median/vcov.sim)
#>                      edges gwesp.fixed.0.25
#> edges            0.9878498        0.9926554
#> gwesp.fixed.0.25 0.9926554        0.9872070

Created on 2021-02-05 by the reprex package (v1.0.0)

Looks like there is some underestimation, but it's negligible.

I think the reason the MCMC% is so small is that it's relative to the standard error, and the standard error is pretty big.

krivit commented 3 years ago

@martinamorris , regarding the MCMC%, what they represent (at the moment) is (SE.Total - SE.Model)/SE.Total. One way of looking at is "If our estimation were exact, what would be the percent decrease in the standard error reported in the table?"

There are other quantities possible. E.g.,

suppressMessages(library(ergm))
data(sampson)
fit <- suppressMessages(ergm(samplike~edges+gwesp(0.25, fix=TRUE), eval.loglik=FALSE))
summary(fit)
#> Call:
#> ergm(formula = samplike ~ edges + gwesp(0.25, fix = TRUE), eval.loglik = FALSE)
#> 
#> Monte Carlo Maximum Likelihood Results:
#> 
#>                  Estimate Std. Error MCMC % z value Pr(>|z|)    
#> edges             -1.6165     0.4246      0  -3.807 0.000141 ***
#> gwesp.fixed.0.25   0.4042     0.2433      0   1.661 0.096666 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood was not estimated for this fit. To get deviances, AIC, and/or BIC, use ‘*fit* <-logLik(*fit*, add=TRUE)’ to add it to the object or rerun this function with eval.loglik=TRUE.

v.mod <- diag(vcov(fit, sources="mod")) # Model variance
v.est <- diag(vcov(fit, sources="est")) # Estimation variance
v.tot <- diag(vcov(fit, sources="all")) # Total variance

# What fraction of the total variance is the estimation variance?
v.est/v.tot
#>            edges gwesp.fixed.0.25 
#>      0.002816901      0.002816901

# What fraction of the reported standard error is the estimation standard error?
sqrt(v.est)/sqrt(v.tot)
#>            edges gwesp.fixed.0.25 
#>       0.05307449       0.05307449

# How much bigger is the standard error than it could have been had we had exact MLE?
sqrt(v.tot)/sqrt(v.mod)
#>            edges gwesp.fixed.0.25 
#>         1.001411         1.001411

# If we had exact MLE, by what percent would standard error reported in the table decrease?
(sqrt(v.tot)-sqrt(v.mod))/sqrt(v.tot)
#>            edges gwesp.fixed.0.25 
#>      0.001409444      0.001409444

Created on 2021-02-05 by the reprex package (v1.0.0)

krivit commented 3 years ago

@drh20drh20 , @martinamorris , do we want to change anything here, or is the status quo satisfactory?