leifeld / btergm

Temporal Exponential Random Graph Models by Bootstrapped Pseudolikelihood

error while running mtergm #30

Closed smy310 closed 2 years ago

smy310 commented 2 years ago

I have a network with 126 nodes, 252 edges and 23 time steps.

I tried to run MCMC estimation with mtergm on Windows (32 GB RAM) and on Linux (a cloud server provided by our university, with more than 100 GB RAM according to IT support). Both produced the same error:

Finished MCMLE. Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...

caught segfault: address 0x2b9549685000, cause 'memory not mapped'

Traceback:
1: (function (lprec) { .Call(RlpSolve_delete_lp, lprec); invisible(lprec) })(<pointer: 0x15d59e50>)

An irrecoverable exception occurred. R is aborting now ...

I tried reinstalling R and RStudio, but that did not work. I also tried some of the methods listed here: https://stackoverflow.com/questions/49190251/caught-segfault-memory-not-mapped-error-in-r, but they did not help either.

How can I solve this problem?

Thanks in advance!

leifeld commented 2 years ago

It sounds like it might be caused by a bug in the ergm package, specifically its C code. mtergm uses this code for MCMC-MLE.

Does the code from the paper about the btergm package in the Journal of Statistical Software work? Do the examples on the R help page for mtergm work? Does ERGM estimation with the ergm package work on your computer?
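For example, a quick way to run those checks might look like this (the florentine data are just a stock ergm example):

library("btergm")
library("ergm")

example("mtergm", package = "btergm")      # run the help-page examples for mtergm

data("florentine", package = "ergm")       # stock example data shipped with ergm
summary(ergm(flomarriage ~ edges + triangle))  # basic MCMC-based ERGM estimation check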

If so, one thing you could try is to use an argument in the mtergm function to return the data to you and then see if you can feed the data directly into the ergm function. Happy to provide more help if needed.

smy310 commented 2 years ago

Thanks for your reply. I tried the example code (the friendship data) with mtergm, and it works both on Windows and on Linux.

So it probably is not a problem with memory size or with the system I used? And not a problem with the size of my data either?

How exactly should I do the following? So far I have only set the MCMC sample size, the MCMC interval, and parallel in control.ergm.

“If so, one thing you could try is to use an argument in the mtergm function to return the data to you and then see if you can feed the data directly into the ergm function.”

leifeld commented 2 years ago

You can return the data from the mtergm call by using the returndata argument, like this:

mtergm(..., returndata = TRUE)

The respective code in the mtergm function (https://github.com/leifeld/btergm/blob/5eeef1c053245025a721d17588a0c640e3cab436/R/mtergm.R#L283) shows that, after the part where the data are returned, the function essentially just takes the different elements of the returned list and runs the ergm function on them, possibly using a slightly adapted version of the formula you provided. So you could return the data and run those steps yourself to verify that the problem actually happens when the ergm package is used. If so, it would be worth providing a minimal self-contained example to the ergm authors and getting their input on how to fix it. If you struggle with adjusting the formula, you can also just run the tergmprepare function manually, which will give you the data and the formula, just like in the code above the linked line.
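For example, the first step could look like this (the exact structure of the returned list may vary by btergm version, so inspect it with str()):

dat <- mtergm(f, returndata = TRUE)  # 'f' stands for your model formula
str(dat, max.level = 1)              # see which elements are handed to ergm internally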

The error message sounds like it could either be some memory problem in the ergm code or, more likely, some issue with parallel processing, as per the link you provided. It cannot originate in btergm itself because there is no compiled code in the package, and the error message indicates a problem with some compiled code. Maybe the supercomputer is using different computing nodes that have different versions of the ergm package or some other dependency of ergm or btergm? Or maybe some of the nodes built the package on a different version of R? It may be worth checking the version numbers on all cores. And/or consult the system administrator for advice on this.
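For example, a quick check across parallel workers could look like this (the cluster size of 4 is just a placeholder):

library("parallel")

cl <- makeCluster(4)  # match the number of workers you request in control.ergm
clusterCall(cl, function() {
  list(R      = R.version.string,
       ergm   = as.character(packageVersion("ergm")),
       btergm = as.character(packageVersion("btergm")))
})
stopCluster(cl)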

That said, I am not entirely sure you actually need mtergm because it gets fairly complicated to estimate with so many time steps. As per the article about btergm in the Journal of Statistical Software and earlier work by Cranmer and Desmarais, bootstrapped MPLE should likely be effective with 23 time steps, so you should be able to use the btergm function for this.

There is also a newer estimation wrapper in the btergm package that wasn't covered in the JSS article, which is the tbergm function. It uses Bayesian estimation from the Bergm package. Maybe that could eliminate the problem if you are not keen to use the btergm function. Or it may be subject to the same problem if it is caused by an ergm (or dependency) installation issue as per the above.
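For example, both alternatives could be sketched like this (R = 1000 bootstrap replications is just a placeholder; f stands for your model formula):

library("btergm")

fit_btergm <- btergm(f, R = 1000)  # bootstrapped MPLE across the 23 time steps
summary(fit_btergm)                # estimates with bootstrap confidence intervals

fit_tbergm <- tbergm(f)            # Bayesian estimation via the Bergm package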

smy310 commented 2 years ago

I have tested a few things; short feedback below.

"Maybe the supercomputer is using different computing nodes that have different versions of the ergm package or some other dependency of ergm or btergm? Or maybe some of the nodes built the package on a different version of R? It may be worth checking the version numbers on all cores. And/or consult the system administrator for advice on this."

Feedback

We tested the friendship data with mtergm on the cloud server and it worked, with parallel set in control.ergm.

The system administrator confirmed that he installed R for the first time a few days ago, when I asked him to set it up. So it should not be a version problem, since according to him there is only one version on the server.

“There is also a newer estimation wrapper in the btergm package that wasn't covered in the JSS article, which is the tbergm function. It uses Bayesian estimation from the Bergm package. Maybe that could eliminate the problem if you are not keen to use the btergm function. Or it may be subject to the same problem if it is caused by an ergm (or dependency) installation issue as per the above.”

Feedback

I tried tbergm on my PC and it worked, whereas mtergm did not work a few weeks ago (with the same memory error at that time).

“So you could return the data and run those steps by yourself to verify that this is actually a problem that happens when the ergm package is used. If so, perhaps it would be worth providing it as a minimal self-contained example to the ergm authors and get their input on how to fix it.”

Feedback

I got the data, but could not run the estimation on the obtained data correctly. Should I run mtergm with the obtained data and the revised formula after tergmprepare? I am also not sure I understand the meaning of i in offset(edgecov(offsmat[[i]])).

Thanks again for your kind reply.

leifeld commented 2 years ago

OK, good to hear that you could rule out a server configuration problem. That means it is probably a problem in one of the dependencies, which cannot cope with your dataset for some reason. It's definitely not something that originates in the btergm package because the error message points to some compiled code, and there is none in the package. I just looked up the error message, and it seems to come from the lpSolveAPI package. This package is imported by the ergm package. So I think the best route towards solving the problem is to create a self-contained example that factors btergm out of the equation and take it to the ergm maintainers. Here is a script that can help you to do that:

library("texreg")
library("network")
library("ergm")
library("btergm")

# load an example dataset from the btergm package
data("knecht")

set.seed(12345) # set the random seed

# do some data management; see ?knecht
for (i in 1:length(friendship)) {
  rownames(friendship[[i]]) <- 1:nrow(friendship[[i]])
  colnames(friendship[[i]]) <- 1:ncol(friendship[[i]])
}
rownames(primary) <- rownames(friendship[[1]])
colnames(primary) <- colnames(friendship[[1]])
friendship <- handleMissings(friendship, na = 10, method = "remove")
friendship <- handleMissings(friendship, na = NA, method = "fillmode")

# set up some formula for ERGM estimation
f <- friendship ~ edges + mutual + transitiveties + ctriple +
  edgecov(primary) + delrecip + memory(type = "stability")

model <- mtergm(f) # use mtergm for estimation across four time steps

# now do the same thing manually by using tergmprepare and then ergm
l <- tergmprepare(f, offset = FALSE, blockdiag = TRUE)
for (i in 1:length(l$covnames)) {
  assign(l$covnames[i], l[[l$covnames[i]]])
}
assign("offsmat", l$offsmat)
form <- as.formula(l$form, env = environment())
model2 <- ergm(form, offset.coef = -Inf)

# check if the two models return similar results
screenreg(list(model, model2), single.row = TRUE)

The friendship object is a list of four networks. mtergm creates a single block-diagonal matrix and puts the remaining covariates into a conformable format. After the mtergm estimation, I do the same thing manually. First, I call tergmprepare to set up the block-diagonal data structures and the formula. Everything you need is stored in the list l. The loop goes through the covariates in the list and saves them in the global environment. Then we save the offset matrix, which denotes the off-diagonal blocks, in the global environment and then the formula object. The offset matrix is necessary because the ERGM needs to know which parts of the network (basically the intertemporal dyads saved in the off-diagonal blocks) are taboo for the sampler because they are impossible to form. You can see that we can then just use the ergm package for estimation, but we have to specify the offset matrix.
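As a toy illustration of the offset matrix (assuming the convention that offsmat is 1 for forbidden dyads and 0 within the diagonal blocks), consider two 3-node networks stacked block-diagonally:

# The 6 x 6 offset matrix is 1 everywhere except within the two diagonal
# blocks, so offset(edgecov(offsmat)) with a coefficient of -Inf rules out
# the intertemporal dyads in the off-diagonal blocks.
n <- 3
offsmat_toy <- matrix(1, nrow = 2 * n, ncol = 2 * n)
offsmat_toy[1:n, 1:n] <- 0                            # block for t = 1: dyads allowed
offsmat_toy[(n + 1):(2 * n), (n + 1):(2 * n)] <- 0    # block for t = 2: dyads allowed
offsmat_toy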

You could adapt the script to your own case and create the data structures in this way and then save the relevant objects into some data file. You could write a script that uses this data file to estimate an ERGM using the ergm function, without any btergm. To figure out which formula to use in the script, look at the form object.
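For example, continuing directly from the script above (the file name is arbitrary):

# Save everything the formula refers to into one data file
save(list = c(l$covnames, "offsmat", "form"), file = "blockdiag_data.RData")

# Then, in a fresh R session without btergm:
library("ergm")
load("blockdiag_data.RData")
model_check <- ergm(form, offset.coef = -Inf)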

It's great to hear that tbergm works well in your case. So perhaps you won't even need mtergm for this.

smy310 commented 2 years ago

I get the same error when using ergm directly. Output below:

l <- tergmprepare(form1, offset = FALSE, blockdiag = TRUE)

Mean transformed timecov values: 1, 2, ..., 23 for t = 1 through t = 23.

Initial dimensions of the network and covariates: network, mat_dist, mat_ling, colonize, memory, and timecov1 each have 126 rows and 126 columns at every time step t = 2 through t = 23.

All networks are conformable.

Dimensions of the network and covariates after adjustment: unchanged, 126 x 126 for every object at every time step.

for (i in 1:length(l$covnames)) {
  assign(l$covnames[i], l[[l$covnames[i]]])
}
assign("offsmat", l$offsmat)
form <- as.formula(l$form, env = environment())

Model 3

Model_test3 <- ergm(form, offset.coef = -Inf,
                    control = control.ergm(MCMC.samplesize = 10000,
                                           MCMC.interval = 10000,
                                           MCMLE.maxit = 20,
                                           parallel = 16))

Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 20: Optimizing with step length 0.0030. The log-likelihood improved by 0.5966. Estimating equations are not within tolerance region.
[Iterations 2 through 19: step lengths between 0.0060 and 0.0152, log-likelihood improvements between roughly 1.6 and 3.6, estimating equations never within the tolerance region.]
Iteration 20 of at most 20: Optimizing with step length 0.0092. The log-likelihood improved by 1.5345. Estimating equations are not within tolerance region.
MCMLE estimation did not converge after 20 iterations. The estimated coefficients may not be accurate.
Estimation may be resumed by passing the coefficients as initial values; see 'init' under ?control.ergm for details.
Finished MCMLE.
Evaluating log-likelihood at the estimate.
Fitting the dyad-independent submodel...

caught segfault: address 0x2b8e802de000, cause 'memory not mapped'

Traceback:
1: (function (lprec) { .Call(RlpSolve_delete_lp, lprec); invisible(lprec) })(<pointer: 0x5612a644d2d0>)

An irrecoverable exception occurred. R is aborting now ...

leifeld commented 2 years ago

This looks like the model you are trying to estimate is degenerate, perhaps because it is not a good fit for the data you observe. I don't know why it throws the error instead of saying it's degenerate, but perhaps the error goes away if the model is specified more in line with the underlying population process. At any rate, you can see now that it's not an issue with the btergm package but rather something that happens when you use the ergm function, so I'll close the issue here.

smy310 commented 2 years ago

I limited the number of iterations to 20 just as an example; the default should be 60.

With the default MCMLE.maxit, my data do converge.

But it must be a problem with the code in ergm, not in btergm.

Thanks a lot!

smy310 commented 2 years ago

Thank you very much for explaining what I can do as a next step.

Since I was asked to provide a robustness check of the MPLE results for my paper, I would like to use MCMC estimation as one possible check. Maybe I can try tbergm instead of mtergm.
