statnet / ergm.ego

Fit, Simulate and Diagnose Exponential-Family Random Graph Models to Egocentrically Sampled Network Data https://statnet.org

time consuming and not converge #21

Closed shangyuan232 closed 5 years ago

shangyuan232 commented 5 years ago

Summary of my object is as follows.

summary(cdrHyper.ego)
         Length Class      Mode
egos       47   data.frame list
alters     47   data.frame list
egoWt    5911   -none-     numeric
egoIDcol    1   -none-     character

My model is:

fit.full <- ergm.ego(cdrHyper.ego ~ edges + nodematch("CountryCodeTLD2", diff=TRUE, keep=c(5,9,22,23,26)) + nodematch("GenericTLD2", diff=TRUE, keep=c(3,7,9)), control=control.ergm.ego(ppopsize=50000), verbose=T)

The SAN step took almost 2 hours, and after it failed to match the target statistics, estimation fell back to MCMLE.

... ...
Starting 1 MCMC iteration of 163840 steps.
1 of 1: Returned from SAN Metropolis-Hastings burnin
SAN Metropolis-Hastings accepted 0.046% of 163840 proposed steps.
Finished SAN run 11
... ...
Fitting initial model.
Unable to match target stats. Using MCMLE estimation.
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
MPLE covariate matrix has 25 rows.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Density guard set to 1206639 from an initial count of 60075 edges.
... ...
Starting MCMLE Optimization...
Optimizing with step length 0.0172788349638976.
Using lognormal metric (see control.ergm function).
Using log-normal approx (no optim)
Starting MCMC s.e. computation.
The log-likelihood improved by 1.354.
MCMLE estimation did not converge after 20 iterations. The estimated coefficients may not be accurate. Estimation may be resumed by passing the coefficients as initial values; see 'init' under ?control.ergm for details.
Finished MCMLE.
This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.

What should I do with this non-convergence?

krivit commented 5 years ago

Do you have heterogeneous ego weights, or are they all equal? If they are equal, try rerunning with ppopsize=5911. If not, you might need to adjust the SAN and ERGM estimation parameters.
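A minimal sketch of the check suggested here, assuming the ego weights are stored in the egoWt component shown in the summary output above. The toy vector `ego_wt` is a hypothetical stand-in for `cdrHyper.ego$egoWt`, and the refit call is left commented out because it needs the original data.

```r
# Toy stand-in for cdrHyper.ego$egoWt: 5911 egos, all weighted equally.
ego_wt <- rep(1, 5911)

# Weights are homogeneous when only one distinct value occurs.
weights_equal <- length(unique(ego_wt)) == 1

# With equal weights, use the number of egos as the pseudo-population size.
ppop <- if (weights_equal) length(ego_wt) else NA
print(ppop)

# Refit with the smaller pseudo-population (requires the original object):
# library(ergm.ego)
# fit.full <- ergm.ego(
#   cdrHyper.ego ~ edges +
#     nodematch("CountryCodeTLD2", diff = TRUE, keep = c(5, 9, 22, 23, 26)) +
#     nodematch("GenericTLD2", diff = TRUE, keep = c(3, 7, 9)),
#   control = control.ergm.ego(ppopsize = ppop)
# )
```

A pseudo-population equal to the sample size keeps the SAN and MCMC steps much smaller than with ppopsize=50000, which may shorten the run considerably.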

shangyuan232 commented 5 years ago

Sorry, what do you mean by heterogeneous ego weights?

krivit commented 5 years ago

If you didn't set the ego weights, then they are all the same. In any case, this doesn't sound like a software bug as much as a use case, so I would suggest subscribing to the Statnet Help list (https://mailman13.u.washington.edu/mailman/listinfo/statnet_help) and asking there.

martinamorris commented 5 years ago

@shangyuan232 can you send the output of a call to summary:

summary(cdrHyper.ego ~ edges+nodematch("CountryCodeTLD2", diff=TRUE,keep = c(5,9,22,23,26))+nodematch("GenericTLD2", diff=TRUE))

shangyuan232 commented 5 years ago

Hi Martina,

Here you go. I am replying by email directly, because the results look messy after editing on GitHub.

==========================
Summary of model fit

Formula: cdrHyper.ego ~ edges + nodematch("CountryCodeTLD2", keep = c(5, 9, 22, 23, 26)) + nodematch("GenericTLD2")

Iterations: 2 out of 20

Monte Carlo MLE Results:
                          Estimate Std. Error MCMC % z value Pr(>|z|)
offset(netsize.adj)       -8.68457    0.00000      0    -Inf   <1e-04 ***
edges                      0.84434    0.08423      0  10.024   <1e-04 ***
nodematch.CountryCodeTLD2  2.71223    0.32855      0   8.255   <1e-04 ***
nodematch.GenericTLD2      0.17708    0.09917      0   1.786   0.0742 .

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The following terms are fixed by offset and are not estimated: offset(netsize.adj)

In addition, results also come out when I delete "diff=T":

==========================
Summary of model fit

Formula: cdrHyper.ego ~ edges + nodematch("CountryCodeTLD2", keep = c(5, 9, 22, 23, 26)) + nodematch("GenericTLD2", keep = c(3, 7, 9))

Iterations: 2 out of 20

Monte Carlo MLE Results:
                          Estimate Std. Error MCMC % z value Pr(>|z|)
offset(netsize.adj)        -8.6846     0.0000      0    -Inf   <1e-04 ***
edges                       0.8317     0.1026      0   8.109   <1e-04 ***
nodematch.CountryCodeTLD2   2.7877     0.1781      1  15.655   <1e-04 ***
nodematch.GenericTLD2       0.2338     0.1122      0   2.083   0.0372 *

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The following terms are fixed by offset and are not estimated: offset(netsize.adj)


martinamorris commented 5 years ago

Hey there, I'm not asking for a summary of the model fit (note, there's no call to ergm.ego in the command I sent). If you use the command summary(mynet ~ myterms) it will calculate the network statistics in myterms and print out their values. You should always run a summary command before running the model -- it's an important data assessment step.

shangyuan232 commented 5 years ago

@martinamorris Hi Martina, thank you for pointing out my mistake. I didn't know about that, and will pay attention to it from now on. The output is as follows.

summary(cdrHyper.ego ~ edges+nodematch("CountryCodeTLD2", diff=TRUE,keep = c(5,9,22,23,26))+nodematch("GenericTLD2", diff=TRUE))

edges                          7608
nodematch.CountryCodeTLD2.15     80
nodematch.CountryCodeTLD2.37      7
nodematch.CountryCodeTLD2.66     65
nodematch.CountryCodeTLD2.68      9
nodematch.CountryCodeTLD2.74    139
nodematch.GenericTLD2.0         340
nodematch.GenericTLD2.2           0
nodematch.GenericTLD2.3        1884
nodematch.GenericTLD2.4           0
nodematch.GenericTLD2.5           0
nodematch.GenericTLD2.8           0
nodematch.GenericTLD2.9         711
nodematch.GenericTLD2.11          0
nodematch.GenericTLD2.12         51
nodematch.GenericTLD2.14          0
nodematch.GenericTLD2.15          0

martinamorris commented 5 years ago

OK, so the first thing I see is that many of the nodematch terms have a count of 0. ERGM can handle that, but you should think about whether you really want these terms in the model. There is no way to estimate the coefficient for this boundary case (it's just -Inf, which is what ERGM will print out).
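As an illustration of this check, the sketch below copies the GenericTLD2 counts reported earlier in the thread into a named vector and lists the terms whose observed count is 0. The variable names are hypothetical. On the real data, the vector would come straight from the summary(cdrHyper.ego ~ ...) call.

```r
# GenericTLD2 nodematch counts, copied from the summary output in this thread.
stats <- c(
  nodematch.GenericTLD2.0  = 340,
  nodematch.GenericTLD2.2  = 0,
  nodematch.GenericTLD2.3  = 1884,
  nodematch.GenericTLD2.4  = 0,
  nodematch.GenericTLD2.5  = 0,
  nodematch.GenericTLD2.8  = 0,
  nodematch.GenericTLD2.9  = 711,
  nodematch.GenericTLD2.11 = 0,
  nodematch.GenericTLD2.12 = 51,
  nodematch.GenericTLD2.14 = 0,
  nodematch.GenericTLD2.15 = 0
)

# Terms with a zero count sit on the boundary; their coefficients are -Inf.
zero_terms <- names(stats)[stats == 0]
print(zero_terms)

# Terms that can actually be estimated.
nonzero_terms <- names(stats)[stats > 0]
print(nonzero_terms)
```

Here 7 of the 11 GenericTLD2 terms have a zero count, so they are candidates for dropping from the model.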

Also, out of 7608 edges, it looks like possibly not many are on the diagonal of the mixing matrix, that is, between nodes that share the same attribute value. If that's true, then you also want to think about whether a homophily term is appropriate (and if so, it might be negative).

Try running mixingmatrix on your two attributes and see what that looks like.
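To show what to look for, here is a small base-R sketch of a mixing matrix built from a hypothetical ego-alter tie list. The toy data and column names are invented for illustration. On the real object, a call along the lines of mixingmatrix(cdrHyper.ego, "CountryCodeTLD2") would produce the corresponding table.

```r
# Hypothetical ego-alter ties: each row is one reported tie.
ties <- data.frame(
  ego_attr   = c("a", "a", "b", "b", "c", "c"),
  alter_attr = c("a", "b", "b", "c", "a", "c"),
  stringsAsFactors = TRUE
)

# Cross-tabulate ego attribute by alter attribute.
mm <- table(ego = ties$ego_attr, alter = ties$alter_attr)
print(mm)

# Ties on the diagonal join nodes with the same attribute value.
on_diag <- sum(diag(mm))
share_on_diag <- on_diag / sum(mm)
print(share_on_diag)
```

A small diagonal share suggests little homophily on that attribute, in which case a nodematch term may not be appropriate or may come out negative.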

Finally, you should pretty much always include the nodefactor("attr") term for every nodematch("attr"). Think of the first as the main effect, and the second as the interaction.
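A sketch of a formula that pairs the two terms for one attribute. The keep argument is taken from the original model in this thread. Using nodefactor with its default arguments is an assumption, and the right levels to keep depend on the mixing matrix. Building the formula does not require the data, so the snippet runs on its own, and the fit itself is left commented out.

```r
# Main effect (nodefactor) plus interaction (nodematch) for the same attribute.
model_formula <- cdrHyper.ego ~ edges +
  nodefactor("GenericTLD2") +
  nodematch("GenericTLD2", diff = TRUE, keep = c(3, 7, 9))

# Inspect the terms without evaluating the formula.
term_labels <- attr(terms(model_formula), "term.labels")
print(term_labels)

# Check the statistics first, then fit (requires the original object):
# library(ergm.ego)
# summary(model_formula)
# fit <- ergm.ego(model_formula)
```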

So, I'd recommend going back to the drawing board.

shangyuan232 commented 5 years ago

@martinamorris Hi Martina, Thanks for your impressive suggestions! They are very useful to me!