statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

predict.ergm() on a model with offset with -Inf coefs #179

Closed mbojan closed 3 years ago

mbojan commented 3 years ago

Implement predict.ergm.ego() which will compute predicted probabilities for the pseudo-population network.

mbojan commented 3 years ago

Need to fix statnet/ergm#172 first.

mbojan commented 3 years ago

Done.

martinamorris commented 3 years ago

Thank you!!!

martinamorris commented 3 years ago

Ok, still one problem, but i think we're 99% of the way there.

The offset in the first position is the ppop scaling adjustment.

In ergm.ego, theta(ppop adj) + theta(edges) equals theta(edges) in ergm.

So we don't want to just delete that offset; we want to add it back to the edges theta first, and then delete it.
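
In code, the fix amounts to something like this (a minimal sketch; toy numbers echoing the coefficient layout shown later in this thread):

theta <- c("offset(netsize.adj)" = -7.29, edges = 0.15, "nodematch.Grade" = 3.44)
i <- which(names(theta) == "offset(netsize.adj)")
theta["edges"] <- theta["edges"] + theta[i]  # fold the ppop adjustment into edges
theta <- theta[-i]                           # ...and only then drop the offset term
theta
##           edges nodematch.Grade
##           -7.14            3.44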

Without this, the probabilities are not scaled correctly in the predict function:

> head(pnet)
   tail  head         p
1  3497  7317 0.6867611
2  7272  8277 0.7497954
3  7069 11762 0.4357120
4   435  8813 0.5929972
5  1275  8594 0.0000000
6 10672 10806 0.5315793

those probabilities should be much, much lower in this network with 15K nodes and 7K edges.

i'm tagging both @chad-klumb and @mbojan here, not sure who can get to it first.

chad-klumb commented 3 years ago

A possible alternative would be to use the ergm.formula rather than the formula attached to the ergm.ego object; then no removal or modification should be necessary.

chad-klumb commented 3 years ago

cf. PR statnet/ergm.ego#48

you can remotes::install_github("chad-klumb/ergm.ego@master") to test (until it's merged)

martinamorris commented 3 years ago

thx for the quick response! will try when i'm free (6:30ish)

mbojan commented 3 years ago

I was actually wondering whether it should in fact be included...

We could also modify ergmMPLE() to return the foffset element computed by ergm.pl(), which is the already-computed combined "effect" of the offset terms (the sum of products of the offset terms' change statistics and their coefficients).
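
Conceptually, it is one combined value per dyad (a toy sketch with made-up numbers, not the actual ergm.pl() internals):

# change statistics of 2 offset terms for 3 dyads, plus the offset coefficients:
offset_stats <- matrix(c(1, 0,
                         0, 1,
                         1, 1), nrow = 3, byrow = TRUE)
offset_theta <- c(-7.29, -2)
foffset <- as.vector(offset_stats %*% offset_theta)  # sum of products, per dyad
foffset
## [1] -7.29 -2.00 -9.29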

@chad-klumb, can you verify, for a model with more than one offset term (plus the netsize adjustment), whether the order of the parameter vector is the same as the order of the columns of the "model matrix" (as returned by ergmMPLE())? I recall that for some reason, when ergmMPLE() was fed ergmegofit$ergm.formula, the netsize adjustment offset was moved from the first place in the formula.

I will be able to get back to it in 8-10hrs.

chad-klumb commented 3 years ago

It looks like the problem is actually worse than that; it appears to move offsets (not just netsize.adj) to the last columns.

That will need to be changed or compensated for.

mbojan commented 3 years ago

Right. I have not yet managed to pin down why it happens. I think we need a reliable way of mapping formula terms to model matrix columns and to parameters. It would be invaluable in other contexts too.

All in all that means the problem affects ergm too, not only ergm.ego.

chad-klumb commented 3 years ago

The rearrangement happens here:

https://github.com/statnet/ergm/blob/38533f42b4e302f05e905145174bf105270f0cbc/R/predict.ergm.R#L103

chad-klumb commented 3 years ago

nonsimp_update.formula may be the solution
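
For example (a sketch; popnw and netsize.adj stand in for the ergm.ego internals):

library(statnet.common)

f <- popnw ~ edges + nodematch("Sex")
# update without simplifying the result and without losing f's environment:
nonsimp_update.formula(f, . ~ offset(netsize.adj) + .)
## popnw ~ offset(netsize.adj) + (edges + nodematch("Sex"))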

chad-klumb commented 3 years ago

With both chad-klumb/ergm@master and chad-klumb/ergm.ego@master, the following script

library(ergm.ego)
data("faux.magnolia.high")

fmhego <- as.egodata(faux.magnolia.high)

fit <- ergm.ego(fmhego ~ edges
                 + nodefactor("Grade")
                 + offset(nodemix("Grade", levels2=1))
                 + nodematch("Grade")
                 + offset(nodematch("Sex")),
                 offset.coef = c(-1,-2))

debug(predict)

pnet <- predict(fit)

shows that predmat is structured as

Browse[5]> head(predmat)
     tail head offset(netsize.adj) edges nodefactor.Grade.8 nodefactor.Grade.9 nodefactor.Grade.10 nodefactor.Grade.11 nodefactor.Grade.12 offset(mix.Grade.7.7) nodematch.Grade
[1,]  116  267                   1     1                  0                  1                   0                   1                   0                     0               0
[2,]   57 1433                   1     1                  0                  0                   0                   1                   1                     0               0
[3,]  610 1167                   1     1                  0                  0                   1                   0                   0                     0               0
[4,]   38  553                   1     1                  0                  1                   0                   0                   0                     0               0
[5,] 1043 1081                   1     1                  0                  0                   1                   1                   0                     0               0
[6,]  625 1317                   1     1                  1                  1                   0                   0                   0                     0               0
     offset(nodematch.Sex)
[1,]                     0
[2,]                     0
[3,]                     1
[4,]                     0
[5,]                     0
[6,]                     1

which I believe is the desired result, as

Browse[5]> fit$ergm.formula
popnw ~ offset(netsize.adj) + (edges + nodefactor("Grade") + 
    offset(nodemix("Grade", levels2 = 1)) + nodematch("Grade")) + 
    offset(nodematch("Sex"))
<environment: 0x000000000833e530>
Browse[5]> fit$coef
  offset(netsize.adj)                 edges    nodefactor.Grade.8    nodefactor.Grade.9   nodefactor.Grade.10   nodefactor.Grade.11   nodefactor.Grade.12 offset(mix.Grade.7.7) 
           -7.2868764             0.1513628            -0.3380101            -0.7668909            -0.6618398            -0.5076787            -0.5141862            -1.0000000 
      nodematch.Grade offset(nodematch.Sex) 
            3.4419170            -2.0000000 
martinamorris commented 3 years ago

aaaaaand... we're back, this time with a head-scratcher. i believe @chad-klumb's code is working fine. and predict() without offsets is working fine. and predict() with offsets respects the offsets (i.e., p(tie) for -Inf offset values is 0).

but offsets (esp. -Inf offsets) change the number of possible edges from the number of dyads to something less. in the example below, no same-sex ties will be allowed, which reduces the possible edges to 1/2 of the total dyads (only M-F ties, undirected in this case).

i don't think predict takes this into account when calculating the p(tie). it does the right thing in terms of preventing same-sex ties. but doesn't calculate the probability of a tie for the possible ties correctly.
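
That fraction is easy to check (a quick sketch; assumes the data from the reprex below are loaded):

tab <- table(faux.magnolia.high %v% "Sex")
prod(tab) / choose(sum(tab), 2)  # share of dyads that are cross-sex: roughly 1/2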

reprex below

data("faux.magnolia.high")

# first without offsets

fit <- ergm(faux.magnolia.high ~ edges
                + nodefactor("Grade")
                + nodematch("Grade")
                + nodematch("Sex"))

pnet <- predict(fit) # data frame

vnet <- bind_rows(tibble(vertex.names=pnet[,1], p=pnet[,3]),
                  tibble(vertex.names=pnet[,2], p=pnet[,3]))

deg <- vnet %>%
  group_by(vertex.names) %>%
  summarize(deg = sum(p))

egos <- ergm.ego::as.egodata(faux.magnolia.high)$egos %>%
  mutate(vertex.names = as.numeric(vertex.names)) %>%
  right_join(deg)

## roughly equal:   stubs/2 = edges

sum(egos$deg)/2
summary(faux.magnolia.high ~ edges)

# now with offset - no samesex ties

fit <- ergm(faux.magnolia.high ~ edges
            + nodefactor("Grade")
            + nodematch("Grade")
            + offset(nodematch("Sex")),
            offset.coef = (-Inf))

pnet <- predict(fit) # data frame

vnet <- bind_rows(tibble(vertex.names=pnet[,1], p=pnet[,3]),
                  tibble(vertex.names=pnet[,2], p=pnet[,3]))

deg <- vnet %>%
  group_by(vertex.names) %>%
  summarize(deg = sum(p))

egos <- ergm.ego::as.egodata(faux.magnolia.high)$egos %>%
  mutate(vertex.names = as.numeric(vertex.names)) %>%
  right_join(deg)

## not even close:  stubs/2 != edges

sum(egos$deg)/2
summary(faux.magnolia.high ~ edges)

## let's look at the expected number of same and opp sex ties

fmhego <- ergm.ego::as.egodata(faux.magnolia.high)  # not defined earlier in this reprex

pnet_sex <- fmhego$egos %>%
  select(vertex.names, Sex) %>%
  mutate(vertex.names = as.numeric(vertex.names)) %>%
  right_join(pnet, by = c("vertex.names" = "tail" )) %>%
  mutate(sex_tail = Sex) %>%
  select (-Sex)

pnet_sex <-  fmhego$egos %>%
  select(vertex.names, Sex) %>%
  mutate(vertex.names = as.numeric(vertex.names)) %>%
  right_join(pnet_sex, by = c("vertex.names" = "head" )) %>%
  mutate(sex_head = Sex) %>%
  select (-Sex)

## prediction respects offsets in p(tie) = 0 for samesex ties
## but p(tie) not adjusted accordingly for possible ties

pnet_sex %>%
  group_by(sex_tail, sex_head) %>%
  summarize(count = n(),
            pred = sum(p))

output of that last bit is:

# A tibble: 4 x 4
# Groups:   sex_tail [2]
  sex_tail sex_head  count  pred
  <chr>    <chr>     <int> <dbl>
1 F        F        276228    0 
2 F        M        240949  127.
3 M        F        258065  140.
4 M        M        224814    0 
chad-klumb commented 3 years ago

predict shouldn't need to be adjusted to handle this

The problem is likely that the model is inconsistent with the data and only the MPLE is run by default since the model is dyad-independent.

> summary(faux.magnolia.high ~ nodemix("Sex"))
mix.Sex.F.F mix.Sex.F.M mix.Sex.M.M 
        430         285         259 

while with the offset

> sum(egos$deg)/2
[1] 267.1421

which is consistent.

Forcing the main method

fit <- ergm(faux.magnolia.high ~ edges
            + nodefactor("Grade")
            + nodematch("Grade")
            + offset(nodematch("Sex")),
            offset.coef = (-Inf),
            control=control.ergm(force.main=TRUE))

pnet <- predict(fit) # data frame

vnet <- bind_rows(tibble(vertex.names=pnet[,1], p=pnet[,3]),
                  tibble(vertex.names=pnet[,2], p=pnet[,3]))

deg <- vnet %>%
  group_by(vertex.names) %>%
  summarize(deg = sum(p))

egos <- ergm.ego::as.egodata(faux.magnolia.high)$egos %>%
  mutate(vertex.names = as.numeric(vertex.names)) %>%
  right_join(deg)

## not even close:  stubs/2 != edges

sum(egos$deg)/2
summary(faux.magnolia.high ~ edges)

also gives

> sum(egos$deg)/2
[1] 914.3843
> summary(faux.magnolia.high ~ edges)
edges 
  974 

which again is consistent.

martinamorris commented 3 years ago

thx @chad-klumb. a couple of questions:

  1. offsets are often used in glms, and these are typically dyad-independent models that rely on the estimation algorithm we use for MPLE. i thought that estimate was, in fact, the MLE in the glm setting. so why doesn't the MPLE work properly? shouldn't it still try to match the edge and nodefactor counts by redistributing the forbidden ties to allowed dyads?

  2. i'm getting a similar result -- (sum(predicted edges) != sum(observed edges) -- in my predictions from my MCMLE ergm fits in the WHAMP context. do i understand correctly from your comment above that the MCMLE fits should produce the right predicted degree? if so, then i still have another problem to dx.

chad-klumb commented 3 years ago

predict may need to set the maximum number of dyads or dyad types appropriately to ensure all dyads are represented in the result; this may explain your 2, if the return value from predict has fewer rows than your network's dyad count. This would generally be an issue for larger networks (but has nothing to do with offsets).

dyads with infinite offset effects are omitted from the MPLE:

https://github.com/statnet/ergm/blob/8b00975987aac64f4009279a88aaf51305e40bdb/R/ergm.pl.R#L143-L148

martinamorris commented 3 years ago

Thx. It looks like even in the faux.magnolia.high example, the total number of dyads returned by predict() is less than n choose 2. Is that because predict maxes out at some fixed value? The number of returned dyads is 1,000,056, while 1461 choose 2 = 1,066,530. Maybe this explains why sum(pred edges) < obs(edgecount).

And the number of dyads with n=15K (my WHAMP case) is much larger.
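
The counts in question:

choose(1461, 2)    # 1,066,530 possible dyads vs. the 1,000,056 rows returned
choose(15000, 2)   # 112,492,500 dyads for the 15K-node undirected WHAMP network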

So, do we want to:

  1. Sample from that huge space to get estimates of the predicted tie values. My concern is that the space is large, and some of our subgroups are small, so we'll miss them unless we stratify the sample, and that starts getting complicated.

  2. Bump up the MPLE to sample all possible dyads. I'm not even sure this is possible.

If neither of these options is workable, then we're at a dead end, yes?

chad-klumb commented 3 years ago

Since predict propagates its ... argument to the fitting machinery, you can set the dyad-types option as follows:

pnet <- predict(fit, control = control.ergm(MPLE.max.dyad.types=1.2e8))

The default is 1e6. IIRC your network is 15k undirected so you've got under 1.2e8 dyads and hence the number of unique dyad types shouldn't need to exceed that.

If that runs without the warning

Warning message:
In ergm.pl(nw, fd, model, verbose = verbose, control = control,  :
  Too many unique dyads. MPLE is approximate, and MPLE standard errors are suspect.

then you should be good as far as getting all the dyads goes.

If results still don't match expectations after that, then there could be yet another problem.

chad-klumb commented 3 years ago

I've updated the PR on ergm to take all informative dyads automatically; once you upgrade to that version you should not need to set the control argument anymore.

chad-klumb commented 3 years ago

I would also add that, since predict adds indices to the formula, each (informative) dyad will have its own unique type, so the number of unique dyad types we want to keep is the same as the number of informative dyads.

martinamorris commented 3 years ago

Thx @chad-klumb. I'm guessing the number of indices is small when we're working with factor level covariates, and rises quickly with continuous covariates. Downloading from your repo now, will let you know what happens.

martinamorris commented 3 years ago

@chad-klumb, still getting the wrong edgecount with offsets from your latest ergm commit, though the MPLE warning is gone. but i can see now what it's doing:

> mixingmatrix(faux.magnolia.high, "Sex")
Note:  Marginal totals can be misleading
 for undirected mixing matrices.
    F   M
F 430 285
M 285 259

when you use the offset to prohibit samesex ties, sum(predicted tie) = 285.

> fit <- ergm(faux.magnolia.high ~ edges
+             + nodefactor("Grade")
+             + nodematch("Grade")
+             + offset(nodematch("Sex")),
+             offset.coef = (-Inf))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Stopping at the initial estimate.
Evaluating log-likelihood at the estimate. 
> pnet <- predict(fit) # data frame
> vnet <- bind_rows(tibble(vertex.names=pnet[,1], p=pnet[,3]),
+                   tibble(vertex.names=pnet[,2], p=pnet[,3]))
> deg <- vnet %>%
+   group_by(vertex.names) %>%
+   summarize(deg = sum(p))
`summarise()` ungrouping output (override with `.groups` argument)
> egos <- ergm.ego::as.egodata(faux.magnolia.high)$egos %>%
+   mutate(vertex.names = as.numeric(vertex.names)) %>%
+   right_join(deg)
Joining, by = "vertex.names"
> sum(egos$deg)/2
[1] 285
> summary(faux.magnolia.high ~ edges)
edges 
  974 

So, is it deleting the forbidden dyads at the start, leaving an edgecount of 285 in the residual dyads for the model fit? If so, this presumably would only delete observed edges if they existed in the forbidden dyads. They do in faux.magnolia. But they don't in my WHAMP networks -- will try those next.

martinamorris commented 3 years ago

on the WHAMP network: the object appears to be too big at some point along the calculation

> pnet <- predict(fit_casl) # data frame
Error in ergm.pl(nw, fd, model, verbose = verbose, control = control,  : 
  long vectors (argument 13) are not supported in .C
chad-klumb commented 3 years ago

It exceeds a limitation of the C API used by master.

It may be possible to get it working with dev. Other issues with exceeding 32 bit limits may also arise.

chad-klumb commented 3 years ago

I have made a minimal set of changes to a dev branch and they superficially appear to be working for a toy model.

The install stack is

remotes::install_github(c("statnet/rle",
                          "statnet/statnet.common",
                          "statnet/network",
                          "statnet/ergm-private@predict_dev_test",
                          "chad-klumb/ergm.ego@master"), dependencies=FALSE, force=TRUE)

An example usage is

require(ergm)

netsize <- 15000

nw <- network.initialize(netsize,dir=F)

nw <- san(nw ~ edges, target.stats=netsize)
nw %v% "attr" <- sample(letters[1:7], netsize, TRUE)

fit <- ergm(nw ~ edges + nodemix("attr", levels2=-1))

pnet <- predict(fit)

No control argument should be needed.

It will take a fair amount of memory and at least several minutes of runtime, so I would use the RStudio servers.

The faux.magnolia example should not be expected to have predicted total # edges = observed total # edges when the MPLE is used as the coefficient estimate for the reasons explained above; the WHAMP stuff presumably should not have the same problem.

mbojan commented 3 years ago

The issue is essentially about ERGMs with offsets, with ergm.ego fits as a special case of interest. I'm moving the issue to ergm.

mbojan commented 3 years ago

I've updated the PR on ergm to take all informative dyads automatically; once you upgrade to that version you should not need to set the control argument anymore.

@chad-klumb has it been merged by Pavel? I don't think I see it in the history.

mbojan commented 3 years ago

i don't think predict takes this into account when calculating the p(tie). it does the right thing in terms of preventing same-sex ties. but doesn't calculate the probability of a tie for the possible ties correctly.

@martinamorris I'm trying to understand what the expected behavior should be for models with offsets and -Inf coefs. Here is a small-scale example so that the probabilities can be worked out by hand:

# 4 nodes; 6 dyads
net <- network.initialize(4, directed=FALSE)
net %v% "a" <- c(1,1,2,2)

# Within 1: log(20/12); P=20/32=5/8
# Within 2: log(20/12) + 2*log(1/2) = log(20/12 * 1/4) = log(5/12); P=5/17
# Between: log(20/12) + log(1/2) = log(5/6); P=5/11

d1 <- predict(
  net ~ edges + nodefactor("a"),
  theta = c(log(20/12), log(1/2))
)
d1 # as computed above
##   tail head         p
## 1    2    3 0.4545455
## 2    1    4 0.4545455
## 3    3    4 0.2941176
## 4    1    2 0.6250000
## 5    1    3 0.4545455
## 6    2    4 0.4545455
d2 <- predict(
  net ~ edges + nodefactor("a") + nodematch("a"),
  theta = c(log(20/12), log(1/2), -Inf)
) 
d2 # nodematch offset does not affect between group dyads
##   tail head         p
## 1    1    2 0.0000000
## 2    1    3 0.4545455
## 3    1    4 0.4545455
## 4    2    3 0.4545455
## 5    2    4 0.4545455
## 6    3    4 0.0000000
# identical to previous
d3 <- predict(
  net ~ edges + nodefactor("a") + offset(nodematch("a")),
  theta = c(log(20/12), log(1/2), -Inf)
)

# All together now:
dplyr::full_join(d1, d2, by=c("head", "tail")) %>%
  dplyr::full_join(d3, by=c("head", "tail"))
##   tail head       p.x       p.y         p
## 1    2    3 0.4545455 0.4545455 0.4545455
## 2    1    4 0.4545455 0.4545455 0.4545455
## 3    3    4 0.2941176 0.0000000 0.0000000
## 4    1    2 0.6250000 0.0000000 0.0000000
## 5    1    3 0.4545455 0.4545455 0.4545455
## 6    2    4 0.4545455 0.4545455 0.4545455

So the -Inf nodematch offset zeroes the within-group tie probabilities but does not affect anything else, because nodematch is 0 for between-group dyads, so the offset term contributes nothing to the linear predictor (0 * -Inf is treated as 0 on the log scale). Or am I misunderstanding something?

chad-klumb commented 3 years ago

I've updated the PR on ergm to take all informative dyads automatically; once you upgrade to that version you should not need to set the control argument anymore.

@chad-klumb has it been merged by Pavel? I don't think I see it in the history.

I believe it was merged, yes.

mbojan commented 3 years ago

predict may need to set the maximum number of dyads or dyad types appropriately to ensure all dyads are represented in the result; this may explain your 2, if the return value from predict has fewer rows than your network's dyad count. This would generally be an issue for larger networks (but has nothing to do with offsets).

dyads with infinite offset effects are omitted from the MPLE:

https://github.com/statnet/ergm/blob/8b00975987aac64f4009279a88aaf51305e40bdb/R/ergm.pl.R#L143-L148

predict() uses xmat.full so the result always has all the dyads

chad-klumb commented 3 years ago

If MPLE.max.dyad.types is exceeded then the sampling stops, and you won't get all the (free) dyads; it doesn't matter which part of the return value is used.

I assume (and correct me if I'm wrong) that the intent was to predict tie probabilities for all free dyads.

Since indices is a term in the formula passed to the MPLE, every free dyad has its own unique type, so the number of unique dyad types required is the same as the number of free dyads, which can be larger than the default MPLE.max.dyad.types of 1e6.
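
Given that, one workaround is to size the option from the network itself (a sketch, using the nw and fit from the toy example above; assumes no constraints, so all dyads are free):

nd <- network::network.dyadcount(nw)  # number of (free) dyads in the network
pnet <- predict(fit, control = control.ergm(MPLE.max.dyad.types = nd))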

mbojan commented 3 years ago

If MPLE.max.dyad.types is exceeded then the sampling stops, and you won't get all the (free) dyads; it doesn't matter which part of the return value is used.

Ah, right. You are correct.

I assume (and correct me if I'm wrong) that the intent was to predict tie probabilities for all free dyads.

Yes, the intention is to have all by default.

Since indices is a term in the formula passed to the MPLE, every free dyad has its own unique type, so the number of unique dyad types required is the same as the number of free dyads, which can be larger than the default MPLE.max.dyad.types of 1e6.

Yep.

chad-klumb commented 3 years ago

Also (and maybe this was a point of confusion), the two parts of my response to Martina that you quoted above were responding to different points she had raised; the thing about dyads with infinite offset effects being omitted from the MPLE was addressing her 1. below, while the thing about setting maximum number of dyad types was addressing her 2. below.

thx @chad-klumb. a couple of questions:

  1. offsets are often used in glms, and these are typically dyad-independent models that rely on the estimation algorithm we use for MPLE. i thought that estimate was, in fact, the MLE in the glm setting. so why doesn't the MPLE work properly? shouldn't it still try to match the edge and nodefactor counts by redistributing the forbidden ties to allowed dyads?
  2. i'm getting a similar result -- (sum(predicted edges) != sum(observed edges) -- in my predictions from my MCMLE ergm fits in the WHAMP context. do i understand correctly from your comment above that the MCMLE fits should produce the right predicted degree? if so, then i still have another problem to dx.
mbojan commented 3 years ago

Yes, apparently... Sorry!

chad-klumb commented 3 years ago

No worries; I just wanted to be sure I wasn't confusing anyone (including myself!).

mbojan commented 3 years ago

So to summarise where we stand:

  1. (Ad 1. above) This actually seems to be an issue with ergm.pl() or ergmMPLE(), as predict.formula() basically digests their output.
  2. (Ad 2. above) We could have predict.formula() look at the dyad count of the supplied network and possibly adjust the MPLE.max.dyad.types control option. Or, perhaps preferably, just check one against the other and give a warning/suggestion to the user to set it manually (a sketch below).
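
A sketch of that check (check_dyad_types is a hypothetical helper, not an existing function):

check_dyad_types <- function(nw, control) {
  nd <- network::network.dyadcount(nw)
  if (nd > control$MPLE.max.dyad.types)
    warning("Network has ", nd, " dyads but MPLE.max.dyad.types is only ",
            control$MPLE.max.dyad.types,
            "; consider raising it via control.ergm(MPLE.max.dyad.types=...).")
}
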
chad-klumb commented 3 years ago

There are still some points to consider regarding predict. I'm putting them here since this is the running predict issue atm. We don't have to do all of this for 3.11, though.

  1. Currently predict does not accept or propagate (from the fits) the constraints; this may lead to incorrect predictions for dyads that are fixed by the constraints formula but not (also) by infinite offsets in the model formula.

    Beyond this there is the issue of dyad-dependent constraints; I believe we do not currently have a dyad-dependent "conditionally free dyads" API, so even if constraints were handled by predict, someone with e.g. ~bd(maxout=1) but no corresponding offset in the generative model might wind up with incorrect predictions for "conditionally structural zeroes".

  2. As currently implemented I don't think there's a way for the user to override the (new) default of getting all the free dyads; we may wish to modify the implementation so the user can pass a different number of dyads to sample, if desired (with all free dyads remaining the default).

  3. In master, limitations of the .C API mean that we can't have (# free dyads) x (# changestats) exceed 32 bit limits. We may wish to check for this before the .C call and give the user an informative error if it's going to occur, since the long vector error from R isn't optimally clear (see the sketch after this list).

    In dev (which uses .Call instead of .C), we can exceed this limit, but we will run into problems with 32 bit variables in the MPLE C code. If the number of free dyads itself exceeds 32 bit limits, we will also need to increase MPLE.samplesize, which defaults to .Machine$integer.max (32 bit max), and change how it's handled by the C code (since it can't be a 32 bit integer type anymore). This is part of a more general problem of some things not working beyond 32 bit limits, even if we may quite reasonably want to exceed those (but fixing that is a 4.0 or beyond issue, not a 3.11 issue).

  4. The MPLE hash table machinery bogs down considerably as the number of unique sampled dyad types approaches the maximum number of unique dyad types. Currently, the maximum number of unique dyad types is capped at the number of free dyads in ergm.pl (note that d is the number of free dyads):

    https://github.com/statnet/ergm/blob/749563012d464899d017c4142b9c90541ff69992/R/ergm.pl.R#L44-L45

    So, in typical predict usage, we will always completely fill the hash table (just as we finish sampling the free dyads), since indices gives each dyad its own unique type. When the number of free dyads is large, things become extremely slow as the final small fraction of free dyads is sampled. Increasing the maximum number of unique dyad types (i.e. the number of rows in the hash table) beyond what is actually needed can avoid this significant slowdown; in a simple test case with 15k nodes and just edges as the generative model, current master had not finished the predict call in 2 hours, while a test version changing only min(...,d) to min(...,2*d) in the quoted code above finished predict in ~ 1 minute. Doubling the number of rows may be more than is needed, but it demonstrates the point.
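
Re point 3 above, the overflow check is simple arithmetic (a sketch; the 20 change statistics are a hypothetical count):

d <- choose(15000, 2)         # free dyads in a 15k-node undirected network
p <- 20                       # hypothetical number of change statistics
d * p > .Machine$integer.max  # TRUE: the .C model-matrix vector would be too long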

krivit commented 3 years ago

From my (admittedly fast) read of this thread, the original issue in ergm master has been resolved, so this is no longer a 3.11 release blocker. Is that right?

chad-klumb commented 3 years ago

I believe the original issue has been resolved.

I would consider the linked PRs for inclusion in 3.11; they are small and basically independent of each other, so should be easy to review quickly, and I think they significantly improve performance/user friendliness.

With or without those PRs, I would say this is no longer blocking.

mbojan commented 3 years ago

I think so too.

@chad-klumb perhaps it is worthwhile to put:

There are still some points to consider regarding predict. I'm putting them here since this is the running predict issue atm. We don't have to do all of this for 3.11, though. (...)

in separate issues for further reference down the road?

chad-klumb commented 3 years ago

Okay, I think this issue has run its course so I'm closing it now.

I will post remaining todos elsewhere (possibly in dev) at a later date.