statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

Implement a foreach operator. #479

Closed. krivit closed this issue 1 year ago.

krivit commented 1 year ago

Something like ForEach(formula, counter, list). The operator would add length(list) copies of formula to the model, substituting the corresponding value from list for every instance of counter.

For example,

~ ForEach(~S(~gwesp, ~a==x), "x", 1:3)

is equivalent to

~ S(~gwesp, ~a==1) + S(~gwesp, ~a==2) + S(~gwesp, ~a==3)

It may also make sense to allow list to be a function (of the network) or an rlang-style formula that references the network and returns a list, e.g.,

~ ForEach(~S(~gwesp, ~a==x), "x", ~sort(unique(.%v%"a")))

would substitute the network in place of the dot, thus obtaining the unique levels of vertex attribute "a".
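A rough sketch of how the list argument might be resolved follows. The helper resolve_values() is hypothetical and not part of ergm. It calls a function on the network, evaluates a one-sided formula with . bound to the network, and returns anything else unchanged. So that it runs without the network package, the demo uses a data frame as a stand-in for the network and .$a in place of .%v%"a".

```r
# Hypothetical helper, not part of ergm: turn the `list` argument into
# concrete values, given the network `nw`.
resolve_values <- function(values, nw) {
  if (is.function(values)) {
    # A function of the network: call it on the network.
    values(nw)
  } else if (inherits(values, "formula")) {
    # An rlang-style one-sided formula: evaluate its right-hand side in a
    # child of the formula's environment in which `.` is the network.
    env <- new.env(parent = environment(values))
    assign(".", nw, envir = env)
    eval(values[[length(values)]], env)
  } else {
    # Already a plain vector or list of values.
    values
  }
}

# Stand-in for a network: column `a` plays the role of vertex attribute "a"
# (with a real network object one would write .%v%"a" instead of .$a).
nw <- data.frame(a = c(3, 1, 3, 2))
resolve_values(~sort(unique(.$a)), nw)
resolve_values(function(n) sort(unique(n$a)), nw)
resolve_values(1:3, nw)
```

Evaluating the formula in a child of its own environment keeps any other variables it references visible, while . is bound only for that evaluation.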

This should be fairly straightforward to implement, thanks to R's built-in substitute() function, which does exactly that, and it shouldn't require any additional C code. I've been thinking about it for a while, but given the request in https://github.com/statnet/ergm/issues/478, it may be worth expediting.
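To illustrate why substitute() makes this straightforward, here is a minimal sketch of the expansion step in plain R. The helper expand_foreach() is hypothetical, not ergm's implementation. It assumes a one-sided formula and a non-empty set of values, substitutes each value for the counter symbol, and joins the copies with +.

```r
# Hypothetical helper, not ergm's implementation: expand `formula` once per
# element of `values`, substituting that element for the symbol `counter`.
expand_foreach <- function(formula, counter, values) {
  stopifnot(inherits(formula, "formula"), length(values) > 0)
  rhs <- formula[[length(formula)]]  # right-hand side of the formula
  copies <- lapply(values, function(v) {
    # substitute() walks the whole expression, including nested formulas
    # such as ~a == x, replacing every occurrence of the counter symbol.
    do.call(substitute, list(rhs, setNames(list(v), counter)))
  })
  # Join the copies left to right: c1 + c2 + c3.
  joined <- Reduce(function(left, right) call("+", left, right), copies)
  # Evaluating `~` in the original environment gives a formula that keeps it.
  eval(call("~", joined), environment(formula))
}

expand_foreach(~S(~gwesp, ~a == x), "x", 1:3)
```

The result is the S(~gwesp, ~a==1) + S(~gwesp, ~a==2) + S(~gwesp, ~a==3) formula from the example above, except that 1:3 supplies integer literals (1L, 2L, 3L).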

Three questions:

  1. Is this worth doing?
  2. What should we name the operator? My thoughts so far are For, ForEach, or Map.
  3. What should the syntax look like?

krivit commented 1 year ago

A further thought: for the sake of brevity, there may be some benefit to making the counter the last argument and giving it a default value (e.g., .). For example,

~ ForEach(~S(~gwesp, ~a==.), 1:3)

On the other hand, most for-loop-style statements put the counter variable before the collection of objects over which to iterate.

krivit commented 1 year ago

Some fundamentally different syntaxes are also possible. The following are perfectly legal R expressions that InitErgmTerm.for should catch:

~ for(x in 1:3) S(~gwesp, ~a==x)
~ for(x in 1:3) ~S(~gwesp, ~a==x)

though this syntax has the problem that

~ for(x in 1:3) S(~gwesp, ~a==x) + edges

would include edges in the loop, so the expression would need to be

~ (for(x in 1:3) S(~gwesp, ~a==x)) + edges
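The precedence problem can be checked with base R alone, since ~ does not evaluate its operand. In a call to for, element 4 is the loop body, so in the unparenthesized formula edges lands inside the body, while in the parenthesized one the right-hand side is a sum with edges outside the loop.

```r
# Without parentheses, the loop body runs to the end of the formula.
f1 <- ~ for (x in 1:3) S(~gwesp, ~a == x) + edges
f1[[2]][[1]]  # the symbol `for`: the whole right-hand side is one loop
f1[[2]][[4]]  # the loop body, which includes + edges

# With parentheses, the right-hand side is a sum and edges is outside the loop.
f2 <- ~ (for (x in 1:3) S(~gwesp, ~a == x)) + edges
f2[[2]][[1]]  # the symbol `+`
f2[[2]][[3]]  # edges
```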

Alternatively, we can use the argument name as the index variable, à la the foreach package:

~ ForEach(~S(~gwesp, ~a==x), x = 1:3)

or we can put the index first:

~ ForEach(x = 1:3, ~S(~gwesp, ~a==x))

Any thoughts, @handcock, @CarterButts, @sgoodreau, @martinamorris, @drh20drh20, @mbojan, @chad-klumb?

krivit commented 1 year ago

The

~ For(~S(~gwesp, ~a==x), x = 1:3)

syntax is now implemented in the For operator. Multiple loop variables are possible, and arguments can be given in any order. (The ergm formula must be an unnamed argument, and all others must be named.)
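As a rough sketch of how the named-argument form could be processed (this is not the code in the foreach branch, which isn't shown here), the hypothetical helper expand_for() takes exactly one unnamed formula plus any number of named iterators, in any order, and expands over every combination of their values. Which iterator varies fastest is an assumption: expand.grid() varies the first one fastest, and the actual operator may nest its loops differently.

```r
# Hypothetical helper, not ergm's For operator: expand a formula over one or
# more named loop variables, given in any order alongside one unnamed formula.
expand_for <- function(...) {
  args <- list(...)
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  unnamed <- nms == ""
  # The formula must be the only unnamed argument; all others are iterators.
  if (sum(unnamed) != 1) stop("exactly one unnamed (formula) argument required")
  formula <- args[[which(unnamed)]]
  stopifnot(inherits(formula, "formula"), any(!unnamed))
  rhs <- formula[[length(formula)]]
  # Every combination of iterator values; expand.grid() varies the first
  # iterator fastest (an assumed nesting order).
  grid <- expand.grid(args[!unnamed], KEEP.OUT.ATTRS = FALSE,
                      stringsAsFactors = FALSE)
  copies <- lapply(seq_len(nrow(grid)), function(i) {
    # One grid row becomes a named list, e.g. list(x = 1L, y = 0.25).
    do.call(substitute, list(rhs, as.list(grid[i, , drop = FALSE])))
  })
  joined <- Reduce(function(left, right) call("+", left, right), copies)
  eval(call("~", joined), environment(formula))
}

expand_for(x = 1:3, ~S(~gwesp, ~a == x))
expand_for(~S(~gwesp(y, fixed = TRUE), ~a == x), x = 1:2, y = c(0.25, 0.5))
```

The second call is illustrative only: two iterators produce a nested expansion with four copies of the formula.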

krivit commented 1 year ago

For the avoidance of doubt, the implementations are all in the statnet/ergm@foreach branch.

chad-klumb commented 1 year ago

Sounds reasonable to me.

krivit commented 1 year ago

> Sounds reasonable to me.

Thanks! Which variant?

chad-klumb commented 1 year ago

~For(x = 1:3, ~S(~gwesp, ~a==x))

or

~ForEach(x = 1:3, ~S(~gwesp, ~a==x))

are what I would be most inclined to use.

krivit commented 1 year ago

I'm leaning toward the first variant. (The formula and the iterators can be specified in any order, and multiple iterators are allowed, producing nested loops.)

martinamorris commented 1 year ago

I also like the first variant.

krivit commented 1 year ago

To avoid ambiguity: I was referring to the two variants in Chad's post. @martinamorris, which variant are you referring to?

martinamorris commented 1 year ago

~For(x = 1:3, ~S(~gwesp, ~a==x))