statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks
Other

Add nodal attribute subsetting to gwesp and friends #478

Closed: sgoodreau closed this issue 1 year ago

sgoodreau commented 1 year ago

We've had a request from a long-standing friend of the group (Jim Moody) to add attribute subsetting to gwesp, akin to the version in triangle. Yes, he could do it, but in the end, making it consistent involves a lot of terms and is probably best done in-house. The functionality makes good sense, and is in our interest, in that we've been trying to encourage the use of gwesp over triangle for more than a decade, so it's good to have it provide a superset of triangle's functionality.

By "friends" I mean the directed versions, as well as gwdegree and gwdsp and their directed versions. There are probably others I'm missing.

It should have a diff argument taking TRUE or FALSE, as triangle does.

@CarterButts I know you've always made the strong case for thinking about triangle closure as an inhomogeneous phenomenon; do you have a version of this somewhere already that I don't see?

@chad-klumb, @martinamorris, @krivit, @drh20drh20 thoughts?
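To pin down what the request asks for, here is a minimal base-R sketch (it does not use ergm) of a homogeneous GWESP with a `diff` switch. It assumes an undirected network stored as a symmetric 0/1 adjacency matrix and the fixed-decay GWESP weight exp(decay) * (1 - (1 - exp(-decay))^k) for an edge with k shared partners; `esp_per_edge()`, `gwesp_fixed()`, and `homog_gwesp()` are hypothetical names, not ergm terms. With `diff = TRUE` it returns one statistic per attribute level, in the spirit of `triangle(diff = TRUE)`; with `diff = FALSE` it returns the pooled total.

```r
# Hypothetical sketch: homogeneous GWESP with a diff switch, base R only.
# Assumes an undirected network given as a symmetric 0/1 adjacency matrix.

# Shared-partner count for each edge (upper triangle only, so each edge once)
esp_per_edge <- function(A) {
  S <- A %*% A                      # S[i, j] = number of common neighbours of i and j
  S[upper.tri(A) & A == 1]
}

# Fixed-decay GWESP: sum over edges of exp(decay) * (1 - (1 - exp(-decay))^k)
gwesp_fixed <- function(A, decay) {
  k <- esp_per_edge(A)
  exp(decay) * sum(1 - (1 - exp(-decay))^k)
}

# GWESP on the subgraph induced by each attribute level
homog_gwesp <- function(A, attr, decay, diff = FALSE) {
  lv <- sort(unique(attr))
  per_level <- vapply(lv, function(v) {
    idx <- which(attr == v)
    gwesp_fixed(A[idx, idx, drop = FALSE], decay)
  }, numeric(1))
  names(per_level) <- paste0("gwesp.", lv)
  if (diff) per_level else sum(per_level)
}

# Toy network: same-group triangles {1,2,3} and {4,5,6}, plus a mixed triangle {2,3,4}
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(2, 4), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1
group <- c(1, 1, 1, 2, 2, 2)

homog_gwesp(A, group, decay = 0, diff = TRUE)   # gwesp.1 = 3, gwesp.2 = 3
homog_gwesp(A, group, decay = 0)                # 6
```

The mixed triangle contributes to `gwesp_fixed(A, 0)` on the whole network (8 edges) but not to either per-level value.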

krivit commented 1 year ago

What do they mean by "attribute subsetting"? Evaluating the terms on an induced subgraph of nodes with a specific attribute value? Or something else?

CarterButts commented 1 year ago

I don't have a version of that, though I do have inhomogeneous 2-stars. Have to say, though, it gets subtle very fast. Even for 2-stars, there are several forms. The easiest thing to do would be to have local ESPs and put curved families on those. "Local" here is in the sense of local triangles: we only count subgraphs where everyone has the same attribute value. Many more types can be defined, of course, which is rather the issue...
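The "local ESPs with a curved family on top" idea can be illustrated in base R. This is a sketch under stated assumptions, not Carter's code or an ergm term: the network is undirected, an edge is "local" when both endpoints share an attribute value, and only shared partners with that same value are counted. `local_esp_dist()` and `curved_gwesp()` are hypothetical names.

```r
# Hypothetical sketch, base R only: local edgewise shared partners, counting only
# edges whose endpoints share an attribute value, and partners with that same value.
local_esp_dist <- function(A, attr) {
  n <- nrow(A)
  k <- integer(0)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (A[i, j] == 1 && attr[i] == attr[j]) {
        partners <- which(A[i, ] == 1 & A[j, ] == 1 & attr == attr[i])
        k <- c(k, length(partners))
      }
    }
  }
  # Entry m = number of local edges with exactly m local shared partners
  tabulate(k, nbins = n - 2)
}

# Curved (GWESP-style) weighting of an ESP distribution, fixed decay
curved_gwesp <- function(esp, decay) {
  m <- seq_along(esp)
  exp(decay) * sum((1 - (1 - exp(-decay))^m) * esp)
}

# Toy network: same-group triangles {1,2,3} and {4,5,6}, plus a mixed triangle {2,3,4}
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(2, 4), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1
group <- c(1, 1, 1, 2, 2, 2)

esp <- local_esp_dist(A, group)
esp                           # 6 0 0 0
curved_gwesp(esp, decay = 0)  # 6
```

Keeping the distribution separate from the weighting makes it easy to swap in other curved families on the same local counts.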

handcock commented 1 year ago

Hi Steve,

If you mean statistics like a homogeneous GWESP (i.e., all incident nodes have the same value of an attribute), then I have implemented GWESP, GWDSP and GWDEG. It is a start, and I can share it if it is of interest.

Best,

Mark

krivit commented 1 year ago

@handcock, @sgoodreau, that's what I am wondering about. If we are talking about GWESP (or similar) evaluated on an induced subgraph defined by vertices having a certain attribute value, we already have machinery for that with term operators. I believe something like

S(~gwesp, ~a==1)

will evaluate gwesp on a subgraph comprising vertices for which vertex attribute a has value 1.

Doing it for each level requires several terms, i.e.,

S(~gwesp, ~a==1) + S(~gwesp, ~a==2) + S(~gwesp, ~a==3)

but I can see implementing something like a For operator, i.e.,

For(~S(~gwesp, ~a==x), "x", 1:3)

which would be expanded into the above.
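The expansion described here is plain formula manipulation, so it can be sketched in base R without touching ergm. `expand_for()` is a hypothetical helper, not the eventual ergm operator or its interface; it only builds the formula and never evaluates any terms.

```r
# Hypothetical helper (not the ergm operator): expand a term template over values,
# joining the copies with "+". Nothing is evaluated, so ergm is not needed.
expand_for <- function(template, var, values) {
  pieces <- lapply(values, function(v) {
    # Replace every occurrence of the placeholder symbol `var` with the value v
    do.call(substitute, list(template, setNames(list(v), var)))
  })
  rhs <- Reduce(function(acc, term) call("+", acc, term), pieces)
  eval(call("~", rhs))   # build a one-sided formula from the combined call
}

# Numeric values (not 1:3) so the deparsed constants read 1, 2, 3 rather than 1L, 2L, 3L
f <- expand_for(quote(S(~gwesp, ~a == x)), "x", c(1, 2, 3))
paste(deparse(f, width.cutoff = 500L), collapse = "")
# "~S(~gwesp, ~a == 1) + S(~gwesp, ~a == 2) + S(~gwesp, ~a == 3)"
```

Doing the substitution on the unevaluated call means the nested `~a == x` formulas are rewritten too, which is what the per-level S() terms need.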

krivit commented 1 year ago

I've opened a ticket for a "foreach" operator.

krivit commented 1 year ago

@CarterButts, too.

sgoodreau commented 1 year ago

Thanks all! I tend to forget about the full flexibility of the term operators.

And yes, I meant the cases in which all members of the relevant structure have the same value of the attribute.

So, to make sure I'm understanding correctly: let's say I have a vertex attribute named "group", with values 1:4, and I want to consider the number of edges that are in at least one attribute-homogeneous triangle, regardless of what the specific attribute value is. I think that would currently mean combining four uses of S() with one of Sum() into something like:

~Sum(~S(~gwesp(decay=0, fied=TRUE), ~group==1) + S(~gwesp(decay=0, fied=TRUE), ~group==2) + S(~gwesp(decay=0, fied=TRUE), ~group==3) + S(~gwesp(decay=0, fied=TRUE), ~group==4))

and with the new foreach operator that Pavel just mentioned, this code would be simplified considerably.

Is that right? I admit I can't quite follow all of the nuance in the ERGM 4.0 paper regarding the use of ~Sum in a case like this.
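Why decay = 0 with fixed = TRUE gives the count described above: the GWESP weight exp(decay) * (1 - (1 - exp(-decay))^k) is 0 for k = 0 and exactly 1 for every k >= 1 when decay = 0, so the statistic counts edges with at least one shared partner. The base-R check below (hypothetical helper name `gw_weight()`, a toy network with two levels instead of four, no ergm) computes this per level on induced subgraphs and compares it with a direct count.

```r
# Per-edge GWESP weight for an edge with k shared partners (fixed decay)
gw_weight <- function(k, decay) exp(decay) * (1 - (1 - exp(-decay))^k)
gw_weight(0:4, decay = 0)  # 0 1 1 1 1

# Toy network: same-group triangles {1,2,3} and {4,5,6}, plus a mixed triangle {2,3,4}
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(2, 4), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1
group <- c(1, 1, 1, 2, 2, 2)

# For each level: decay-0 GWESP on the induced subgraph vs. a direct count of
# within-level edges having at least one within-level shared partner
level_stats <- sapply(sort(unique(group)), function(v) {
  idx <- which(group == v)
  B <- A[idx, idx, drop = FALSE]
  k <- (B %*% B)[upper.tri(B) & B == 1]   # shared partners of each within-level edge
  c(gwesp0 = sum(gw_weight(k, decay = 0)), at_least_one = sum(k >= 1))
})
level_stats           # one column per level; the two rows agree
rowSums(level_stats)  # both totals are 6
```

Adding the per-level columns into one number is the step the Sum() wrapper is meant to perform.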

krivit commented 1 year ago

Almost: you also need to tell Sum to add up everything on the formula:

~Sum("sum"~S(~gwesp(decay=0, fied=TRUE), ~group==1) + S(~gwesp(decay=0, fied=TRUE), ~group==2) + S(~gwesp(decay=0, fied=TRUE), ~group==3) + S(~gwesp(decay=0, fied=TRUE), ~group==4))

sgoodreau commented 1 year ago

Can you explain why the word "sum" appears twice, once capitalized and once not? That I don't quite get. Thanks.

PS You also missed my misspelling of fixed :-)

krivit commented 1 year ago

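Sum() is the operator; the lowercase "sum" on the left-hand side of the formula tells Sum() to add up all the statistics on that formula. By default, Sum() sums the formulas in its list, so a single formula with four terms would come back as four statistics.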
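One way to picture this, offered as a mental model rather than a description of ergm's internals: the left-hand side of a formula given to Sum() acts like a weight matrix applied to that formula's statistics. With no left-hand side the weights behave like an identity matrix, so four S() terms pass through as four statistics; "sum" behaves like a single row of ones. The numbers below are made up purely for illustration.

```r
# Mental model only (not ergm code). Illustrative per-level statistics,
# standing in for the four S(~gwesp(...), ~group == g) terms.
stats <- c(3, 1, 0, 2)

# No LHS: identity weights, so all four statistics pass through unchanged
W_default <- diag(length(stats))
drop(W_default %*% stats)   # 3 1 0 2

# "sum" on the LHS: one row of ones, collapsing the four into a single total
W_sum <- matrix(1, nrow = 1, ncol = length(stats))
drop(W_sum %*% stats)       # 6
```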

krivit commented 1 year ago

We really should start adding examples to terms. Any volunteers?

Also, can y'all take a look at the https://github.com/statnet/ergm/issues/479 ticket? I have a preliminary implementation, but I would like some feedback on the user interface.

krivit commented 1 year ago

@sgoodreau, actually, if

  1. the statistics are local in the sense that if the network has multiple connected components, the value of the statistic is the sum of its values on each component (true for gwesp), and
  2. what is wanted is the total of the statistics over all levels of a rather than broken down by a,

then there is a simpler and probably more computationally efficient way to do this:

F(~gwesp, ~nodematch("a"))
NodematchFilter(~gwesp, "a")

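What these do is start with the LHS network, delete all edges for which nodematch("a") does not hold, and then evaluate gwesp() on that. Every remaining edge joins two vertices with the same value of a, so each connected component of the filtered network lies within a single level, and by condition 1 the result is the sum of the per-level statistics.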

If you want a gwesp for each separately, you still need the subgraph method, though perhaps a future version of F()

I'll still add the foreach operator, but if the above works, I'd recommend that.
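The equivalence between the filter and subgraph approaches can be checked in base R on a small toy network. This is a sketch, not ergm's implementation: `gwesp_fixed()` is a hypothetical helper for an undirected 0/1 adjacency matrix, and multiplying the matrix by `outer(group, group, "==")` stands in for deleting edges where nodematch fails.

```r
# Fixed-decay GWESP on a symmetric 0/1 adjacency matrix (hypothetical helper)
gwesp_fixed <- function(A, decay) {
  k <- (A %*% A)[upper.tri(A) & A == 1]   # shared partners of each edge
  exp(decay) * sum(1 - (1 - exp(-decay))^k)
}

# Toy network: same-group triangles {1,2,3} and {4,5,6}, plus a mixed triangle {2,3,4}
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(2, 4), c(3, 4))
A <- matrix(0, 6, 6)
A[edges] <- 1
A[edges[, 2:1]] <- 1
group <- c(1, 1, 1, 2, 2, 2)

# Filter-style: drop edges whose endpoints differ on group, then evaluate once
A_filtered <- A * outer(group, group, "==")
filtered <- gwesp_fixed(A_filtered, decay = 0.5)

# Subgraph-style: evaluate on each level's induced subgraph, then add up
by_level <- sum(sapply(unique(group), function(v) {
  idx <- which(group == v)
  gwesp_fixed(A[idx, idx, drop = FALSE], decay = 0.5)
}))

all.equal(filtered, by_level)   # TRUE
```

Evaluating `gwesp_fixed(A, 0.5)` on the unfiltered network gives a different value, because the mixed triangle {2, 3, 4} adds shared partners that neither approach above counts.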

sgoodreau commented 1 year ago

Wonderful, thanks. We definitely need a gallery of these examples, because I doubt many users will discover all of this added flexibility, even after reading the ergm 4.0 paper. I know I couldn't!