statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks
Other
97 stars 37 forks source link

Add new isolatededges term #95

Closed sgoodreau closed 5 years ago

sgoodreau commented 5 years ago

Maybe there is a better name for this. What I mean is a count of the number of edges where both nodes have no other edges, i.e. mutual monogamy.

chad-klumb commented 5 years ago

After thinking about this more I'm wondering if we really want strong and weak 2 components.

The definitions I'm using are:

a (directed) graph is strongly connected if any vertex can be reached from any other vertex (via a directed path)

a (directed) graph is weakly connected if the underlying undirected graph is connected

components being maximal subgraphs with the property in question

Steve's original concern seemed to be with whether or not i <-> j would count as an isolated edge (where i and j have no other neighbors), which led to discussion of different types of isolation (like i -> j vs. i <-> j), which led to "strong" vs. "weak" considerations. (Note that arrows indicate directed edges, with i <-> j meaning two directed edges, both i -> j and j -> i.) However, I don't think strong and weak connected components are the sort of distinction we want here. While saying i and j form a weak 2-component does imply that they have a strong sort of isolation, saying they form a strong 2-component doesn't tell you much of anything about how isolated they are: i and j can have arbitrarily many in and out edges to other nodes and still form a strong 2-component themselves. In particular, strong and weak 2-components do not correspond to the distinction between a single edge i -> j and a double edge i <-> j.

I think a distinction more in line with what I understood to be Steve's concerns would be defining two types of isolated edges/dyads in the directed case: i.e. i -> j and j -> i are "strongly isolated", while any of i -> j, j -> i, and i <-> j are weakly isolated (again, assuming no other neighbors in all cases).

Also, figuring out how many strong 2-components you've created or destroyed in toggling a single edge is a PITA; it's not a local thing where you can afford to only look at nearby neighbors, etc. Apparently there are algorithms to count strong components "quickly"

https://en.wikipedia.org/wiki/Strongly_connected_component#Algorithms

but understanding and implementing them in C would not be completely trivial. Restricting attention to 2-components is not very helpful: you may still need to look at an arbitrarily large number of vertices/edges to figure out what's happening with the count of strong 2-components when you toggle a single edge.

If strong and weak components are really what we want then I can work on doing that but it seemed to me that what we really wanted was the option of counting i <-> j as isolated or not, which is accomplished by the dyad distinction described above (and not by strong vs. weak 2-components).

martinamorris commented 5 years ago

thx for the thorough consideration of issues :). coupla thoughts:

  1. This gets at one key question that Steve raised -- are components "embedded". IIUC the version of embedding you're identifying:

i and j can have arbitrarily many in and out edges to other nodes and still form a strong 2-component themselves.

is only an issue for directed graphs. And at the end of the day, our use case is still undirected, where the 2 component is uniquely defined by an isolated dyad. So, the complexities of non-isolated strong 2 components in directed graphs is not a show stopper.

  1. @CarterButts added a fast component calculation algorithm to SNA somewhat recently, and might have some comments relevant to whether we want to tackle adding a general ergm-term for k-components that works for both undirected and directed graphs.

  2. If we decide to go with a more modest undirected only term, i'd still vote for calling it component (even if 2-components were the only ones implemented).

sgoodreau commented 5 years ago

Yes, thanks for the deep thinking on this, Chad.

Martina's right that ultimately the only use case for me personally is the undirected one, so all of the thinking through of directed analogues is about imagining what other people might want someday, and that involves some guesswork.

You're absolutely right, though, that the concept I was originally getting at here was isolated pairs, i.e. those who have some sort of relationship but then neither person has any relationship with anyone else. With that conceptualization, then the directed analogue is indeed not the terms commonly known as strong or weak directed components.

This all gets to the issue that there can be multiple directed analogues for the same undirected measure, depending on how one theorizes it. This happens also with, e.g. triad censuses (i.e. whether one cares about the full distribution of all 16 types or just collapses them into cyclical or transitive triads; both are different ways of generalizing what for undirected networks is just the count of triangles).

I lean towards either of the two options below, in ranked order:

(1) not implementing it all for directed networks right now, in which case we are free to call it components(2) if we wish

(2) implementing a version that is like what Chad says above - a count of the pairs who have at least one directed edge (arc) and who have no other ties to anyone else at all. One could make a strong or weak version if one wished, where you have to be mutual for it to count as strong, or either tie to count as weak. If we go this route, we shouldn't call it components, because we're not matching the common definition for it at this point.

Who wants to make the final decision? :-)

martinamorris commented 5 years ago

To get this into your hands asap, I suppose we can punt the larger issues, and use the term that Chad wrote, name unchanged, for now.

sgoodreau commented 5 years ago

There's actually no rush at this point. But I'm happy to stick with the existing name and term, with one tweak: I think we should take the directed version out and say it only works for undirected for now. The directed version that Chad coded up, IIRC, doesn't map on to any of the ideas we've talked about in this thread, and I'd rather not have the functionality at all than have functionality that we don't really have a good theoretical basis for. We can choose to add some version in later once we've thought about it, without then worrying about backwards compatibility.