Meaning of "directed", "undirected", and "bipartite" keywords/concepts.

statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

Meaning of "directed", "undirected", and "bipartite" keywords/concepts. #409

Open krivit opened 3 years ago

krivit commented 3 years ago

Right now, we use "directed" if a term can work on directed networks and analogously for undirected. On the other hand, we use "bipartite" for terms that only work on bipartite networks. For example,

absdiff: works for everything, has keywords "directed" and "undirected"
b1factor: works for bipartite undirected only, has keywords "bipartite" and "undirected"

This is not very consistent conceptually, and it also produces inconsistent search results:

Searching with keyword "undirected" returns all terms (including bipartite-only) suitable for undirected networks (e.g., both absdiff and b1factor).
Searching with keyword "bipartite" returns terms that work only on bipartite networks (e.g., b1factor but not absdiff).
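To make the inconsistency concrete, here is a minimal sketch in Python (not ergm's actual R code). The `TERM_KEYWORDS` table and `search_by_keyword` helper are hypothetical and only mirror the two examples above; the real keywords live in the term help files.

```python
# Hypothetical keyword table copying the two examples from this issue.
# ergm stores keywords in its help files, not in a structure like this.
TERM_KEYWORDS = {
    "absdiff": {"directed", "undirected"},     # works on everything
    "b1factor": {"bipartite", "undirected"},   # bipartite undirected only
}


def search_by_keyword(keyword):
    """Return the sorted names of all terms tagged with `keyword`."""
    return sorted(name for name, kws in TERM_KEYWORDS.items() if keyword in kws)


# "undirected" means "can work on undirected networks", so both terms match.
undirected_hits = search_by_keyword("undirected")

# "bipartite" means "only works on bipartite networks", so absdiff is missed
# even though it also works on bipartite networks.
bipartite_hits = search_by_keyword("bipartite")
```

The same lookup therefore answers a "can work on" question for one keyword and an "only works on" question for the other.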

The question is: what should we do?

My sense is that the most common use cases would be something like:

List terms that work for a bipartite (undirected) network.
List terms that work for a unipartite undirected network.
List terms that work for a unipartite directed network.

Here are some ideas:

1. We could accomplish this by declaring "bipartite" to be a third type of network. Then, absdiff would get all three keywords, whereas b1factor would get only "bipartite". The downside is that this is not technically correct and is not future-proof if we ever decide to implement directed bipartite networks.
2. We could keep the status quo but also implement some way of specifying logical expressions in the search. For example, ~undirected&!bipartite would include absdiff but not b1factor (i.e., terms that work for undirected unipartite networks), whereas ~undirected would include both (i.e., everything that works for bipartite undirected networks). However, this is cumbersome and counterintuitive.
3. We could declare that "bipartite" should be used the way "directed" and "undirected" are (i.e., so that absdiff gets the keyword as well) and also implement some way of specifying logical expressions in the search. Then, ~bipartite would get absdiff and b1factor (i.e., terms that work for undirected bipartite networks), but so would ~undirected. We may also want to introduce a keyword "unipartite". (A term that supports both should have both keywords.)
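As an illustration of the logical-expression idea under the status-quo keywords, the following Python sketch evaluates expressions such as `~undirected&!bipartite` against each term's keyword set. It is not ergm code: the `matches` and `search` helpers, the keyword table, and the choice to translate the R-style operators into Python ones are all assumptions made for the example.

```python
import re

# Hypothetical status-quo keyword table (same two terms as in the issue).
TERM_KEYWORDS = {
    "absdiff": {"directed", "undirected"},
    "b1factor": {"bipartite", "undirected"},
}
KNOWN_KEYWORDS = ("directed", "undirected", "bipartite")


def matches(expr, keywords):
    """Evaluate an R-style expression such as '~undirected&!bipartite'."""
    # Drop the leading '~' and translate R's operators to Python's.
    py_expr = (expr.lstrip("~")
                   .replace("&", " and ")
                   .replace("|", " or ")
                   .replace("!", " not "))
    # Allow only names, whitespace, and parentheses before evaluating.
    if not re.fullmatch(r"[\w\s()]+", py_expr):
        raise ValueError(f"unsupported expression: {expr!r}")
    # Each known keyword becomes a boolean: is the term tagged with it?
    env = {kw: kw in keywords for kw in KNOWN_KEYWORDS}
    return bool(eval(py_expr, {"__builtins__": {}}, env))


def search(expr):
    """Return the sorted names of terms whose keywords satisfy `expr`."""
    return sorted(name for name, kws in TERM_KEYWORDS.items()
                  if matches(expr, kws))


unipartite_undirected = search("~undirected&!bipartite")
all_undirected = search("~undirected")
```

With the status-quo tags, the first query keeps only absdiff and the second keeps both terms, which matches the behavior described for the second idea.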

Any thoughts?

@mbojan @CarterButts @martinamorris @drh20drh20 @sgoodreau @handcock

CarterButts commented 3 years ago

Ultimately, there is probably not a way around specifying a logical expression of some sort (i.e., term X works for networks with properties Y & Z, term W works with Y or Z, etc.). Beyond enforced bipartitions, one imagines that eventually we will want loop support, and then there are multiplex terms, hypergraphic terms, etc.

krivit commented 3 years ago

@CarterButts, in that case, do you have a preference between 2 and 3?

krivit commented 3 years ago

@CarterButts, or some fourth option?

CarterButts commented 3 years ago

Option two, I think. I admit that I was thinking about it not from a search angle, but an InitErgmTerm angle - how does a term decide if it is pleased with the network on which it has been called? If one can define a logical expression for the term that evaluates to TRUE in graphs for which the term is safe and FALSE otherwise, then that fixes it.

I've not kept up with the latest on what you and Joyce are doing vis a vis searching for terms, so I don't have as much of a sense of whether this would indeed be cumbersome. Depending on how you are handling this task, it might be possible to support simplified cases with special-case syntax. So (this isn't thought out, caveat emptor) you might have something like:

findErgmTerm(termstr, full=FALSE, bipartite=FALSE, directed=TRUE)

which does something like this:

When full==TRUE, ignores all other arguments and applies the logical expression in termstr to all terms in the way we are describing.
When full==FALSE, performs a simple search that uses the supplied flags (here directed and bipartite) in a simplified mode that embodies "logical defaults" that follow typical use cases. So, e.g., we assume that the user doesn't need bipartite support unless bipartite==TRUE, etc. And we can further let NULL or NA be wildcards (admitting anything).

We have to spell out whatever that syntax does in the docs, but the point would be that it doesn't have to cover all possible use cases... it just has to be fast/simple for the most common ones. Users can always use the full logical interface if they want that power. Also, from an implementation standpoint, the simple cases can be implemented recursively by constructing a proper formula and then calling the function again with full==TRUE (passing the formula in question). That greatly reduces maintenance costs, since there is only ever one real mechanism, and the rest of the interface is cosmetic.
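The "one real mechanism plus a cosmetic front end" idea can be sketched as follows, in Python rather than R. Everything here is hypothetical: `find_terms` stands in for the suggested findErgmTerm, expressions are written as plain Python boolean expressions, `None` plays the role of the NULL/NA wildcard, and, to keep the flag translation trivial, the table assumes the tagging of the third idea (including a "unipartite" keyword) rather than the current keywords.

```python
# Hypothetical tags in the style of the third idea: "bipartite" and
# "unipartite" both mean "can work on", like "directed" and "undirected".
TERM_KEYWORDS = {
    "absdiff": {"directed", "undirected", "unipartite", "bipartite"},
    "b1factor": {"undirected", "bipartite"},
}
KNOWN_KEYWORDS = ("directed", "undirected", "unipartite", "bipartite")


def find_terms(expr=None, full=False, bipartite=False, directed=True):
    """Search terms; the simple mode only builds an expression and recurses."""
    if not full:
        clauses = []
        if directed is not None:          # None acts as a wildcard
            clauses.append("directed" if directed else "undirected")
        if bipartite is not None:
            clauses.append("bipartite" if bipartite else "unipartite")
        built = " and ".join(clauses) or "True"
        # Delegate to the single real mechanism.
        return find_terms(built, full=True)

    hits = []
    for name, kws in TERM_KEYWORDS.items():
        env = {kw: kw in kws for kw in KNOWN_KEYWORDS}
        if eval(expr, {"__builtins__": {}}, env):
            hits.append(name)
    return sorted(hits)


default_hits = find_terms()                               # directed, unipartite
bipartite_hits = find_terms(directed=False, bipartite=True)
wildcard_hits = find_terms(directed=None, bipartite=None)
```

Because the simple mode only assembles an expression, any change to the matching logic needs to be made in one place.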

Again, that is neither deeply thought out, nor based on a full understanding of the implementations you have been cooking up - am writing en passant between tasks. So ignore if this is not helpful.

krivit commented 4 months ago

@martinamorris, @CarterButts, what if we introduced additional concepts, e.g., bipartite only, directed only, undirected only, etc.? Then, we could adjust our search functions to accomplish everything. Someone would have to go through the docs and populate them.

CarterButts commented 4 months ago

Well, I think my comments above are probably still where I would fall. Seems that we need (1) a sensible language that lets us specify what conditions are needed for a term to function (and a term is allowed iff the conditions are satisfied), (2) a function that evaluates that against a network, and (3) a natural way to invoke it for both InitErgmTerm and ergmTerm? use cases. It might work by having elementary CONDITIONs, along with negators, and, and or. The obvious initial candidate CONDITIONs would be directed, bipartite, and valued. (One could then think about e.g. different kinds of edge values, measurement levels, or other exotica, if one wanted.) We could also have an any condition just to make it trivial to have a term that claims to be universal. If I want to then ask for terms from ergmTerm? that are only for directed bipartite networks, I'd pass directed & bipartite somewhere, and everything evaluating TRUE for that condition would be considered.
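A rough sketch of such a condition language, written in Python for illustration only: elementary conditions are functions of a network description, and negation, and, and or combine them. The names `cond`, `NOT`, `AND`, `OR`, `ANY`, and `term_allowed`, the dictionary used to describe a network, and the conditions attached to the two terms are all assumptions; a valued condition could be added in the same way.

```python
def cond(name):
    """Elementary CONDITION: true when the network has property `name`."""
    return lambda net: bool(net[name])


def NOT(c):
    return lambda net: not c(net)


def AND(*cs):
    return lambda net: all(c(net) for c in cs)


def OR(*cs):
    return lambda net: any(c(net) for c in cs)


def ANY(net):
    """Condition for a term that claims to be universal."""
    return True


directed = cond("directed")
bipartite = cond("bipartite")

# Hypothetical per-term requirements, based on the examples in this issue.
TERM_CONDITIONS = {
    "absdiff": ANY,
    "b1factor": AND(bipartite, NOT(directed)),
}


def term_allowed(term, net):
    """A term is allowed iff its condition evaluates to True on `net`."""
    return TERM_CONDITIONS[term](net)


bip_undirected = {"directed": False, "bipartite": True}
uni_directed = {"directed": True, "bipartite": False}
allowed = sorted(t for t in TERM_CONDITIONS if term_allowed(t, bip_undirected))
```

The same `term_allowed` check could serve both use cases: an initializer would call it on the actual network, and a search would call it on a description of the network type the user asks about.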

Is there a better way?

martinamorris commented 4 months ago

@krivit is the idea then to design the search function to exclude the "only" terms when they don't apply, instead of the current include-oriented design?

are there any terms that are dual onlys? like a bipartite directed term?

I like @CarterButts's proposal for elementary CONDITIONs with logic terms, though I haven't thought through all the implications to see if it works in all cases.

krivit commented 4 months ago

On further thought, and in light of https://github.com/statnet/ergm/issues/571, I think the fundamental problem is that of data format. We are relying on help files for the term search (and that's a good thing), but we are not storing sufficient information to tell whether a term can work on a particular network, or at least not storing it consistently.

There are three types of networks we currently support: unipartite directed, unipartite undirected, and bipartite undirected. (Valued terms are a separate family, with a parallel classification.) Any given term can support any combination of these, so there are 3 bits of information we need to represent what the term does and doesn't support. (Actually, it's slightly less, since a term that doesn't support any cases makes no sense, and, also, it is rare for a term to support directed and bipartite undirected but not unipartite undirected, although it does make sense for terms such as diff.)

We could, therefore, in principle, "encode" any term's support using only three keywords, but we have to use them in a very particular way: essentially, option 1 from what I described, so that both absdiff and b1cov have bipartite but only absdiff has undirected. I don't know how intuitive this is, though, hence my proposal for additional keywords, which could be processed internally by the search functions.
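The three-bit view can be sketched with a flag enumeration. This is an illustrative Python example, not ergm's data format: the `Support` enum, the `TERM_SUPPORT` table, and `terms_for` are hypothetical names, and the two entries follow the absdiff and b1cov example above.

```python
from enum import Flag, auto


class Support(Flag):
    """One bit per currently supported network type."""
    UNI_DIRECTED = auto()
    UNI_UNDIRECTED = auto()
    BIP_UNDIRECTED = auto()


ALL_TYPES = (Support.UNI_DIRECTED
             | Support.UNI_UNDIRECTED
             | Support.BIP_UNDIRECTED)

# Hypothetical encoding of the two example terms.
TERM_SUPPORT = {
    "absdiff": ALL_TYPES,               # works on all three types
    "b1cov": Support.BIP_UNDIRECTED,    # bipartite undirected only
}

# Three bits give 2**3 combinations; the empty one makes no sense.
NONEMPTY_COMBINATIONS = 2 ** 3 - 1


def terms_for(network_type):
    """Return the sorted names of terms that support `network_type`."""
    return sorted(name for name, support in TERM_SUPPORT.items()
                  if network_type in support)


bipartite_terms = terms_for(Support.BIP_UNDIRECTED)
unipartite_undirected_terms = terms_for(Support.UNI_UNDIRECTED)
```

Each of the three common use cases listed at the top of the issue then becomes a single membership test against one bit.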

martinamorris commented 4 months ago

> There are three types of networks we currently support: unipartite directed, unipartite undirected, and bipartite undirected.

you're proposing 3 binary keywords: unipartite (Y/N), bipartite (Y/N), directed (Y/N) ?

or is there a value to having something like: partite (uni/bi/both), directed (Y/N/both)?

> (Valued terms are a separate family, with a parallel classification.)

and we would use the same logic here?

krivit commented 4 months ago

>> There are three types of networks we currently support: unipartite directed, unipartite undirected, and bipartite undirected.

> you're proposing 3 binary keywords: unipartite (Y/N), bipartite (Y/N), directed (Y/N) ?

This could be one solution, but it has the downside that it's using our current keywords but not the way they are used right now.

> or is there a value to having something like: partite (uni/bi/both), directed (Y/N/both)?

Perhaps. We could have a pool of keywords, then have the search logic try to figure out what a given combination actually means.

>> (Valued terms are a separate family, with a parallel classification.)

> and we would use the same logic here?

Whatever we do would transfer automatically.
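One way to picture how search logic might interpret the partite (uni/bi/both) and directed (Y/N/both) form is to expand each pair of values into the set of supported network types it covers. The Python sketch below is only an illustration; `SUPPORTED_NETWORKS`, `expand`, and the value spellings are assumptions, and the directed bipartite case is dropped because it is not currently supported.

```python
# The three network types named earlier in the thread.
SUPPORTED_NETWORKS = {
    ("uni", "directed"),
    ("uni", "undirected"),
    ("bi", "undirected"),
}

DIRECTED_VALUES = {
    "Y": ("directed",),
    "N": ("undirected",),
    "both": ("directed", "undirected"),
}


def expand(partite, directed):
    """Map partite in {uni, bi, both} and directed in {Y, N, both}
    to the set of supported network types the pair describes."""
    partite_values = ("uni", "bi") if partite == "both" else (partite,)
    wanted = {(p, d) for p in partite_values for d in DIRECTED_VALUES[directed]}
    # Directed bipartite is not supported, so it is filtered out here.
    return wanted & SUPPORTED_NETWORKS


# Enumerate every support set this two-attribute form can describe.
expressible = {
    frozenset(expand(p, d))
    for p in ("uni", "bi", "both")
    for d in ("Y", "N", "both")
}
expressible.discard(frozenset())

directed_plus_bipartite = frozenset({("uni", "directed"), ("bi", "undirected")})
```

Under these assumptions the two-attribute form reaches six of the seven non-empty support sets. The one it cannot describe is unipartite directed plus bipartite undirected without unipartite undirected, the rare combination mentioned earlier for terms such as diff, so a pool of keywords would need some other way to express that case.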

martinamorris commented 4 months ago

I'm a fan of

anything robust (a pool sounds good -- as long as the logic knows what to do with it)
anything easy to implement

It wouldn't be that hard to go through the 126 or so terms and assign new keywords, if we knew what those keywords were, and what kind of logic will be used to parse them. Not as sure about the other things like operators.
