statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks

Shorter name for NodematchFilter()? #480

Open krivit opened 1 year ago

krivit commented 1 year ago

In light of https://github.com/statnet/ergm/issues/478, the NodematchFilter() operator, which I had originally put together to test the API, may be worth making more prominent, and that might involve renaming it to something more concise. Some candidates, in no particular order:

  1. FNodematch()
  2. NodematchF()
  3. FMatch()
  4. MatchF()
  5. Nodematch()*
  6. Match()*
  7. OnMatch()
  8. OnNodematch()

* --- These candidates may be error-prone in that they differ from the match() and nodematch() terms by capitalisation alone.

Any preferences? @sgoodreau , @martinamorris , @CarterButts , @drh20drh20 , @handcock , @chad-klumb

martinamorris commented 1 year ago

it's long, but i'm still leaning towards

  8. OnNodematch()

chad-klumb commented 1 year ago

If it's a special case of the F operator I'd be inclined to just use that operator instead (possibly spelled out to Filter).

If it needs its own name, 1, 2, or their analogues with F replaced by Filter would be my preferences.

sgoodreau commented 1 year ago

I don't feel like I have much sense of the whole background to give the best feedback. It sounds like "filter" exists as a term in the new statnet packages for the concept of including only parts of the network (via the F operator, but also as a general idea). In that case, it seems good to include F or Filter. Is the "on" prefix already in there somewhere as well, conceptually?

Definitely do not make it just Match or Nodematch

krivit commented 1 year ago

Thanks for the feedback! The arguments for having a separate term are two:

  1. Performance: a separate term can be more optimised, because the general F has to keep track of two models, one for the filtering and the other for the evaluation. It makes sense to optimise common special cases. That having been said, this can be accomplished by detecting the node match case and branching to optimised code for that.
  2. Brevity: if the call is shortened, the special case would be shorter and cleaner:
      NodematchF(~gwesp, "a")
      F(~gwesp, ~nodematch("a"))
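
The branching mentioned in point 1 could look roughly like the sketch below. It uses base R only and is not ergm's actual implementation: `is_nodematch_filter()` and `choose_backend()` are hypothetical helper names, and the check merely inspects the right-hand side of a one-sided filter formula.

```r
# Hypothetical helper: TRUE if the RHS of a one-sided formula is a single
# nodematch() call. The formula is only inspected, never evaluated.
is_nodematch_filter <- function(filter) {
  stopifnot(inherits(filter, "formula"), length(filter) == 2L)
  rhs <- filter[[2L]]
  is.call(rhs) && identical(rhs[[1L]], as.name("nodematch"))
}

# Hypothetical dispatcher: take the specialised path when the filter allows it.
choose_backend <- function(filter) {
  if (is_nodematch_filter(filter)) "nodematch-optimised" else "general"
}

choose_backend(~nodematch("a"))                    # specialised path
choose_backend(~nodematch("a") + nodefactor("b"))  # general path
choose_backend(~absdiff("age"))                    # general path
```

With a check of this kind, a user could keep writing the general F() form and still reach the faster code whenever the filter is a plain node match.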

martinamorris commented 1 year ago

> That having been said, this can be accomplished by detecting the node match case and branching to optimised code for that.

I'm now a bit confused about what the options are. But if the above means you can use the existing "F" operator syntax, rather than creating a new operator for this special case, and still get the optimised code, then that seems like the best soln to me.

krivit commented 1 year ago

> I'm now a bit confused about what the options are. But if the above means you can use the existing "F" operator syntax, rather than creating a new operator for this special case, and still get the optimised code, then that seems like the best soln to me.

NodematchFilter() is already there and in fact precedes the more general F() operator chronologically, so I guess one of the options is to deprecate it in favour of F().

The question is, then, from a user interface perspective, is there benefit to having a shortcut for a common case?

martinamorris commented 1 year ago

Shorter by 3 characters? (if you're referring to the 2 options you show a couple of comments back). I'm thinking not worth it, as long as the optimised code is still used for this case.

krivit commented 1 year ago

Fewer parentheses, too. :-)

I'll see about the special case.

CarterButts commented 1 year ago

Probably best to avoid having official terms that differ only by capitalization - that is likely to lead to issues.  (Perhaps, if we had a /uniform/ rule that capitalized and uncapitalized term versions differed in some specific way, that would work.  But I don't think that is likely to be a thing.)

I'm not actually sure what NodematchFilter does, but perhaps one can think of the category of thing to which it belongs.  Coming up with a uniform rule for the category seems likely to work out better in the long run....

(Relatedly, there have been a lot of innovations in specifications recently which seem great, and very powerful, but I don't have a good handle on them yet.  My concern is that they are more or less starting to specify a de facto language for term specification (cool), but that the structure of the language is not very transparent and may or may not be future proof (not as cool).  Unfortunately, since I don't have a handle on them, and they seem to be in flux, it's hard for me to make nuanced recommendations right now.  :-(  Not sure what the best one-stop-shopping reference is for the current state of the API....)

krivit commented 1 year ago

> Probably best to avoid having official terms that differ only by capitalization - that is likely to lead to issues.

The pattern so far is that ordinary terms are lowercase and snake-cased whereas term operators (i.e., terms that take ergm formulas as arguments) are capitalised and camel-cased. So far, the only naming conflict in effect is Sum() the linear combination operator and sum() the valued term that sums all dyads.
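
To see which names would collide under this convention, a quick base R check along the following lines could help. It is only a sketch: `case_collisions()` is a hypothetical helper, and the vector of names is illustrative (it mixes existing names mentioned in this thread with the candidates Nodematch and Match), not a complete list of ergm terms.

```r
# Illustrative names only: existing terms/operators from this thread plus two
# of the proposed candidates.
term_names <- c("sum", "Sum", "nodematch", "match", "F",
                "NodematchFilter", "Nodematch", "Match")

# Hypothetical helper: group names by their lower-cased form and keep only
# the groups that contain more than one spelling.
case_collisions <- function(names) {
  groups <- split(names, tolower(names))
  groups[lengths(groups) > 1L]
}

case_collisions(term_names)
```

On this input the existing sum()/Sum() pair is reported, and adding Nodematch() or Match() would introduce two more such pairs.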

In retrospect, perhaps we should have named Sum() something else (LinCom()?).

NodematchFilter(formula, attr) evaluates formula on a network constructed by taking the LHS network and removing all edges which don't match on attr.
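
As a toy illustration of that description, here is a base R sketch on a small adjacency matrix. It does not use ergm; the network, the attribute values, and the helpers `filter_on_match()` and `edge_count()` are all made up for the example.

```r
# Toy undirected network on 4 nodes, stored as a symmetric 0/1 adjacency matrix.
adj <- matrix(0, 4, 4)
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(1, 3))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # mirror each edge to keep the matrix symmetric

# Hypothetical nodal attribute "a".
a <- c("x", "x", "y", "x")

# Remove every edge whose endpoints do not match on the attribute.
filter_on_match <- function(adj, attr) adj * outer(attr, attr, "==")

# Stand-in for "formula": the number of edges in an undirected network.
edge_count <- function(adj) sum(adj[upper.tri(adj)])

edge_count(adj)                      # statistic on the original network
edge_count(filter_on_match(adj, a))  # statistic on the filtered network
```

Here only the 1-2 edge joins two nodes with the same value of `a`, so the filtered network has a single edge where the original has four.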

I wouldn't say that we are specifying a language above and beyond what we've already done with basic ergm terms. The only "API" here dates from my sabbatical in 2017, when I implemented an API that enabled C change statistics to store and update their internal states. I then realised that there was no rule against passing an ergm model to an ergm term, on either the R or the C side. That means one can implement terms (operators) that wrap other terms, modifying their inputs (the network), their outputs (the change scores), and their parametrisation (the curved mapping).
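
The wrapping idea can be mimicked with ordinary closures. The sketch below is an analogy in base R, not the ergm API: a "term" is reduced to a function from an adjacency matrix to a number, and `on_network()`, `scaled()`, and `drop_node1()` are hypothetical names.

```r
# A "term" here is just a function from an adjacency matrix to a number.
edge_count <- function(adj) sum(adj[upper.tri(adj)])

# Operator that modifies the input: evaluate `term` on a transformed network.
on_network <- function(term, transform) function(adj) term(transform(adj))

# Operator that modifies the output: rescale whatever `term` returns.
scaled <- function(term, k) function(adj) k * term(adj)

# Toy network: a triangle on 3 nodes.
adj <- matrix(1, 3, 3) - diag(3)

# Transformation that drops every edge incident to node 1.
drop_node1 <- function(adj) {
  adj[1, ] <- 0
  adj[, 1] <- 0
  adj
}

# Operators compose: drop node 1's edges, count edges, then double the count.
doubled_without_node1 <- scaled(on_network(edge_count, drop_node1), 2)

edge_count(adj)             # 3 edges in the triangle
doubled_without_node1(adj)  # 2: one remaining edge, doubled
```

The real operators work on change statistics in C and can also alter the parametrisation, which this sketch leaves out.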

That API has evolved since then, but fundamentally that's all there is to it. At this point, we are discussing specific operators and their user interfaces.

The most up-to-date API documentation is in a series of ergm package vignettes, particularly the ones on the master branch.