xgi-org / xgi

CompleX Group Interactions (XGI) is a Python package for higher-order networks.
https://xgi.readthedocs.io

Rules for default return values #215

Open leotrs opened 1 year ago

leotrs commented 1 year ago

When computing the density of a Hypergraph, there are many possibilities. For example, compute the density of all edges, of edges of a particular order, of edges up to a particular order, or the density of the incidence matrix. The current default is to compute the density of all edges. What should be the default?

During today's all-team call, @acuschwarze pointed out that returning a single number may not be the most natural choice, since one of the main reasons researchers use higher-order structures is precisely that they allow most quantities to be computed at different edge orders or sizes. So perhaps the default should be to return a collection of values, say, a list containing the density at each individual order.
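
To make the two options concrete, here is a minimal sketch in plain Python. It is not XGI's implementation. It assumes the hypergraph is given as a list of edges (sets of nodes), that an edge of order d has d + 1 nodes, and that the density at order d is the number of such edges divided by the number of possible ones, C(n, d + 1). The names density_single and density_per_order are hypothetical.

```python
from math import comb


def density_single(edges, n, order):
    """Density at one order: edges with order + 1 nodes over C(n, order + 1)."""
    size = order + 1
    possible = comb(n, size)  # 0 when size > n
    count = sum(1 for e in edges if len(e) == size)
    return count / possible if possible else 0.0


def density_per_order(edges, n):
    """List of densities, one per order from 1 up to the largest order present."""
    max_order = max((len(e) for e in edges), default=1) - 1
    return [density_single(edges, n, d) for d in range(1, max_order + 1)]


# Toy hypergraph on 4 nodes: two edges of order 1, two of order 2.
edges = [{0, 1}, {1, 2}, {0, 1, 2}, {1, 2, 3}]
print(density_single(edges, 4, 1))   # one number
print(density_per_order(edges, 4))   # one number per order
```

The first call is the "one number" style of return, the second is the "many numbers" style discussed above.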

Note that the particular example of density is one that involves more than one function (namely xgi.density and xgi.incidence_density). Another example is the degree of a single node which can also be computed for individual edge orders (e.g. by H.nodes.degree(order=d)), as well as many other NodeStats one could imagine.
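
The same question for a per-node quantity can be sketched as follows. This is plain Python under the same assumptions (edges as sets of nodes, an edge of order d has d + 1 nodes) and does not use XGI's NodeStats. The helper degree_by_order is a hypothetical name.

```python
from collections import defaultdict


def degree_by_order(edges):
    """Map node -> {order: number of edges of that order containing the node}."""
    degrees = defaultdict(lambda: defaultdict(int))
    for e in edges:
        d = len(e) - 1  # order of this edge
        for node in e:
            degrees[node][d] += 1
    return {node: dict(by_order) for node, by_order in degrees.items()}


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {1, 2, 3}]
per_order = degree_by_order(edges)

# The single-number degree is recovered by summing over orders.
total = {node: sum(by_order.values()) for node, by_order in per_order.items()}
print(per_order[1])
print(total[1])
```

Here the per-order return is a strict refinement of the aggregate one, which is part of why the choice of default matters.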

Please use this issue to share your general views on this point. Note the discussion is not so much about what should be the default return type (e.g. a list vs an array) as it is about what the default return should be (e.g. one number vs many numbers).

Some questions to stir a debate:

  1. What would a user expect the default behavior to be? What would be the most useful and/or the least unexpected?
  2. Should we use the same or different rules for the return values of functions that compute a quantity over an entire network (e.g. density) vs those that compute a quantity for each node (e.g. degree)?
  3. How does this affect efficiency? (It may be the case that computing a quantity (say density) at one order may help us compute the same quantity at a different order. This means that returning the per-order density can be done more efficiently than simply executing the same density function once per order.)
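
The efficiency point in question 3 can be illustrated with a small sketch. It is plain Python, not XGI code, and it assumes edges are sets of nodes and that density at order d means the number of edges with d + 1 nodes divided by C(n, d + 1). Calling a single-order function once per order scans the edge list every time, whereas counting edge sizes once gives every order from a single pass. The function name density_all_orders is hypothetical.

```python
from collections import Counter
from math import comb


def density_all_orders(edges, n):
    """Per-order densities computed from a single pass over the edges."""
    size_counts = Counter(len(e) for e in edges)  # the only scan of `edges`
    return {
        size - 1: count / comb(n, size)
        for size, count in sorted(size_counts.items())
        if 1 <= size <= n  # skip empty edges and edges larger than the node set
    }


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {1, 2, 3}]
print(density_all_orders(edges, 4))
```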

Note the question "What should the default return value be?" has popped up recently in different parts of the codebase:

maximelucas commented 1 year ago

I thought briefly about this.

I think I would make the return value "one order", but without a default order. So the user would be forced to specify, say, order=2. They could also specify order=None or "all" or whatever to get an aggregate quantity (e.g. for density) over all orders, but it would not be the default.
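
A rough sketch of what "required, no default" could look like in Python, using a keyword-only argument. This is only an illustration, not XGI's signature. The function below is hypothetical, works on a plain list of edges (sets of nodes), and assumes an aggregate density defined as the number of edges with at least two nodes over the number of possible such edges, 2^n - n - 1.

```python
from math import comb


def density(edges, n, *, order):
    """`order` is keyword-only and has no default, so callers must state it."""
    if order is None or order == "all":
        # Aggregate over all orders (assumed definition, see lead-in).
        possible = 2**n - n - 1
        count = sum(1 for e in edges if len(e) >= 2)
        return count / possible if possible else 0.0
    if not isinstance(order, int):
        raise TypeError("order must be an int, None, or 'all'")
    size = order + 1
    possible = comb(n, size)
    return sum(1 for e in edges if len(e) == size) / possible if possible else 0.0


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {1, 2, 3}]
print(density(edges, 4, order=2))      # a single order
print(density(edges, 4, order="all"))  # explicit aggregate
# density(edges, 4) raises TypeError: missing required keyword-only argument
```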

leotrs commented 1 year ago

I see your point RE "no good default value" and tend to agree.

Perhaps we can make it so that all of the relevant functions accept a single parameter like

This parameter would be required and have no default value so we force the user to explicitly declare what they want to get.

maximelucas commented 1 year ago

Something like that seems good, yes.

I'm just thinking it might look a bit funny implementation-wise with functions looking like:

def measure(order):
    if isinstance(order, int):
        ...  # compute single order
    elif order == "all":
        for d in orders:  # orders: the edge orders present in the hypergraph
            ...  # compute single order d, and store

where the code for computing a single order could almost be a separate function to avoid redundancy.

leotrs commented 1 year ago

This can be easily fixed with a decorator. Though the "agg" option would require some more thinking...
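
For illustration, one way such a decorator might look. This is a sketch under assumptions, not a proposal for the actual XGI API: the decorator name per_order is made up, the hypergraph is a plain list of edges (sets of nodes), and only the "all" option is handled. The "agg" option is deliberately left out.

```python
from functools import wraps
from math import comb


def per_order(func):
    """Let a function written for a single order also accept order="all"."""
    @wraps(func)
    def wrapper(edges, n, *, order):
        if order == "all":
            # Orders present in the hypergraph (an edge of order d has d + 1 nodes).
            orders = sorted({len(e) - 1 for e in edges})
            return {d: func(edges, n, order=d) for d in orders}
        return func(edges, n, order=order)
    return wrapper


@per_order
def density(edges, n, *, order):
    """Single-order density; the decorator adds the per-order loop."""
    size = order + 1
    possible = comb(n, size)
    return sum(1 for e in edges if len(e) == size) / possible if possible else 0.0


edges = [{0, 1}, {1, 2}, {0, 1, 2}, {1, 2, 3}]
print(density(edges, 4, order=1))
print(density(edges, 4, order="all"))
```

The body of density only ever deals with one order, so the redundancy concern above goes away. The cost is that each order triggers a separate scan of the edges.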