xgi-org / xgi

CompleX Group Interactions (XGI) is a Python package for higher-order networks.
https://xgi.readthedocs.io
Other
185 stars 30 forks source link

Add functions to get largest connected component #25

Closed leotrs closed 2 years ago

leotrs commented 2 years ago

Graph generators may sometimes return hypergraphs with isolated nodes (e.g. ER hypergraph with small p). In these cases, and others, it is usual to obtain the largest component. Ideally, we would like to do something like this:

G = xgi.erdos_renyi_hypergraph(n, m, p).largest_component()
G = xgi.erdos_renyi_hypergraph(n, m, p).remove_isolates()
nwlandry commented 2 years ago

Some thoughts on implementing the ideas above:

If we implemented largest_connected_component as a class method, I would envision the guts looking something like this (modifying the original hypergraph):

def largest_connected_component(self):
    connected_nodes = max(xgi.connected_components(self), key=len)
    self.remove_nodes_from(set(H.nodes).difference(connected_nodes))

or (creating a new hypergraph)

def largest_connected_component(self):
    connected_nodes = max(xgi.connected_components(self), key=len)
    return self.subhypergraph(connected_nodes).copy()

Alternatively, we could write a method residing in connected.py looking something like this:

def largest_connected_component(H, return_hypergraph=True):
    connected_nodes = max(xgi.connected_components(self), key=len)
    if return_hypergraph:
        return H.subhypergraph(connected_nodes).copy()
    else:
        return connected_nodes

Thoughts?

iaciac commented 2 years ago

I guess creating a new one would be desirable. Regarding where the method should reside I don't really have a preference. A class method would be more in line with what Leo was proposing, right?

leotrs commented 2 years ago

Personally I find the decision of where to put certain functions totally arbitrary in the case of networkx. I lean toward putting more stuff in the hypergraph class because it allows for nicer chaining.. but chaining doesn't allow to return a new object..

maximelucas commented 2 years ago

I would also in general would want to create a new hypergraph. Having the option not to might save memory when needed? I would initially have it as a function in connected.py. That's where I would look first as a user, and having too many methods in the class might make it less tidy. We could also implement it in connected.py and then have a method in the class call it, so it could be called in both ways (networkx often does that I think).

So basically, a combination between option 1 and 3 maybe

def largest_connected_component(H, in_place=False):
    connected_nodes = max(xgi.connected_components(self), key=len)
    if not in_place:
        return H.subhypergraph(connected_nodes).copy()
    else:
        return self.remove_nodes_from(set(H.nodes).difference(connected_nodes))
leotrs commented 2 years ago

Alright let's put it in connected.py then!

maximelucas commented 2 years ago

I wrote my answer before seeing yours.. :) Maybe putting it in both can be a nice best-of-both worlds (hopefully worst-of-).

nwlandry commented 2 years ago

I would also in general would want to create a new hypergraph. Having the option not to might save memory when needed? I would initially have it as a function in connected.py. That's where I would look first as a user, and having too many methods in the class might make it less tidy. We could also implement it in connected.py and then have a method in the class call it, so it could be called in both ways (networkx often does that I think).

So basically, a combination between option 1 and 3 maybe

def largest_connected_component(H, in_place=False):
    connected_nodes = max(xgi.connected_components(self), key=len)
    if not in_place:
        return H.subhypergraph(connected_nodes).copy()
    else:
        return self.remove_nodes_from(set(H.nodes).difference(connected_nodes))

I implemented this on the largest_component branch with the minor changes of replacing self with H and removing return in the else statement.

One question that I have is that perhaps this method should be called largest_connected_hypergraph to indicate that we're not returning a node set, which I would associate with "component"

leotrs commented 2 years ago

+1 for distinguishing nodesets and hypergraphs by using component and hypergraph

maximelucas commented 2 years ago

Sounds good.