Closed leotrs closed 2 years ago
Some thoughts on implementing the ideas above:
If we implemented largest_connected_component
as a class method, I would envision the guts looking something like this (modifying the original hypergraph):
def largest_connected_component(self):
connected_nodes = max(xgi.connected_components(self), key=len)
self.remove_nodes_from(set(H.nodes).difference(connected_nodes))
or (creating a new hypergraph)
def largest_connected_component(self):
connected_nodes = max(xgi.connected_components(self), key=len)
return self.subhypergraph(connected_nodes).copy()
Alternatively, we could write a method residing in connected.py looking something like this:
def largest_connected_component(H, return_hypergraph=True):
connected_nodes = max(xgi.connected_components(self), key=len)
if return_hypergraph:
return H.subhypergraph(connected_nodes).copy()
else:
return connected_nodes
Thoughts?
I guess creating a new one would be desirable. Regarding where the method should reside I don't really have a preference. A class method would be more in line with what Leo was proposing, right?
Personally I find the decision of where to put certain functions totally arbitrary in the case of networkx. I lean toward putting more stuff in the hypergraph class because it allows for nicer chaining.. but chaining doesn't allow to return a new object..
I would also in general would want to create a new hypergraph. Having the option not to might save memory when needed? I would initially have it as a function in connected.py. That's where I would look first as a user, and having too many methods in the class might make it less tidy. We could also implement it in connected.py and then have a method in the class call it, so it could be called in both ways (networkx often does that I think).
So basically, a combination between option 1 and 3 maybe
def largest_connected_component(H, in_place=False):
connected_nodes = max(xgi.connected_components(self), key=len)
if not in_place:
return H.subhypergraph(connected_nodes).copy()
else:
return self.remove_nodes_from(set(H.nodes).difference(connected_nodes))
Alright let's put it in connected.py then!
I wrote my answer before seeing yours.. :) Maybe putting it in both can be a nice best-of-both worlds (hopefully worst-of-).
I would also in general would want to create a new hypergraph. Having the option not to might save memory when needed? I would initially have it as a function in connected.py. That's where I would look first as a user, and having too many methods in the class might make it less tidy. We could also implement it in connected.py and then have a method in the class call it, so it could be called in both ways (networkx often does that I think).
So basically, a combination between option 1 and 3 maybe
def largest_connected_component(H, in_place=False): connected_nodes = max(xgi.connected_components(self), key=len) if not in_place: return H.subhypergraph(connected_nodes).copy() else: return self.remove_nodes_from(set(H.nodes).difference(connected_nodes))
I implemented this on the largest_component
branch with the minor changes of replacing self
with H
and removing return
in the else statement.
One question that I have is that perhaps this method should be called largest_connected_hypergraph
to indicate that we're not returning a node set, which I would associate with "component"
+1 for distinguishing nodesets and hypergraphs by using component and hypergraph
Sounds good.
Graph generators may sometimes return hypergraphs with isolated nodes (e.g. ER hypergraph with small
p
). In these cases, and others, it is usual to obtain the largest component. Ideally, we would like to do something like this: