Closed YuhuYang closed 4 months ago
Thanks for the question @YuhuYang.
For a hypergraph H
, you can compute all shortest path lenghts with spl = xgi.shortest_path_length(H)
.
Note: this gives and infinite length for disconnected nodes.
You can then just collect the lengths and take the average like so:
lens = []
for tup in spl:
lens += tup[1].values()
# remove lengths 0 for self-loops
lens = [i for i in lens if i!=0]
# replace inf by 0 for disconnected nodes
lens = [0 if i==np.inf else i for i in lens]
avg_shortest_path = np.sum(lens) / (N * (N-1)) # = np.mean(lens)
You can find other global measures by exploring this page. You can find assortativity measures, clustering coefficient, or the density for example.
Thanks for the question @YuhuYang.
For a hypergraph
H
, you can compute all shortest path lenghts withspl = xgi.shortest_path_length(H)
. Note: this gives and infinite length for disconnected nodes.You can then just collect the lengths and take the average like so:
lens = [] for tup in spl: lens += tup[1].values() # remove lengths 0 for self-loops lens = [i for i in lens if i!=0] # replace inf by 0 for disconnected nodes lens = [0 if i==np.inf else i for i in lens] avg_shortest_path = np.sum(lens) / (N * (N-1)) # = np.mean(lens)
You can find other global measures by exploring this page. You can find assortativity measures, clustering coefficient, or the density for example.
Thank you very much!
Note for self: not closing because we could put this a recipe or new function.
However, the code costs much time (now, about 30 minutes). The num of nodes is around 6000. Is this situation normal? (I have used the largest connected component)
Can you check what part is taking time? I expect it is the shortest_path_length()
function. Computing shortest paths is expensive in general. Optimization is on our list for the future.
Yes, its is the shortest_path_length()
. Are there some methods can decrease the time cost?
Hi @YuhuYang! Computing the shortest past between two nodes is a computationally complex task, so it's normal for it to take some time.
In XGI I think we have implemented a version of Dijkstra's algorithm which has a computational complexity of O(N^2) --N here is the number of nodes-- for a single node.
shortest_path_length()
does this for every node in your structure so you will have something O(N^3) which is quite slow.
In addition to this polynomial cost of the procedure, XGI is implemented in python which is notably slower than other programming languages.
My suggestions to decrease the computational cost are:
single_source_shortest_path_length()
for every node in your structure but using different cores of your machine. You can do this using for example the multiprocessing package.graph_tool
(see here for the corresponding function) or reticula
(here).I hope this helps! Happy coding!
Thanks again! I'll try them!
2. Use libraries for networks which are implemented in C++ or other faster languages and then wrapped in python, notable examples are
graph_tool
(see here for the corresponding function) orreticula
(here).
graph_tool
is not a library for higher-order structures (I am less familiar with reticula
), so you would need to take the skeleton of your hypergraph/simplicial complex and compute the paths on the corresponding graph.
how to calculate global indicators of hypergraphs? Like the average path length, diameter