xgi-org / xgi

CompleX Group Interactions (XGI) is a Python package for higher-order networks.
https://xgi.readthedocs.io
Other
171 stars 27 forks source link

how to calculate the average path length? #511

Closed YuhuYang closed 4 months ago

YuhuYang commented 4 months ago

how to calculate global indicators of hypergraphs? Like the average path length, diameter

maximelucas commented 4 months ago

Thanks for the question @YuhuYang.

For a hypergraph H, you can compute all shortest path lenghts with spl = xgi.shortest_path_length(H). Note: this gives and infinite length for disconnected nodes.

You can then just collect the lengths and take the average like so:

lens = []
for tup in spl:
    lens += tup[1].values()

# remove lengths 0 for self-loops
lens = [i for i in lens if i!=0] 

# replace inf by 0 for disconnected nodes
lens = [0 if i==np.inf else i for i in lens] 

avg_shortest_path = np.sum(lens) / (N * (N-1)) # = np.mean(lens)

You can find other global measures by exploring this page. You can find assortativity measures, clustering coefficient, or the density for example.

YuhuYang commented 4 months ago

Thanks for the question @YuhuYang.

For a hypergraph H, you can compute all shortest path lenghts with spl = xgi.shortest_path_length(H). Note: this gives and infinite length for disconnected nodes.

You can then just collect the lengths and take the average like so:

lens = []
for tup in spl:
    lens += tup[1].values()

# remove lengths 0 for self-loops
lens = [i for i in lens if i!=0] 

# replace inf by 0 for disconnected nodes
lens = [0 if i==np.inf else i for i in lens] 

avg_shortest_path = np.sum(lens) / (N * (N-1)) # = np.mean(lens)

You can find other global measures by exploring this page. You can find assortativity measures, clustering coefficient, or the density for example.

Thank you very much!

maximelucas commented 4 months ago

Note for self: not closing because we could put this a recipe or new function.

YuhuYang commented 4 months ago

However, the code costs much time (now, about 30 minutes). The num of nodes is around 6000. Is this situation normal? (I have used the largest connected component) image

maximelucas commented 4 months ago

Can you check what part is taking time? I expect it is the shortest_path_length() function. Computing shortest paths is expensive in general. Optimization is on our list for the future.

YuhuYang commented 4 months ago

Yes, its is the shortest_path_length(). Are there some methods can decrease the time cost?

thomasrobiglio commented 4 months ago

Hi @YuhuYang! Computing the shortest past between two nodes is a computationally complex task, so it's normal for it to take some time. In XGI I think we have implemented a version of Dijkstra's algorithm which has a computational complexity of O(N^2) --N here is the number of nodes-- for a single node. shortest_path_length() does this for every node in your structure so you will have something O(N^3) which is quite slow. In addition to this polynomial cost of the procedure, XGI is implemented in python which is notably slower than other programming languages.

My suggestions to decrease the computational cost are:

  1. Parallelize the operation by calling single_source_shortest_path_length() for every node in your structure but using different cores of your machine. You can do this using for example the multiprocessing package.
  2. Use libraries for networks which are implemented in C++ or other faster languages and then wrapped in python, notable examples are graph_tool (see here for the corresponding function) or reticula (here).

I hope this helps! Happy coding!

YuhuYang commented 4 months ago

Thanks again! I'll try them!

thomasrobiglio commented 4 months ago

2. Use libraries for networks which are implemented in C++ or other faster languages and then wrapped in python, notable examples are graph_tool (see here for the corresponding function) or reticula (here).

graph_tool is not a library for higher-order structures (I am less familiar with reticula), so you would need to take the skeleton of your hypergraph/simplicial complex and compute the paths on the corresponding graph.