rachithaiyappa closed this issue 4 months ago
Can you identify which feature might be the bottleneck? Shortest paths?
| Algorithm | Time Complexity | Source |
|---|---|---|
| nx.average_clustering(G) | | |
| nx.triangles(G) | | |
| nx.pagerank(G) | | |
| nx.pagerank(G, personalization=hot_vec_copy) | | |
| nx.average_neighbor_degree(G) | | |
| nx.eigenvector_centrality(G, tol=toler) | | |
| nx.resource_allocation_index(G, edge_s) | | |
| nx.jaccard_coefficient(G, edge_s) | | |
| nx.katz_centrality_numpy(G) | | |
| nx.adamic_adar_index(G, edge_s) | | |
| nx.preferential_attachment(G, edge_s) | | |
| nx.degree_assortativity_coefficient(G) | | |
| nx.common_neighbors(G, edge_s[ee][0], edge_s[ee][1]) | | |
| nx.shortest_path_length(G) | | |
| nx.betweenness_centrality(G, normalized=True) | | |
| nx.diameter(G) | | |
| nx.transitivity(G) | | |
Quick update: for a network with ~7k nodes and ~50k edges, these are the naive times taken (in seconds) to compute some of the features. The bottlenecks become more evident here. I'll make a better plot across a few other networks and post it here to help decide which features we can drop, since it is now very clear to me that the networkx script is inefficient.
I wonder how the authors of the original work managed to use it for their analysis :O
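For reference, here is a minimal sketch of how such naive per-feature timings can be collected; the graph size and the subset of features shown are placeholders, not the actual networks or full feature list from this thread:

```python
import time
import networkx as nx

# Hypothetical benchmark graph; the real networks in this thread
# are larger (~7k nodes, ~50k edges).
G = nx.fast_gnp_random_graph(500, 0.02, seed=42)

# A subset of the features from the table above.
features = {
    "average_clustering": lambda g: nx.average_clustering(g),
    "transitivity": lambda g: nx.transitivity(g),
    "pagerank": lambda g: nx.pagerank(g),
    # all-pairs shortest paths and betweenness are the suspected bottlenecks
    "shortest_path_length": lambda g: dict(nx.shortest_path_length(g)),
    "betweenness_centrality": lambda g: nx.betweenness_centrality(g, normalized=True),
}

timings = {}
for name, fn in features.items():
    start = time.perf_counter()
    fn(G)
    timings[name] = time.perf_counter() - start

# Print features from slowest to fastest.
for name, t in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:25s} {t:.4f}s")
```

On larger graphs, the all-pairs computations (shortest paths, betweenness) should dominate this list, which matches the bottleneck pattern described above.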
Here are plots from 4 of the smaller networks (<50k edges).
Here is another way of looking at the same data.
Since all topological features in creat_numpy_files are obtained using networkx, perhaps it would be useful to remove some of them to make the code faster?
Then we'd need to decide which ones to remove.
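One way to make that decision systematically, sketched below under the assumption that each feature can be wrapped as a callable: measure each candidate on a sample graph and keep only those that finish within a per-feature time budget. The helper name `affordable_features` and the budget value are hypothetical, not part of the existing script:

```python
import time
import networkx as nx

def affordable_features(G, candidates, budget_s=1.0):
    """Return the names of features whose computation on G
    finishes within budget_s seconds each."""
    kept = []
    for name, fn in candidates:
        start = time.perf_counter()
        fn(G)
        if time.perf_counter() - start <= budget_s:
            kept.append(name)
    return kept

# Hypothetical sample graph and candidate list.
sample = nx.fast_gnp_random_graph(300, 0.03, seed=0)
candidates = [
    ("average_clustering", nx.average_clustering),
    ("transitivity", nx.transitivity),
    ("betweenness_centrality", lambda g: nx.betweenness_centrality(g)),
]
print(affordable_features(sample, candidates, budget_s=0.5))
```

The budget would need tuning per network size, but it gives a reproducible criterion for which features to drop instead of an ad hoc choice.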