rachithaiyappa closed this issue 4 months ago
Can you identify which feature might be the bottleneck? Shortest paths?
| Algorithm | Time Complexity | Source |
|---|---|---|
| nx.average_clustering(G) | | |
| nx.triangles(G) | | |
| nx.pagerank(G) | | |
| nx.pagerank(G, personalization=hot_vec_copy) | | |
| nx.average_neighbor_degree(G) | | |
| nx.eigenvector_centrality(G, tol=toler) | | |
| nx.resource_allocation_index(G, edge_s) | | |
| nx.jaccard_coefficient(G, edge_s) | | |
| nx.katz_centrality_numpy(G) | | |
| nx.adamic_adar_index(G, edge_s) | | |
| nx.preferential_attachment(G, edge_s) | | |
| nx.degree_assortativity_coefficient(G) | | |
| nx.common_neighbors(G, edge_s[ee][0], edge_s[ee][1]) | | |
| nx.shortest_path_length(G) | | |
| nx.betweenness_centrality(G, normalized=True) | | |
| nx.diameter(G) | | |
| nx.transitivity(G) | | |
Quick update: for a network with ~7k nodes and ~50k edges, these are the naive times taken (in seconds) to compute some of the features. The bottlenecks become more evident here. I'll make a better plot across a few other networks and post it here to help decide which features we can drop, since it is now very clear to me that the networkx script is inefficient.
I wonder how the authors of the original work managed to use it for their analysis :O
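For reference, here is a minimal sketch of how such naive per-feature timings can be collected; the graph size and the subset of features shown are placeholders, not the actual networks or full feature list from this thread:

```python
import time
import networkx as nx

# Hypothetical benchmark graph; the real networks in this thread
# are larger (~7k nodes, ~50k edges).
G = nx.fast_gnp_random_graph(500, 0.02, seed=42)

# A subset of the features from the table above.
features = {
    "average_clustering": lambda g: nx.average_clustering(g),
    "transitivity": lambda g: nx.transitivity(g),
    "pagerank": lambda g: nx.pagerank(g),
    # all-pairs shortest paths and betweenness are the suspected bottlenecks
    "shortest_path_length": lambda g: dict(nx.shortest_path_length(g)),
    "betweenness_centrality": lambda g: nx.betweenness_centrality(g, normalized=True),
}

timings = {}
for name, fn in features.items():
    start = time.perf_counter()
    fn(G)
    timings[name] = time.perf_counter() - start

# Print features from slowest to fastest.
for name, t in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:25s} {t:.4f}s")
```

On larger graphs, the all-pairs computations (shortest paths, betweenness) should dominate this list, which matches the bottleneck pattern described above.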
Here are plots from 4 of the smaller networks (<50k edges).
Here is another way of looking at the same data.
Since all topological features in creat_numpy_files are obtained using networkx, perhaps it would be useful to remove some of them to make the code faster?
Then we'd need to decide which ones to remove.
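One way to make that decision systematically, sketched below under the assumption that each feature can be wrapped as a callable: measure each candidate on a sample graph and keep only those that finish within a per-feature time budget. The helper name `affordable_features` and the budget value are hypothetical, not part of the existing script:

```python
import time
import networkx as nx

def affordable_features(G, candidates, budget_s=1.0):
    """Return the names of features whose computation on G
    finishes within budget_s seconds each."""
    kept = []
    for name, fn in candidates:
        start = time.perf_counter()
        fn(G)
        if time.perf_counter() - start <= budget_s:
            kept.append(name)
    return kept

# Hypothetical sample graph and candidate list.
sample = nx.fast_gnp_random_graph(300, 0.03, seed=0)
candidates = [
    ("average_clustering", nx.average_clustering),
    ("transitivity", nx.transitivity),
    ("betweenness_centrality", lambda g: nx.betweenness_centrality(g)),
]
print(affordable_features(sample, candidates, budget_s=0.5))
```

The budget would need tuning per network size, but it gives a reproducible criterion for which features to drop instead of an ad hoc choice.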