Open leogm99 opened 1 year ago
betweenness_centrality
Using cProfile
to profile betweenness_centrality
, we get roughly the following output:
51195879 function calls (51195875 primitive calls) in 25.198 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.000 0.000 <class 'networkx.utils.decorators.argmap'> compilation 8:1(<module>)
1 0.000 0.000 25.198 25.198 <class 'networkx.utils.decorators.argmap'> compilation 8:1(argmap_betweenness_centrality_5)
1 0.000 0.000 0.000 0.000 <string>:1(<lambda>)
1 0.157 0.157 25.197 25.197 betweenness.py:14(betweenness_centrality)
1514 17.814 0.012 21.810 0.014 betweenness.py:256(_single_source_shortest_path_basic)
1514 3.078 0.002 3.230 0.002 betweenness.py:317(_accumulate_basic)
1 0.000 0.000 0.000 0.000 betweenness.py:359(_rescale)
1 0.000 0.000 25.198 25.198 cProfile.py:106(runcall)
4560216 0.367 0.000 0.367 0.000 coreviews.py:44(__init__)
2280108 0.286 0.000 0.403 0.000 coreviews.py:50(__iter__)
2280108 0.646 0.000 0.818 0.000 coreviews.py:81(__getitem__)
1 0.000 0.000 0.000 0.000 decorators.py:1032(get_name)
2 0.000 0.000 0.000 0.000 decorators.py:1077(<genexpr>)
1 0.000 0.000 0.000 0.000 decorators.py:1083(signature)
8/5 0.000 0.000 0.000 0.000 decorators.py:1188(_flatten)
5 0.000 0.000 0.000 0.000 decorators.py:1218(_indent)
1 0.000 0.000 0.000 0.000 decorators.py:740(_lazy_compile)
1 0.000 0.000 25.198 25.198 decorators.py:815(func)
4 0.000 0.000 0.000 0.000 decorators.py:850(_count)
3 0.000 0.000 0.000 0.000 decorators.py:873(_name)
1 0.000 0.000 0.000 0.000 decorators.py:893(compile)
1 0.000 0.000 0.000 0.000 decorators.py:941(assemble)
6 0.000 0.000 0.000 0.000 enum.py:359(__call__)
6 0.000 0.000 0.000 0.000 enum.py:678(__new__)
1 0.000 0.000 0.000 0.000 functools.py:421(_unwrap_partial)
1 0.000 0.000 0.000 0.000 graph.py:1458(is_directed)
2280108 0.464 0.000 0.659 0.000 graph.py:338(adj)
3030 0.001 0.000 0.001 0.000 graph.py:398(__iter__)
1 0.000 0.000 0.000 0.000 graph.py:430(__len__)
2280108 0.642 0.000 2.119 0.000 graph.py:452(__getitem__)
1 0.000 0.000 0.000 0.000 inspect.py:199(ismethod)
1 0.000 0.000 0.000 0.000 inspect.py:2280(_signature_from_function)
1 0.000 0.000 0.000 0.000 inspect.py:2375(_signature_from_callable)
6 0.000 0.000 0.000 0.000 inspect.py:2637(__init__)
24 0.000 0.000 0.000 0.000 inspect.py:2687(name)
6 0.000 0.000 0.000 0.000 inspect.py:2699(kind)
3 0.000 0.000 0.000 0.000 inspect.py:277(isfunction)
1 0.000 0.000 0.000 0.000 inspect.py:290(_has_code_flag)
1 0.000 0.000 0.000 0.000 inspect.py:2920(__init__)
7 0.000 0.000 0.000 0.000 inspect.py:2969(<genexpr>)
1 0.000 0.000 0.000 0.000 inspect.py:2998(from_callable)
1 0.000 0.000 0.000 0.000 inspect.py:3006(parameters)
1 0.000 0.000 0.000 0.000 inspect.py:301(isgeneratorfunction)
1 0.000 0.000 0.000 0.000 inspect.py:3252(signature)
1 0.000 0.000 0.000 0.000 inspect.py:66(get_annotations)
1 0.000 0.000 0.000 0.000 misc.py:560(create_py_random_state)
3 0.000 0.000 0.000 0.000 re.py:202(sub)
3 0.000 0.000 0.000 0.000 re.py:288(_compile)
1 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x748520}
2 0.000 0.000 0.000 0.000 {built-in method builtins.callable}
1 0.000 0.000 0.000 0.000 {built-in method builtins.compile}
1 0.000 0.000 0.000 0.000 {built-in method builtins.exec}
3 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
4 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
7 0.000 0.000 0.000 0.000 {built-in method builtins.id}
30 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
2283138 0.118 0.000 0.118 0.000 {built-in method builtins.iter}
3/2 0.000 0.000 0.000 0.000 {built-in method builtins.len}
3029 0.139 0.000 0.139 0.000 {built-in method fromkeys}
3 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
2278594 0.092 0.000 0.092 0.000 {method 'append' of 'collections.deque' objects}
28384002 1.207 0.000 1.207 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'enable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
7 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
6 0.000 0.000 0.000 0.000 {method 'isidentifier' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
2280108 0.078 0.000 0.078 0.000 {method 'pop' of 'list' objects}
2280108 0.108 0.000 0.108 0.000 {method 'popleft' of 'collections.deque' objects}
3 0.000 0.000 0.000 0.000 {method 'sub' of 're.Pattern' objects}
2 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'values' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'values' of 'mappingproxy' objects}
Most of the time is spent calculating shortest paths (_single_source_shortest_path_basic
). The method is called 1514 times (one time for each node). As this does not modify the internal structures in any way, it can be trivially parallelized to optimize performance in larger graphs, which is what rustworkx
does (although it should be noted that the overhead of spawning a process/thread should be taken into account).
betweenness_centrality
.With rustworkx
, things are a bit different:
9 function calls in 0.313 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.313 0.313 __init__.py:1593(_graph_betweenness_centrality)
1 0.000 0.000 0.313 0.313 cProfile.py:106(runcall)
1 0.000 0.000 0.000 0.000 functools.py:818(dispatch)
1 0.000 0.000 0.313 0.313 functools.py:884(wrapper)
1 0.000 0.000 0.000 0.000 weakref.py:415(__getitem__)
1 0.000 0.000 0.000 0.000 weakref.py:428(__setitem__)
1 0.313 0.313 0.313 0.313 {graph_betweenness_centrality}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'enable' of '_lsprof.Profiler' objects}
The time taken is much shorter, but, cProfile
can only profile python bindings, so extensions in C/C++/Rust are not getting profiled, which is the case here (graph_betweenness_centrality
is an extension method written in pure rust.
Timer unit: 1e-06 s
Total time: 39.2648 s
File: /home/leonardo/Escritorio/fiuba/TPF/TPF/networkx/networkx/algorithms/centrality/betweenness.py
Function: __betweenness_centrality at line 126
Line # Hits Time Per Hit % Time Line Contents
==============================================================
126 @profile
127 def __betweenness_centrality(G, k=None, normalized=True, weight=None, endpoints=False, seed=None):
128 1 68.0 68.0 0.0 betweenness = dict.fromkeys(G, 0.0) # b[v]=0 for v in G
129 1 0.4 0.4 0.0 if k is None:
130 1 0.7 0.7 0.0 nodes = G
131 else:
132 nodes = seed.sample(list(G.nodes()), k)
133 1515 343.1 0.2 0.0 for s in nodes:
134 # single source shortest paths
135 1514 190.5 0.1 0.0 if weight is None: # use BFS
136 1514 33463809.1 22102.9 85.2 S, P, sigma, _ = _single_source_shortest_path_basic(G, s)
137 else: # use Dijkstra's algorithm
138 S, P, sigma, _ = _single_source_dijkstra_path_basic(G, s, weight)
139 # accumulation
140 1514 301.1 0.2 0.0 if endpoints:
141 betweenness, _ = _accumulate_endpoints(betweenness, S, P, sigma, s)
142 else:
143 1514 5799926.1 3830.9 14.8 betweenness, _ = _accumulate_basic(betweenness, S, P, sigma, s)
144 # rescaling
145 2 196.3 98.2 0.0 betweenness = _rescale(
146 1 0.1 0.1 0.0 betweenness,
147 1 4.4 4.4 0.0 len(G),
148 1 0.1 0.1 0.0 normalized=normalized,
149 1 2.0 2.0 0.0 directed=G.is_directed(),
150 1 0.1 0.1 0.0 k=k,
151 1 0.1 0.1 0.0 endpoints=endpoints,
152 )
Maybe some time could be saved by moving, for example, the conditional checks on endpoints
and weight
, as those do not change in the scope of this function, though the benefit is not big, at least in a small network (it should be taken into account that the conditional check is tested for each node)
Total time: 70.9561 s
File: /home/leonardo/Escritorio/fiuba/TPF/TPF/networkx/networkx/algorithms/centrality/betweenness.py
Function: _single_source_shortest_path_basic at line 252
Line # Hits Time Per Hit % Time Line Contents
==============================================================
252 @profile
253 def _single_source_shortest_path_basic(G, s):
254 1514 469.2 0.3 0.0 S = []
255 1514 395.9 0.3 0.0 P = {}
256 2293710 215275.5 0.1 0.3 for v in G:
257 2292196 798710.8 0.3 1.1 P[v] = []
258 1514 64391.6 42.5 0.1 sigma = dict.fromkeys(G, 0.0) # sigma[v]=0 for v in G
259 1514 289.7 0.2 0.0 D = {}
260 1514 367.6 0.2 0.0 sigma[s] = 1.0
261 1514 299.9 0.2 0.0 D[s] = 0
262 1514 2036.9 1.3 0.0 Q = deque([s])
263 2281622 249359.9 0.1 0.4 while Q: # use BFS to find shortest paths
264 2280108 388632.3 0.2 0.5 v = Q.popleft()
265 2280108 400446.4 0.2 0.6 S.append(v)
266 2280108 293466.5 0.1 0.4 Dv = D[v]
267 2280108 374580.5 0.2 0.5 sigmav = sigma[v]
268 148786356 17826830.3 0.1 25.1 for w in G[v]:
269 146506248 16833100.2 0.1 23.7 if w not in D:
270 2278594 354157.0 0.2 0.5 Q.append(w)
271 2278594 355115.3 0.2 0.5 D[w] = Dv + 1
272 146506248 22132219.3 0.2 31.2 if D[w] == Dv + 1: # this is a shortest path, count paths
273 26103876 4981974.7 0.2 7.0 sigma[w] += sigmav
274 26103876 5683436.5 0.2 8.0 P[w].append(v) # predecessors
275 1514 554.2 0.4 0.0 return S, P, sigma, D
CPU: AMD Ryzen 5 5600G RAM: 2x8gb 3200Mhz
4:30-5 hours
Using
perftest
,lineprof
or else, track the performance of some algorithms in NetworkX. Then compare results using the same algorithms in RustworkX.Maybe the best course of action would be to use the same performance tools that the NetworkX community uses.