This change addresses two problems with RandomState.permutation:
(1) it is slow due to unnecessary allocation, copying, and conversions
(2) it munges mixed-type iterables, turning iterables of tuples
into nested arrays and converting all values to a common type
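The second problem can be reproduced in a few lines. The sketch below assumes `RandomState` is NumPy's `numpy.random.RandomState`. `permute_items` is a hypothetical helper that shows one way to avoid the coercion by permuting positions instead of values; it is not necessarily the approach taken in this change.

```python
import numpy as np


def permute_items(items, random_state):
    """Return the items in random order without converting them to an array.

    Hypothetical helper for illustration only.
    """
    items = list(items)
    # Permute integer positions only; the items never pass through NumPy.
    order = random_state.permutation(len(items))
    return [items[i] for i in order]


pairs = [(1, 2), (3, 4), (5, 6)]

# Passing the list directly builds a 2-D array, so each tuple comes back
# as an array row rather than as a tuple.
munged = np.random.RandomState(0).permutation(pairs)
print(munged.shape)  # (3, 2)

# Mixed int/float input is coerced to a single float dtype.
print(np.random.RandomState(0).permutation([1, 2.5, 3]).dtype)  # float64

# The index-based version hands back the original tuple objects.
print(permute_items(pairs, np.random.RandomState(0)))
```

Permuting indices also avoids building an array from the node list, which is related to the allocation and conversion cost described in (1).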
Performance
I tested this fix with the web-Google network from SNAP. This network has 875k nodes and 5.1M edges. Here is the test code:
    import networkx
    import community
    import cProfile
    import pickle


    def test_community(graph, random_seed):
        # Profile only the partitioning step.
        profile = cProfile.Profile()
        profile.enable()
        best_part = community.best_partition(graph, random_state=random_seed)
        profile.disable()
        profile.dump_stats("profile_new2.log")
        # Save the partition so it can be compared with the result from master.
        with open('new2.pickle', 'wb') as f:
            pickle.dump(best_part, f)


    if __name__ == '__main__':
        g = networkx.read_edgelist("web-Google.txt", nodetype=int)
        print("nodes %d" % len(g))
        test_community(g, 123)
Here are the results for Python 2 and Python 3 as tested on Ubuntu 18.04.2 LTS.
| Version   | Before | After |
|-----------|--------|-------|
| 3.6.7     | 2579s  | 732s  |
| 2.7.15rc1 | 1735s  | 615s  |
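To see where the time goes in a given run, the stats file written by `profile.dump_stats` can be inspected with the standard-library `pstats` module. This is a minimal sketch; `top_functions` is a hypothetical helper name, and the path passed to it would be whatever file the test script wrote (for example `profile_new2.log`).

```python
import pstats


def top_functions(stats_path, limit=10):
    """Print the `limit` most expensive entries, sorted by cumulative time."""
    stats = pstats.Stats(stats_path)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return stats
```

Calling `top_functions("profile_new2.log")` after running the test script prints the heaviest call paths, which makes it easy to check how much of the run is spent inside the permutation call.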
Correctness
In addition to the unit tests, the following script was used to ensure that results were identical to current master (the pickle files are generated by the above test script):
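A check of this kind can be sketched as follows. This is an illustrative sketch rather than the script used for this change: it assumes both runs saved their partition with `pickle.dump` as in the test script above, and that the pickled object is a dict mapping each node to its community. The function name `partitions_identical` is hypothetical.

```python
import pickle


def partitions_identical(path_a, path_b):
    """Return True if the two pickled partitions are equal."""
    with open(path_a, 'rb') as f:
        part_a = pickle.load(f)
    with open(path_b, 'rb') as f:
        part_b = pickle.load(f)
    # Dict equality compares every node -> community assignment.
    return part_a == part_b
```

For example, `partitions_identical('master.pickle', 'new2.pickle')` would compare a run on master against a run with this change, where `master.pickle` is a hypothetical file name for the output of the run on master.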
resolves #35