Replace RandomState.permutation with RandomState.shuffle

resolves #35

This change addresses 2 problems with RandomState.permutation: (1) it's slow due to unncessary allocation, copy and conversions (2) it munges mixed-type iterables, turning iterables of Tuples to nested arrays and converting all values to a common type

Performance

I tested this fix with the web-Google network from snap. This network has 875k nodes and 5.1M edges. Here is the test code:

import networkx
import community
import cProfile
import pickle

def test_community(graph, random_seed):
    profile = cProfile.Profile()
    profile.enable()

    best_part = community.best_partition(graph, random_state=random_seed)

    profile.disable()
    profile.dump_stats("profile_new2.log")

    with open('new2.pickle', 'wb') as f:
        pickle.dump(best_part, f)

if __name__ == '__main__':
    g = networkx.read_edgelist("web-Google.txt", nodetype=int)

    print("nodes %d" % len(g))

    test_community(g, 123)

Here are the results for Python 2 and Python 3 as tested on Ubuntu 18.04.2 LTS.

Version	Before	After
3.6.7	2579s	732s
2.7.15rc1	1735s	615s

Correctness

In addition to the unit tests, the following script was used to ensure that results were identical to current master (the pickle files are generated by the above test script):

import pickle
import sys

if __name__ == '__main__':
    old_pickle_file, new_pickle_file = sys.argv[1:]
    old_result = pickle.load(open(old_pickle_file, "rb"))
    new_result = pickle.load(open(new_pickle_file, "rb"))
    assert old_result == new_result

taynaud / python-louvain

Replace RandomState.permutation with RandomState.shuffle #46

Performance

Correctness