taynaud / python-louvain

Louvain Community Detection
BSD 3-Clause "New" or "Revised" License
965 stars 200 forks source link

Replace RandomState.permutation with RandomState.shuffle #46

Closed pbourke closed 4 years ago

pbourke commented 5 years ago

resolves #35

This change addresses 2 problems with RandomState.permutation: (1) it's slow due to unncessary allocation, copy and conversions (2) it munges mixed-type iterables, turning iterables of Tuples to nested arrays and converting all values to a common type

Performance

I tested this fix with the web-Google network from snap. This network has 875k nodes and 5.1M edges. Here is the test code:

import networkx
import community
import cProfile
import pickle

def test_community(graph, random_seed):
    profile = cProfile.Profile()
    profile.enable()

    best_part = community.best_partition(graph, random_state=random_seed)

    profile.disable()
    profile.dump_stats("profile_new2.log")

    with open('new2.pickle', 'wb') as f:
        pickle.dump(best_part, f)

if __name__ == '__main__':
    g = networkx.read_edgelist("web-Google.txt", nodetype=int)

    print("nodes %d" % len(g))

    test_community(g, 123)

Here are the results for Python 2 and Python 3 as tested on Ubuntu 18.04.2 LTS.

Version Before After
3.6.7 2579s 732s
2.7.15rc1 1735s 615s

Correctness

In addition to the unit tests, the following script was used to ensure that results were identical to current master (the pickle files are generated by the above test script):

import pickle
import sys

if __name__ == '__main__':
    old_pickle_file, new_pickle_file = sys.argv[1:]
    old_result = pickle.load(open(old_pickle_file, "rb"))
    new_result = pickle.load(open(new_pickle_file, "rb"))
    assert old_result == new_result
taynaud commented 4 years ago

Thank you very much for your contribution