taynaud / python-louvain

Louvain Community Detection
BSD 3-Clause "New" or "Revised" License

MemoryError on large graphs #60

Closed by GillesVandewiele 4 years ago

GillesVandewiele commented 4 years ago

We have been getting MemoryErrors when running community detection on large graphs (a few hundred thousand nodes). The problem appears to be in the np.random.permutation call. We worked around it with a dirty patch that replaces np.random.permutation with a function built on itertools.permutations:

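import itertools

import community
import numpy as np

def check_random_state(seed):
    # Ignore the seed and hand back the global np.random module.
    return np.random

# Make python-louvain use the stub above instead of its own seed handling.
community.community_louvain.check_random_state = check_random_state
# Take only the first tuple from itertools.permutations. That tuple is
# always the input order, so nothing is actually shuffled.
np.random.permutation = lambda x: next(itertools.permutations(x))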

This is of course far from a clean solution, as the random_state is no longer used.
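For comparison, here is a minimal sketch of a workaround that still honours random_state. It assumes the memory cost comes from NumPy converting the node labels into an array, which this thread does not confirm. The helper name shuffled_nodes is hypothetical and is not part of python-louvain; it permutes integer indices only and leaves the labels as ordinary Python objects.

```python
import numpy as np


def shuffled_nodes(nodes, random_state=None):
    """Return the nodes in random order without building a NumPy array of labels.

    Hypothetical helper for illustration; not part of python-louvain.
    """
    # Accept None, an int seed, or an existing RandomState instance.
    if random_state is None or isinstance(random_state, int):
        random_state = np.random.RandomState(random_state)

    node_list = list(nodes)
    # permutation(n) with an integer argument shuffles range(n), so only
    # integers are stored in the NumPy array, never the node labels.
    order = random_state.permutation(len(node_list))
    return [node_list[i] for i in order]


if __name__ == "__main__":
    labels = [("user", i) for i in range(10)]
    print(shuffled_nodes(labels, random_state=0))
```

With a fixed integer seed the order is reproducible, and with random_state=None it varies between calls.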

taynaud commented 4 years ago

Hello,

I hope your issue is solved with https://github.com/taynaud/python-louvain/pull/46

I will make a release soon to include this patch.

Keep in mind that Python, networkx and this package are not very memory efficient, and you may need an alternative for very large graphs.

GillesVandewiele commented 4 years ago

It definitely looks like it. Moreover, my patch always returns the same permutation (the first element of the generator, which is deterministic). The proposed fix returns a different permutation on each call, which is probably better.
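The deterministic behaviour of the patch can be checked with a short snippet. It uses only the standard library, and the node labels are made up for illustration.

```python
import itertools

# Made-up node labels for illustration.
nodes = ["a", "b", "c", "d"]

# itertools.permutations yields tuples in lexicographic order of the input
# positions, so the first tuple is the input order itself.
first = next(itertools.permutations(nodes))
again = next(itertools.permutations(nodes))

print(first == tuple(nodes))  # True
print(first == again)         # True
```

So the patched np.random.permutation effectively visits the nodes in their original order on every call.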