taynaud / python-louvain

Louvain Community Detection
BSD 3-Clause "New" or "Revised" License

Different partitions on Python 2.7 vs Python 3 and one bug #41

Closed skatsikeas closed 4 years ago

skatsikeas commented 5 years ago

I am running the same code in both Python 2.7 and Python 3.7 and although I get the same communities, the characteristics of the communities differ. For example, the size (in nodes) of each community is smaller on Python 3.7 compared to 2.7. Is there an explanation behind this? Please note that I am using v0.11 of python-louvain.

I also believe that I have found a bug in v0.13: since the randomize argument of best_partition() is a boolean, why is its default value None? https://github.com/taynaud/python-louvain/blob/381b7db8196f43de98d5279746173b50fbb2bea9/community/community_louvain.py#L161-L166

I do not use v0.13 because I get different communities on each run, and I believe the above bug is to blame.

Any feedback on those two issues would be really helpful for me.

dalwar23 commented 5 years ago

I haven't checked this on Python 3.7 yet, but off the top of my head I can "try" to answer one of your questions: why does the randomize argument of best_partition(), which is a boolean, default to None?

This is a more Pythonic way to write a function with flexible choices. Even when the argument is a boolean, it has at least two choices: yes or no, 0 or 1. Defaulting it to either of those two values would hamper the flexibility of the function itself.

There could be situations where this argument needs to stay at None because the choice doesn't affect what the user is trying to achieve, and there could be cases where the user needs to set it to True or False.

So basically, this argument is set to None to give the user more flexibility and make the code more efficient.

If you have any other ideas why this was done this way please feel free to share! Thanks!
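One common reason for a None default on a boolean argument is to let the function tell "not supplied" apart from an explicit True or False, for example while a parameter is being phased out. The following is a minimal sketch of that pattern, not python-louvain's actual code: `resolve_random_state` is a hypothetical helper, and the choice of seed 0 for `randomize=False` is an assumption made for illustration.

```python
import warnings


def resolve_random_state(randomize=None, random_state=None):
    """Hypothetical helper: fold a deprecated tri-state flag into a seed.

    randomize=None  -> the caller did not pass the flag; leave random_state alone
    randomize=False -> the caller explicitly asked for no shuffling; use a fixed seed
    randomize=True  -> the caller explicitly asked for shuffling; no fixed seed
    """
    if randomize is not None:
        # Only warn when the deprecated flag was actually supplied.
        warnings.warn(
            "`randomize` is deprecated; use `random_state` instead.",
            DeprecationWarning,
        )
        if randomize is False:
            random_state = 0  # assumed fixed seed, chosen for this sketch

    # Asking for shuffling and for a fixed seed at once is contradictory.
    if randomize and random_state is not None:
        raise ValueError("`randomize` and `random_state` cannot be combined")

    return random_state
```

With a plain `randomize=False` default, the function could not distinguish a caller who never mentioned the flag from one who deliberately disabled shuffling, so it could not warn only the latter.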

skatsikeas commented 5 years ago

Thank you @dharif23 for your explanation. I would also like to note that if I use randomize=False (not None!) and random_state=None on version 0.13 I do get the same results every time. And if I compare the results I get from v0.11 with the results from v0.13, they differ.

taynaud commented 4 years ago

randomize is deprecated and should not be used. If you do not want randomization in the latest versions, use random_state=0 (or any constant number).

The Louvain method is not deterministic, so it is normal to get different partitions, but the modularity values should be very close.