taynaud / python-louvain

Louvain Community Detection
BSD 3-Clause "New" or "Revised" License
964 stars 200 forks source link

Can't set random state #29

Closed pavlin-policar closed 6 years ago

pavlin-policar commented 6 years ago

Currently, there is no way to set and use a fixed random seed. We have only two options: either completely random or completely deterministic. Looking at the code, we can probably set a random seed by setting the global random seed for python's random module, but that's not very clean and could affect other parts of our project code.

It would be great if we could set a random_state parameter, much like scikit-learn allows us to do for randomized algorithms.

One concern is backward compatibility. If we add a random_state parameter, then randomize no longer makes sense, and should be removed. One possibility would be to keep randomize for a while and adding a deprication warning, having it be equivalent to random_state=None, and then eventually removing it altogether.

Is this something you would be willing to include in the library? If so, I would be willing to implement this.

taynaud commented 6 years ago

Hello, Yes it would be great, patch are welcomed.

It would be great to raise some error if randomize and random_state are both set !

Logic would be something like: if random = True and random_state = None -> same behavior than current random=True if random = False and random_state = None -> same behavior than current random=False if random = True and random_state != None -> error value if random = False and random_state != None -> set your state like scikit-learn

?

pavlin-policar commented 6 years ago

Yes, that sounds good. I'll open a PR within the next couple of days.

pavlin-policar commented 6 years ago

Actually, I have one thought on this. If I implement the logic as you stated it above, we'd have to keep the functionality of randomize=True. This simply iterates over the nodes in order. I guess this was included to enable deterministic results.

But now, we'll be able to set a random seed and this will also enable deterministic results. Is there any reason at all to keep the current behaviour? Using the new random_state alone can't replicate the current behaviour. In scikit-learn random_state=None simply means that something random will happen (and the point of this is to introduce consistency). I thought of using a reserved value (e.g. -1) to indicate no shuffling, but this is not something typically done and adds complexity.

So if we want to keep the current behaviour and not add reserved parameter values, we'd have to keep both the randomize and random_state parameters. IMO this is not a good idea, since this could lead to confusion. Unless there is a very good reason to keep the current deterministic behaviour (so no shuffling), then it would be better to remove it altogether. That way we can still have determinism by setting a random seed, we have the familiar random_state parameter present in popular libraries, the code is simpler, and we avoid any confusion from having two parameters.

What do you think?