Closed pavlin-policar closed 6 years ago
Hello, Yes it would be great, patch are welcomed.
It would be great to raise some error if randomize and random_state are both set !
Logic would be something like: if random = True and random_state = None -> same behavior than current random=True if random = False and random_state = None -> same behavior than current random=False if random = True and random_state != None -> error value if random = False and random_state != None -> set your state like scikit-learn
?
Yes, that sounds good. I'll open a PR within the next couple of days.
Actually, I have one thought on this. If I implement the logic as you stated it above, we'd have to keep the functionality of randomize=True
. This simply iterates over the nodes in order. I guess this was included to enable deterministic results.
But now, we'll be able to set a random seed and this will also enable deterministic results. Is there any reason at all to keep the current behaviour? Using the new random_state
alone can't replicate the current behaviour. In scikit-learn random_state=None
simply means that something random will happen (and the point of this is to introduce consistency). I thought of using a reserved value (e.g. -1) to indicate no shuffling, but this is not something typically done and adds complexity.
So if we want to keep the current behaviour and not add reserved parameter values, we'd have to keep both the randomize
and random_state
parameters. IMO this is not a good idea, since this could lead to confusion. Unless there is a very good reason to keep the current deterministic behaviour (so no shuffling), then it would be better to remove it altogether. That way we can still have determinism by setting a random seed, we have the familiar random_state
parameter present in popular libraries, the code is simpler, and we avoid any confusion from having two parameters.
What do you think?
Currently, there is no way to set and use a fixed random seed. We have only two options: either completely random or completely deterministic. Looking at the code, we can probably set a random seed by setting the global random seed for python's
random
module, but that's not very clean and could affect other parts of our project code.It would be great if we could set a
random_state
parameter, much like scikit-learn allows us to do for randomized algorithms.One concern is backward compatibility. If we add a
random_state
parameter, thenrandomize
no longer makes sense, and should be removed. One possibility would be to keeprandomize
for a while and adding a deprication warning, having it be equivalent torandom_state=None
, and then eventually removing it altogether.Is this something you would be willing to include in the library? If so, I would be willing to implement this.