Open Javran opened 3 years ago
The critique doesn't apply. (emphasis mine)
Let us consider the simplest example (which happens to be the worst case): a sequence of two elements, [a b]. According to the shuffle-by-random-keys algorithm, we pick two binary random numbers,
The Test.QuickCheck.shuffle
however picks the arbitrary full-range Int
s†. Int
range (264) is huge in comparison to the length of lists you will shuffle
. Generation of the same tags is unluckily (the list have to be quite long for birthday paradox to be noticeable). The shuffling is still not perfect, but I doubt you can observe the bias.
† we can be faster by directly choosing Word64
.
That's fair given the huge range. Another argument that I forgot to mention is that an imperative swapping takes O(n) time while sorting takes O(n log n), but again this isn't very significant unless the list is very large.
I honestly don't feel strongly that this is something that needs fixing - just openning an issue and see what are your thoughts on this.
A shuffle generator is introduced by #44, however sort-based shuffling has some drawbacks as described by http://okmij.org/ftp/Haskell/perfect-shuffle.txt (please text search for section "A sort-based algorithm and its critique").
I propose that either (1) we replace
shuffle
with a fair shuffle (for simplicity of the implemenation, "An imperative implementation: swapping" in the same article above with inside ST should be good enough), or (2) introduce another functiontrueShuffle
for pedantic minds like myself.I want to see which one would you prefer and also volunteer myself to work this :)