[Feature request] Fair shuffling

nick8325 / quickcheck

Automatic testing of Haskell programs.

Other

705 stars 119 forks source link

[Feature request] Fair shuffling #317

Open Javran opened 3 years ago

Javran commented 3 years ago

A shuffle generator is introduced by #44, however sort-based shuffling has some drawbacks as described by http://okmij.org/ftp/Haskell/perfect-shuffle.txt (please text search for section "A sort-based algorithm and its critique").

I propose that either (1) we replace shuffle with a fair shuffle (for simplicity of the implemenation, "An imperative implementation: swapping" in the same article above with inside ST should be good enough), or (2) introduce another function trueShuffle for pedantic minds like myself.

I want to see which one would you prefer and also volunteer myself to work this :)

phadej commented 3 years ago

The critique doesn't apply. (emphasis mine)

Let us consider the simplest example (which happens to be the worst case): a sequence of two elements, [a b]. According to the shuffle-by-random-keys algorithm, we pick two binary random numbers,

The Test.QuickCheck.shuffle however picks the arbitrary full-range Ints^†. Int range (2⁶⁴) is huge in comparison to the length of lists you will shuffle. Generation of the same tags is unluckily (the list have to be quite long for birthday paradox to be noticeable). The shuffling is still not perfect, but I doubt you can observe the bias.

^† we can be faster by directly choosing Word64.

Javran commented 3 years ago

That's fair given the huge range. Another argument that I forgot to mention is that an imperative swapping takes O(n) time while sorting takes O(n log n), but again this isn't very significant unless the list is very large.

I honestly don't feel strongly that this is something that needs fixing - just openning an issue and see what are your thoughts on this.