Closed: TheIronBorn closed this issue 4 months ago.
Our current implementation is based on Floyd's algorithm (hard to find a source; see here). It should be roughly comparable in performance, and it fully shuffles the result.
Also... how does something that simple count as a scientific publication?
Note: if you care about the performance difference and don't need a shuffled result, try a copy of the current `sample_array` without the shuffling step (i.e. do `indices[i] = t; continue;` inside the `if`), and benchmark that against the current version and the paper's algorithms.
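For concreteness, here is a minimal sketch of what that suggestion refers to. This is not the crate's actual `sample_array` source, just Floyd's algorithm written with an order-randomizing swap; `rand_below(n)` is a hypothetical stand-in for the RNG, assumed to return a uniform value in `0..n`:

```rust
/// Illustrative only: Floyd's algorithm plus a swap that makes the output
/// order uniform as well as the chosen set.
fn floyd_sample<const N: usize>(
    len: usize,
    mut rand_below: impl FnMut(usize) -> usize,
) -> [usize; N] {
    debug_assert!(N <= len);
    let mut indices = [0usize; N];
    for (i, j) in (len - N..len).enumerate() {
        let t = rand_below(j + 1); // uniform in 0..=j
        if let Some(pos) = indices[..i].iter().position(|&x| x == t) {
            // `t` was already picked: take `j` instead, swapped into `t`'s
            // old slot so that the output order stays uniformly shuffled.
            indices[i] = indices[pos];
            indices[pos] = j;
        } else {
            // Fresh pick: just record it.
            indices[i] = t;
        }
    }
    indices
}
```

The "no shuffle" variant suggested above is plain Floyd: keep the fresh-pick assignment, but in the duplicate branch simply write `indices[i] = j` instead of doing the swap.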
Summary
The recent paper "Algorithms for Generating Small Random Samples" describes two small, simple algorithms for generating unique pairs and triples without replacement. With the new const-generic `sample_array` this would be a zero-cost optimization. Note that this will likely be a value-breaking change, so it should land before `sample_array` is released.
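For reference, here is a sketch of the kind of direct pair/triple draw being described. It follows the standard "skip over earlier picks" construction, which I believe matches the paper's approach, though the paper should be checked for Cicirello's exact formulation; `rand_below(n)` is again a hypothetical stand-in returning a uniform value in `0..n`:

```rust
/// Draw two distinct indices from `0..len` with a constant number of RNG
/// calls: the second draw uses a range one smaller and skips over the first.
fn sample_pair(len: usize, mut rand_below: impl FnMut(usize) -> usize) -> [usize; 2] {
    debug_assert!(len >= 2);
    let a = rand_below(len);
    let mut b = rand_below(len - 1);
    if b >= a {
        b += 1; // skip `a`, leaving `b` uniform over the remaining indices
    }
    [a, b]
}

/// Same idea for three distinct indices: the third draw skips over both
/// earlier picks, smaller one first.
fn sample_triple(len: usize, mut rand_below: impl FnMut(usize) -> usize) -> [usize; 3] {
    debug_assert!(len >= 3);
    let [a, b] = sample_pair(len, &mut rand_below);
    let (lo, hi) = if a < b { (a, b) } else { (b, a) };
    let mut c = rand_below(len - 2);
    if c >= lo {
        c += 1;
    }
    if c >= hi {
        c += 1;
    }
    [a, b, c]
}
```

Both make every ordered pair/triple of distinct indices equally likely, i.e. the same distribution as the shuffled Floyd result, using exactly two or three range draws.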
Details
I propose we add checks to `sample_array` for `N == 2` and `N == 3` and run the faster algorithms in those cases.
Motivation
Generating unique pairs and triples is common, and it often sits in hot paths of many algorithms. I have independently developed the pairs algorithm myself to optimize a hot loop. The author, Cicirello, mentions evolutionary algorithms as a motivation. Pivot selection rules in sorting algorithms (such as median-of-three) are another common use, and some random distribution algorithms require unique pairs. Given the simplicity of this addition, it seems worthwhile.