Closed: TheIronBorn closed this issue 4 months ago.
Our current implementation is based on Floyd's algorithm (hard to find a source; see here). It should be roughly comparable in performance, and it fully shuffles the result.
Also... how does something that simple count as a scientific publication?
Note: if you care about the performance difference and don't need a shuffled result, try a copy of the current `sample_array` without the shuffling step (i.e. do `indices[i] = t; continue;` inside the `if`), and benchmark that against the current version and the paper's algorithms.
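For concreteness, here is a minimal sketch of what that suggestion refers to. This is not the crate's actual `sample_array` source, just Floyd's algorithm written with an order-randomizing swap; `rand_below(n)` is a hypothetical stand-in for the RNG, assumed to return a uniform value in `0..n`:

```rust
/// Illustrative only: Floyd's algorithm plus a swap that makes the output
/// order uniform as well as the chosen set.
fn floyd_sample<const N: usize>(
    len: usize,
    mut rand_below: impl FnMut(usize) -> usize,
) -> [usize; N] {
    debug_assert!(N <= len);
    let mut indices = [0usize; N];
    for (i, j) in (len - N..len).enumerate() {
        let t = rand_below(j + 1); // uniform in 0..=j
        if let Some(pos) = indices[..i].iter().position(|&x| x == t) {
            // `t` was already picked: take `j` instead, swapped into `t`'s
            // old slot so that the output order stays uniformly shuffled.
            indices[i] = indices[pos];
            indices[pos] = j;
        } else {
            // Fresh pick: just record it.
            indices[i] = t;
        }
    }
    indices
}
```

The "no shuffle" variant suggested above is plain Floyd: keep the fresh-pick assignment, but in the duplicate branch simply write `indices[i] = j` instead of doing the swap.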
Summary
The recent paper "Algorithms for Generating Small Random Samples" describes two small, simple algorithms for generating unique pairs and triples without replacement. With the new const-generic `sample_array` this would be a zero-cost optimization. Note that this will likely be a value-breaking change, so it should land before `sample_array` is released.
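For reference, here is a sketch of the kind of direct pair/triple draw being described. It follows the standard "skip over earlier picks" construction, which I believe matches the paper's approach, though the paper should be checked for Cicirello's exact formulation; `rand_below(n)` is again a hypothetical stand-in returning a uniform value in `0..n`:

```rust
/// Draw two distinct indices from `0..len` with a constant number of RNG
/// calls: the second draw uses a range one smaller and skips over the first.
fn sample_pair(len: usize, mut rand_below: impl FnMut(usize) -> usize) -> [usize; 2] {
    debug_assert!(len >= 2);
    let a = rand_below(len);
    let mut b = rand_below(len - 1);
    if b >= a {
        b += 1; // skip `a`, leaving `b` uniform over the remaining indices
    }
    [a, b]
}

/// Same idea for three distinct indices: the third draw skips over both
/// earlier picks, smaller one first.
fn sample_triple(len: usize, mut rand_below: impl FnMut(usize) -> usize) -> [usize; 3] {
    debug_assert!(len >= 3);
    let [a, b] = sample_pair(len, &mut rand_below);
    let (lo, hi) = if a < b { (a, b) } else { (b, a) };
    let mut c = rand_below(len - 2);
    if c >= lo {
        c += 1;
    }
    if c >= hi {
        c += 1;
    }
    [a, b, c]
}
```

Both make every ordered pair/triple of distinct indices equally likely, i.e. the same distribution as the shuffled Floyd result, using exactly two or three range draws.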
Details
I propose we add checks to `sample_array` for `N == 2` and `N == 3` and run the faster algorithms in those cases.
Motivation
Generating unique pairs and triples is common, and it often sits in hot paths of many algorithms. I have independently developed the pairs algorithm myself to optimize a hot loop. The author, Cicirello, mentions evolutionary algorithms as a motivation. Pivot selection rules in sorting algorithms (such as median-of-three) are another common use, and some random distribution algorithms require unique pairs. Given the simplicity of this addition, it seems worthwhile.