Distribution sampling for Zipf and Binomial is too slow

I've written a small benchmark (how meta) to highlight the problem.

Here are the results:

##########################################################################################
# Distributions inverse CDF benchmark
##########################################################################################

Cardinality    Uniform        Binomial       Normal         Zipf           Pareto         
------------------------------------------------------------------------------------------
1000           2009.893ms     49049.008ms    1557.039ms     830303.17ms    14294.921ms    
10000          1293.16ms      46005.583ms    1168.293ms     833500.448ms   14307.119ms    
100000         1299.423ms     46028.881ms    1156.179ms     824108.238ms   14183.577ms    
1000000        1276.722ms     45786.145ms    1127.982ms     831676.369ms   14477.197ms    
10000000       1264.654ms     46654.514ms    1142.256ms     835947.764ms   14302.145ms    
100000000      1320.517ms     47075.169ms    1165.2ms       835370.14ms    14216.341ms

Uniform and Normal are the fastest, with roughly the same timings. Pareto is about 10x slower, followed by Binomial (~40x slower) and Zipf (~650x slower).

I therefore propose to stick with the continuous distributions and suitably discretize their inverse CDF values in order to approximate the discrete counterparts (Normal for Binomial, Pareto for Zipf). The relationship between these pairs of distributions is explained in [1,2].
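To make the proposal concrete, here is a rough sketch of what the discretization could look like. It is written in Python purely for illustration and is not taken from the Peel code base. The function names, the rounding rules, and the choice of a truncated power law as the continuous stand-in for Zipf are my assumptions. Each function takes a uniform value `u` and evaluates a closed-form continuous inverse CDF, so the cost per sample no longer depends on the cardinality.

```python
import math
from statistics import NormalDist


def binomial_via_normal(u, n, p):
    """Approximate the Binomial(n, p) inverse CDF at u.

    Assumption: uses the Normal approximation with mean n*p and standard
    deviation sqrt(n*p*(1-p)), rounded to the nearest integer and clamped
    to [0, n]. Requires 0 < u < 1 and 0 < p < 1; the approximation is only
    reasonable when n*p and n*(1-p) are not too small.
    """
    mean = n * p
    std = math.sqrt(n * p * (1.0 - p))
    x = NormalDist(mean, std).inv_cdf(u)
    return min(n, max(0, round(x)))


def zipf_via_power_law(u, cardinality, exponent):
    """Approximate a Zipf rank in [1, cardinality] at u in [0, 1).

    Assumption: the continuous stand-in is a power-law (bounded Pareto)
    density proportional to x**(-exponent) on [1, cardinality + 1), whose
    inverse CDF has a closed form. The result is floored to an integer rank.
    Head ranks (1, 2, ...) will deviate noticeably from an exact Zipf.
    """
    upper = cardinality + 1.0
    if math.isclose(exponent, 1.0):
        # Density 1/x: the CDF is ln(x) / ln(upper).
        x = upper ** u
    else:
        k = 1.0 - exponent
        x = (1.0 + u * (upper ** k - 1.0)) ** (1.0 / k)
    # Clamp to guard against floating-point overshoot near u = 1.
    return min(cardinality, max(1, math.floor(x)))


if __name__ == "__main__":
    import random

    rng = random.Random(42)
    # random() may return exactly 0.0, which the Normal inverse CDF rejects.
    us = [max(rng.random(), 1e-12) for _ in range(5)]
    print([binomial_via_normal(u, 100, 0.5) for u in us])
    print([zipf_via_power_law(u, 1000, 2.0) for u in us])
```

Both functions are monotone in `u`, so they can replace an exact inverse CDF lookup wherever one is fed uniform values. How large an approximation error is acceptable for generated data is a separate question that this sketch does not answer.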

[1] Relationship between Binomial and Normal Distributions
[2] Zipf, Power-laws, and Pareto - a ranking tutorial
