Motivation

Some of these are non-trivial distributions we didn't really test before.

To validate the solution to #1476.

Details

Single-element weighted sampling is simple enough.

fn choose_two_iterator is also simple enough: there are no weights, so we can just assign each pair of results a unique index in the list of 100 * 99 / 2 possibilities (noting that we sort pairs since the order of chosen elements is not specified).
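One way to assign each sorted pair a unique index is sketched below. The helper names `pair_index` and `unordered_pair_index` are hypothetical and are not taken from the PR; the test may well use a different (equally valid) numbering.

```rust
/// Map a sorted pair `(a, b)` with `a < b < n` to a unique index in `0..n*(n-1)/2`.
fn pair_index(a: usize, b: usize, n: usize) -> usize {
    assert!(a < b && b < n);
    // Pairs whose smaller element is less than `a` come first; there are
    // (n-1) + (n-2) + ... + (n-a) = a * (2n - a - 1) / 2 of them.
    a * (2 * n - a - 1) / 2 + (b - a - 1)
}

/// Sort the pair before indexing, since the order of chosen elements is unspecified.
fn unordered_pair_index(x: usize, y: usize, n: usize) -> usize {
    let (a, b) = if x < y { (x, y) } else { (y, x) };
    pair_index(a, b, n)
}

fn main() {
    // The last of the 100 * 99 / 2 = 4950 pairs is (98, 99).
    println!("{}", unordered_pair_index(99, 98, 100)); // prints 4949
}
```

Since the indices are contiguous, the uniform CDF over them is simply `(index + 1) / 4950`.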

fn choose_two_weighted_indexed gets a bit more complicated; I choose to approach it by building a table for the CDF of size num*num including impossible variants. Most of the tests don't pass, so there must be a mistake here.
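As an illustration of such a table (not the code in this PR), the sketch below builds a flattened num*num CDF over ordered pairs (i, j). It assumes sampling without replacement where the second element is drawn proportionally to the remaining weights, and that at least two weights are positive. The function name `pair_cdf` is hypothetical.

```rust
/// Build a flattened `num * num` CDF table over ordered pairs `(i, j)`.
/// Impossible entries (`i == j`) add no probability mass, so the CDF is flat there.
fn pair_cdf(weights: &[f64]) -> Vec<f64> {
    let num = weights.len();
    let total: f64 = weights.iter().sum();
    let mut cdf = Vec::with_capacity(num * num);
    let mut acc = 0.0;
    for i in 0..num {
        for j in 0..num {
            if i != j {
                // P(i drawn first) * P(j drawn second | i removed)
                acc += weights[i] / total * weights[j] / (total - weights[i]);
            }
            cdf.push(acc);
        }
    }
    cdf
}

fn main() {
    let cdf = pair_cdf(&[1.0, 0.5, 1.0 / 3.0]);
    println!("{:?}", cdf);
}
```

If the test sorts its sampled pairs, the masses of (i, j) and (j, i) would need to be merged into a single entry before accumulating.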

Aside: using let key = rng.random::<f64>().ln() / weight; (src/seq/index.rs:392) may help with #1476 but does not fix the above.

dhardy commented 3 days ago

I can confirm that choose_multiple_weighted has a significant problem, since sampling two elements from 0, 1, 2 with weights 1, 1/2, 1/3 a million times and sorting yields 532298 counts of (0, 1), 338524 counts of (0, 2) and 129178 counts of (1, 2). (Unlike #1476, this example does not require very small weights.)

This is sampling without replacement, so expected samples are:

(0, 1) or (1, 0): 531818
(0, 2) or (2, 0): 339394
(1, 2) or (2, 1): 128788
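For reference, these figures can be derived as follows, assuming the second element is drawn proportionally to the remaining weights. The normalised weights are p = (6/11, 3/11, 2/11), and each unordered pair sums over both draw orders; multiplying by one million gives the counts above to within rounding.

```latex
\begin{aligned}
P(\{0,1\}) &= \tfrac{6}{11}\cdot\tfrac{3}{5} + \tfrac{3}{11}\cdot\tfrac{6}{8}
            = \tfrac{18}{55} + \tfrac{9}{44} = \tfrac{117}{220} \approx 0.531818 \\
P(\{0,2\}) &= \tfrac{6}{11}\cdot\tfrac{2}{5} + \tfrac{2}{11}\cdot\tfrac{6}{9}
            = \tfrac{12}{55} + \tfrac{4}{33} = \tfrac{56}{165} \approx 0.339394 \\
P(\{1,2\}) &= \tfrac{3}{11}\cdot\tfrac{2}{8} + \tfrac{2}{11}\cdot\tfrac{3}{9}
            = \tfrac{3}{44} + \tfrac{2}{33} = \tfrac{17}{132} \approx 0.128788
\end{aligned}
```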

dhardy commented 3 days ago

I fixed my calculation of the CDF, found a variant which failed like #1476, fixed this by taking the logarithm of keys, and applied some optimisation to the Efraimidis-Spirakis algorithm.
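For readers unfamiliar with the approach, here is a minimal, unoptimised sketch of Efraimidis-Spirakis sampling with logarithmic keys. It is not the code in src/seq/index.rs: `SplitMix64` is a small stand-in generator used only to keep the example dependency-free, and `sample_weighted` and `sorted_pair_frequencies` are hypothetical names. All weights are assumed to be positive.

```rust
/// Small deterministic generator (SplitMix64); a stand-in for a real RNG.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1], so that `ln` never sees zero.
    fn next_f64(&mut self) -> f64 {
        ((self.next_u64() >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }
}

/// Draw key = ln(u) / w for each element and keep the `amount` largest keys.
/// This orders elements the same way as the classic key u^(1/w).
fn sample_weighted(rng: &mut SplitMix64, weights: &[f64], amount: usize) -> Vec<usize> {
    let mut keyed: Vec<(f64, usize)> = weights
        .iter()
        .enumerate()
        .map(|(i, &w)| (rng.next_f64().ln() / w, i))
        .collect();
    // Sort by key, largest first; keys are never NaN for positive weights.
    keyed.sort_by(|a, b| b.0.total_cmp(&a.0));
    keyed.into_iter().take(amount).map(|(_, i)| i).collect()
}

/// Frequencies of the sorted pairs (0, 1), (0, 2), (1, 2) over `trials` draws.
fn sorted_pair_frequencies(weights: &[f64; 3], trials: u32, seed: u64) -> [f64; 3] {
    let mut rng = SplitMix64(seed);
    let mut counts = [0u32; 3];
    for _ in 0..trials {
        let s = sample_weighted(&mut rng, weights, 2);
        let (a, b) = (s[0].min(s[1]), s[0].max(s[1]));
        counts[a + b - 1] += 1; // (0,1) -> 0, (0,2) -> 1, (1,2) -> 2
    }
    counts.map(|c| c as f64 / trials as f64)
}

fn main() {
    let freqs = sorted_pair_frequencies(&[1.0, 0.5, 1.0 / 3.0], 1_000_000, 1);
    println!("{:?}", freqs);
}
```

The likely benefit of the logarithm is numerical: for very small weights, computing u^(1/w) directly underflows to zero for most u, so many keys tie, whereas ln(u) / w stays distinguishable.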

rust-random / rand

Add KS tests for weighted sampling #1530
