Open dhardy opened 3 days ago
I can confirm that choose_multiple_weighted has a significant problem, since sampling two elements from 0, 1, 2 with weights 1, 1/2, 1/3 a million times and sorting yields 532298 counts of (0, 1), 338524 counts of (0, 2), and 129178 counts of (1, 2). (Unlike #1476, this example does not require very small weights.)
This is sampling without replacement, so expected samples are:
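Working out those expectations (my own arithmetic, not figures quoted from the issue): under sequential weighted draws without replacement, the chance of ending with the unordered pair {i, j} is w_i/S · w_j/(S − w_i) + w_j/S · w_i/(S − w_j), where S is the total weight. A minimal sketch:

```rust
// My own arithmetic (not from the issue): expected unordered-pair
// probabilities when drawing two of three items sequentially without
// replacement, with weights 1, 1/2, 1/3.
fn main() {
    let w = [1.0, 0.5, 1.0 / 3.0];
    let s: f64 = w.iter().sum();
    for i in 0..w.len() {
        for j in (i + 1)..w.len() {
            // P(i first, then j) + P(j first, then i)
            let p = w[i] / s * w[j] / (s - w[i]) + w[j] / s * w[i] / (s - w[j]);
            println!("({i}, {j}): {p:.4}");
        }
    }
}
```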
I fixed my calculation of the CDF, found a variant which failed like #1476, fixed this by taking the logarithm of keys, and applied some optimisation to the Efraimidis-Spirakis algorithm.
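For context on the key transformation mentioned here: Efraimidis-Spirakis assigns each item the key u^(1/w) (u uniform in [0, 1)) and keeps the items with the largest keys; taking logarithms gives the equivalent, numerically safer key ln(u)/w. A self-contained sketch of that idea (the xorshift generator and the function name are my stand-ins, not the PR's code):

```rust
// Sketch of Efraimidis-Spirakis weighted sampling without replacement
// using logarithmic keys ln(u)/w. The xorshift64 generator below is a
// deterministic stand-in for a real RNG, not the rand crate's API.
fn sample_weighted(weights: &[f64], amount: usize, seed: u64) -> Vec<usize> {
    let mut state = seed;
    let mut rand01 = move || {
        // xorshift64 step, then take 53 bits to form a float in [0, 1)
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        (state >> 11) as f64 / (1u64 << 53) as f64
    };
    let mut keyed: Vec<(f64, usize)> = weights
        .iter()
        .enumerate()
        .map(|(i, &w)| (rand01().ln() / w, i))
        .collect();
    // ln is monotone, so sorting by ln(u)/w matches sorting by u^(1/w):
    // the `amount` largest keys are the chosen items.
    keyed.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    keyed.into_iter().take(amount).map(|(_, i)| i).collect()
}

fn main() {
    // Choose 2 of 3 items with weights 1, 1/2, 1/3.
    let chosen = sample_weighted(&[1.0, 0.5, 1.0 / 3.0], 2, 0x9E3779B97F4A7C15);
    println!("chosen indices: {chosen:?}");
}
```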
CHANGELOG.md entry

Motivation
Some of these are non-trivial distributions we didn't really test before.
To validate the solution of #1476.
Details
Single-element weighted sampling is simple enough.

fn choose_two_iterator is also simple enough: there are no weights, so we can just assign each pair of results a unique index in the list of 100 * 99 / 2 possibilities (noting that we sort pairs, since the order of chosen elements is not specified).

fn choose_two_weighted_indexed gets a bit more complicated; I chose to approach it by building a table for the CDF of size num*num, including impossible variants. Most of the tests don't pass, so there must be a mistake here.

Aside: using
`let key = rng.random::<f64>().ln() / weight;` (src/seq/index.rs:392) may help with #1476 but does not fix the above.
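Back to the pair indexing that fn choose_two_iterator relies on, it can be sketched like this (my illustration; the helper name is hypothetical): after sorting each pair (a, b) so that a < b, the mapping b*(b-1)/2 + a enumerates all 100 * 99 / 2 unordered pairs exactly once.

```rust
// Hypothetical sketch: map each unordered pair (a, b) of distinct indices
// below n to a unique index in 0..n*(n-1)/2, suitable for tallying pair
// frequencies in a test.
fn pair_index(mut a: usize, mut b: usize) -> usize {
    if a > b {
        std::mem::swap(&mut a, &mut b); // sort the pair; order is unspecified
    }
    b * (b - 1) / 2 + a
}

fn main() {
    let n = 100;
    // Check the mapping is a bijection onto 0..n*(n-1)/2.
    let mut seen = vec![false; n * (n - 1) / 2];
    for b in 1..n {
        for a in 0..b {
            let idx = pair_index(a, b);
            assert!(!seen[idx]);
            seen[idx] = true;
        }
    }
    assert!(seen.iter().all(|&hit| hit));
    println!("all {} pair indices are distinct", n * (n - 1) / 2);
}
```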