rust-random / rand

A Rust library for random number generation.

https://crates.io/crates/rand

Other

1.67k stars 432 forks source link

Zipf: let n have type F #1518

Closed dhardy closed 1 week ago

dhardy commented 4 weeks ago

[x] Added a CHANGELOG.md entry

Summary

Change the parameter type of Zipf's n to F

Motivation

https://github.com/rust-random/rand/issues/1323#issuecomment-2125324653

Details

The CDF test fails:

---- zipf stdout ----
KS statistic: 0.13359213049244018
Critical value: 0.00195
thread 'zipf' panicked at rand_distr/tests/cdf.rs:119:5:
assertion failed: ks_statistic < critical_value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The value stability tests (only 4 samples; bottom of zipf.rs) did not fail.

benjamin-lieser commented 4 weeks ago

You cannot use the continuous test, because the cdf is not continuous. It does work, if you replace test_continuous(seed as u64, dist, |k| cdf(k, n, x)); with test_discrete(seed as u64, dist, |k| cdf(k as f64, n, x));

dhardy commented 4 weeks ago

This should be mostly ready, but do we want to make this change?

Also note: #1517 added a note about casting results to ints being safe as a result of the input bounding the output; the input can now be larger too.

benjamin-lieser commented 4 weeks ago

I am not sure if we should do it. I do not have some quantitative proof for it, but I doubt that for values bigger u64::MAX there is a measurable difference to the Zeta distribution. So if such big values are desired there is already a solution, albeit less elegant because the user has to do the case distinction. I guess most users will use Zipf with smaller integers, but to be honest I never really had a need myself, so this is a very vague guess.

Another point might be the name. If you search for Zipf, you fill find mostly the Zeta distribution, scipy has our Zipf as Zipfian. Maybe this would be a better name.

Edit: I guess there is still a difference to Zeta, because even for 2**64 there is significant mass in the tail of the harmonic series. And also Zeta does not support s=1. So there is definitely a potential usecase for this.

dhardy commented 4 weeks ago

I'll wait for @vks to comment.

benjamin-lieser commented 4 weeks ago

Looks good, but I'm a bit confused why there is a test failure.

Which test failure?

Do you have an opinion on the name? Zipf vs Zipfian