rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand
Other
1.65k stars 431 forks source link

Add truncated distributions to `rand` / `rand_distr` / new `rand`-crate #1189

Open CGMossa opened 3 years ago

CGMossa commented 3 years ago

Background

What is your motivation? I'm writing a simulation and the truncated distributions is a great convenience for sampling parameters.

What type of application is this? (E.g. cryptography, game, numerical simulation) Statistics, numerical simulation, etc.

Feature request

I would like to have truncated distributions as part of (perhaps) rand_distr or it could be a new crate rand_trunc maybe. I'm asking for some guidance here to figure out the best course of action.

vks commented 3 years ago

Could you please be a bit more specific about what you would like to see implemented? Is it just a few distributions or something more complex?

CGMossa commented 3 years ago

Personally, I would like to have a truncnorm implemented. But I think there is a general approach that can be used to implemented, in a way that you define a truncation (a,b) and then apply that to whatever distribution you want to be truncated.

dhardy commented 3 years ago

If it's just that we could add a support function:

pub trait Distribution<T> {
    // ...

    fn sample_truncated<R: Rng + ?Sized>(&self, rng: &mut R, min: T, max: T) -> T where T: PartialOrd {
        loop {
            let x = self.sample(rng);
            if min <= x && x <= max {
                return x;
            }
        }
    }
}
dhardy commented 3 years ago

Or even a wrapper:

pub trait Distribution<T> {
    // ....

    fn truncated(self, min: T, max: T) -> Truncated<Self> where T: PartialOrd { .. }
}

// parameters: (distr, min, max)
pub struct Truncated<T, D: Distribution<T>>(D, T, T);
impl<T, D: Distribution<T>> Distribution<T> for Truncated<T, D> { .. }

Edit: added bounds to Truncated.

CGMossa commented 3 years ago

I think either will work fine for now. Maybe the wrapper-type is better; I believe that there are efficient sampling algorithms written for truncated-normal, and maybe a few others, that would serve better if they were used, rather than the rejection-sampling scheme you've presented. But they are a bit tricky, and I don't want to go through them right now.

I look at Julia's implementation of these, see here.

dhardy commented 3 years ago

Ah, so this is Julia's implementation for truncated normal with finite μ. It does some computation on the bounds (which could in theory be done earlier when the bounds are available at construction time) then calls a different sampling algorithm.

Any implementing distribution could override the sample_trunctated method allowing use of a different algorithm, but precomputation on the bounds is not possible.

The Truncated wrapper should allow the same but only when Rust supports specialisation, which it currently doesn't, therefore it is worse (for now). It should also allow overriding the truncated constructor method, therefore allowing precomputation on the bounds, with the limitation that the same Truncated type must be returned.

A third option would be to make the Truncated type an associated type of the trait, which allows more flexibility, but since we don't wish to add that to Distribution (traits with associated types are not object safe), it would require a new trait:

pub trait TruncateDistribution<T: PartialOrd>: Distribution<T> {
    type Truncated;
    fn truncated(self, min: T, max: T) -> Self::Truncated;
}

However, we can't have a default implementation of this trait without specialisation, thus it's arguable that this is any better than simply constructing a new distribution for TruncatedNormal. In fact, we could add TruncatedNormal now and the TruncateDistribution trait later (once Rust supports specialisation).

Which brings us back to the question of what we want to do: add a convenience function for sampling-with-rejection or implement new specific truncated-distribution algorithms? If the latter, then I think the first move would be a PR adding TruncatedNormal into rand_distr, if you'd like to contribute.

@vks I think since we would likely never have many truncation-specific variants they should just live in rand_distr alongside the existing implementations?

vks commented 2 years ago

I agree that adding TruncatedNormal for now seems the way to go. I'm not worried about having too many truncated distributions in the root namespace, we can move them to a module (as a breaking change) should it be necessary.

I'm a bit skeptical about the rejection sampling approach for generic distributions, it seems very easy to get high rejection rates. Wouldn't inverse transform sampling work better here?

dhardy commented 2 months ago

Conclusion: we will accept a PR adding a TruncatedNormal distribution to rand_distr, should someone wish to contribute.

We may revisit the idea of adding a generic support for truncated distributions later, but in general there isn't a whole lot of interest. (There's also the question of how this should work. The trivial answer is rejection sampling. @vks suggested inverse (CDF) transform sampling, but we don't have general representations of (inverse) CDF functions, so can't simply implement this in a wrapper..)

CGMossa commented 2 months ago

Uhm.. I'm still interested in contributing here (even though it has been years). Was there maybe a list of tasks needed to do this, you mind sharing with me please.

dhardy commented 2 months ago

Great! So, the first step should be to implement a truncated normal distribution. That article includes notes on multiple methods of sampling from the distribution.

We should start only with one distribution to focus on how best to implement this. Simply rejecting samples outside the acceptance range is the easiest option, but will run into performance issues if the acceptance region is small or only covers low-probability parts of the distribution, so we should look into alternatives.