Open CGMossa opened 3 years ago
Could you please be a bit more specific about what you would like to see implemented? Is it just a few distributions or something more complex?
Personally, I would like to have a truncnorm
implemented. But I think there is a general approach that can be used to implemented, in a way that you define a truncation (a,b)
and then apply that to whatever distribution you want to be truncated.
If it's just that we could add a support function:
pub trait Distribution<T> {
// ...
fn sample_truncated<R: Rng + ?Sized>(&self, rng: &mut R, min: T, max: T) -> T where T: PartialOrd {
loop {
let x = self.sample(rng);
if min <= x && x <= max {
return x;
}
}
}
}
Or even a wrapper:
pub trait Distribution<T> {
// ....
fn truncated(self, min: T, max: T) -> Truncated<Self> where T: PartialOrd { .. }
}
// parameters: (distr, min, max)
pub struct Truncated<T, D: Distribution<T>>(D, T, T);
impl<T, D: Distribution<T>> Distribution<T> for Truncated<T, D> { .. }
Edit: added bounds to Truncated
.
I think either will work fine for now. Maybe the wrapper-type is better; I believe that there are efficient sampling algorithms written for truncated-normal, and maybe a few others, that would serve better if they were used, rather than the rejection-sampling scheme you've presented. But they are a bit tricky, and I don't want to go through them right now.
I look at Julia's implementation of these, see here.
Ah, so this is Julia's implementation for truncated normal with finite μ. It does some computation on the bounds (which could in theory be done earlier when the bounds are available at construction time) then calls a different sampling algorithm.
Any implementing distribution could override the sample_trunctated
method allowing use of a different algorithm, but precomputation on the bounds is not possible.
The Truncated
wrapper should allow the same but only when Rust supports specialisation, which it currently doesn't, therefore it is worse (for now). It should also allow overriding the truncated
constructor method, therefore allowing precomputation on the bounds, with the limitation that the same Truncated
type must be returned.
A third option would be to make the Truncated
type an associated type of the trait, which allows more flexibility, but since we don't wish to add that to Distribution
(traits with associated types are not object safe), it would require a new trait:
pub trait TruncateDistribution<T: PartialOrd>: Distribution<T> {
type Truncated;
fn truncated(self, min: T, max: T) -> Self::Truncated;
}
However, we can't have a default implementation of this trait without specialisation, thus it's arguable that this is any better than simply constructing a new distribution for TruncatedNormal
. In fact, we could add TruncatedNormal
now and the TruncateDistribution
trait later (once Rust supports specialisation).
Which brings us back to the question of what we want to do: add a convenience function for sampling-with-rejection or implement new specific truncated-distribution algorithms? If the latter, then I think the first move would be a PR adding TruncatedNormal
into rand_distr
, if you'd like to contribute.
@vks I think since we would likely never have many truncation-specific variants they should just live in rand_distr
alongside the existing implementations?
I agree that adding TruncatedNormal
for now seems the way to go. I'm not worried about having too many truncated distributions in the root namespace, we can move them to a module (as a breaking change) should it be necessary.
I'm a bit skeptical about the rejection sampling approach for generic distributions, it seems very easy to get high rejection rates. Wouldn't inverse transform sampling work better here?
Conclusion: we will accept a PR adding a TruncatedNormal
distribution to rand_distr
, should someone wish to contribute.
We may revisit the idea of adding a generic support for truncated distributions later, but in general there isn't a whole lot of interest. (There's also the question of how this should work. The trivial answer is rejection sampling. @vks suggested inverse (CDF) transform sampling, but we don't have general representations of (inverse) CDF functions, so can't simply implement this in a wrapper..)
Uhm.. I'm still interested in contributing here (even though it has been years). Was there maybe a list of tasks needed to do this, you mind sharing with me please.
Great! So, the first step should be to implement a truncated normal distribution. That article includes notes on multiple methods of sampling from the distribution.
We should start only with one distribution to focus on how best to implement this. Simply rejecting samples outside the acceptance range is the easiest option, but will run into performance issues if the acceptance region is small or only covers low-probability parts of the distribution, so we should look into alternatives.
Background
What is your motivation? I'm writing a simulation and the truncated distributions is a great convenience for sampling parameters.
What type of application is this? (E.g. cryptography, game, numerical simulation) Statistics, numerical simulation, etc.
Feature request
I would like to have truncated distributions as part of (perhaps)
rand_distr
or it could be a new craterand_trunc
maybe. I'm asking for some guidance here to figure out the best course of action.