statrs-dev / statrs

Statistical computation library for Rust
https://docs.rs/statrs/latest/statrs/
MIT License
544 stars 78 forks source link

Implement: Convolution for probability distributions #204

Open tessob opened 3 months ago

tessob commented 3 months ago

Hi!

I would like to propose an extension of this crate with implementation of Convolution of probability distributions for all distributions which supports this operation.

Within my project I already have such implementation based on following traits applied to several distributions:

trait IdenticalConvolutions {
    type Multiplier;
    type Distribution;

    /// The convolution/sum of two independent identically distributed random variables.
    fn identical_convolutions(&self, times: Self::Multiplier) -> Result<Self::Distribution>;
}

trait ConvolutionWith<D> {
    type Distribution;

    /// The convolution/sum of two independent random variables.
    fn convolution_with(&self, other: D) -> Result<Self::Distribution>;
}

Examples of implementations:

impl IdenticalConvolutions for Gamma {
    type Multiplier = f64;
    type Distribution = Gamma;

    fn identical_convolutions(&self, times: f64) -> Result<Gamma> {
        if times <= 0.0 {
            return Err(Model("The multiplier must be greater than zero!".to_string()));
        };

        Gamma::new(self.shape() * times, self.rate()).into()
    }
}

impl ConvolutionWith<Gamma> for Gamma {
    type Distribution = Gamma;

    fn convolution_with(&self, other: Gamma) -> Result<Gamma> {
        if self.rate() != other.rate() {
            return Err(Model(
                "Convolution of two Gamma distributions requires that both have the same rate/scale parameter!"
                    .to_string(),
            ));
        };

        Gamma::new(self.shape() + other.shape(), self.rate()).into()
    }
}

It seems to me that these traits are general enough to cover all available cases. Your thoughts?

I can contribute into this, but only after somebody put a placeholder/example.

YeungOnion commented 2 months ago

I've looked at this request occasionally and been unsure how to approach it. On one hand, this allows users of the crate to rely on closed form convolutions without looking them up elsewhere and choose the appropriate parameters for a different distribution.

It effectively offers a dynamic dispatch option for the sentiment below,

I want to create a distribution that is described by the sum of N random variables.

I accept that the returned distribution is not of the same family as any of the given distributions, but I want it to work like a distribution.

Since all our (discrete|continuous) distribution structs in the distribution module impl the same set of traits, it would be nice to have some kind of supertrait that expresses "this type has this whole API for its respective support".


I've recently been thinking about a similar sentiment for another convenience which would behave as an optimization.

Some distribution (families) are special cases of more general distributions, it would be useful if the constructor for distribution A could return distribution B which is an optimized special case of A.

A trivial example might be zero width Uniform, zero width Triangle, and zero width Gaussian, all of these converge to a Dirac distribution. More usefully might be StudentsT distribution which can be the Cauchy or the Normal distribution depending on DOF.

Now I'm not 100% sure these are the same problem, but from the lens of traits/interfaces, they appear to be at least up to how they're constructed, new as it stands won't cut it.

tessob commented 2 months ago

@YeungOnion You can create a new repository branch specifically for this topic, and when I have time, I will prepare an implementation to this branch for all the respective distributions. Then we can discuss something more tangible.

Such features are useful within my current project, which is centered around stochastic process modeling. However, I only use a few distributions from this crate. I just think that contributing such functionality might be helpful for someone else.

Possible toy use case: Observing a frog jumping. As a modeling distribution we can take Gamma one as a frog never jumping on negative distance and a distance itself is a real number. Using convolutions, we can ask: what distance is covered by 3 jumps, or what would be the probability that 4.2 meters is covered by 3 jumps.

YeungOnion commented 1 month ago

I opened a branch feat-convolution