statrs-dev / statrs

Statistical computation library for Rust
https://docs.rs/statrs/latest/statrs/
MIT License
599 stars 84 forks source link

Add `into_params()` to multivariate distributions #273

Closed FreezyLemon closed 2 months ago

FreezyLemon commented 2 months ago

Closes #180.

Multinomial could technically also benefit from something like this, but the parameter is mutated inside Multinomial::new, so there's no easy way to return the original p (reversing the mutation could result in a different p due to floating-point imprecisions).

The into_ naming was chosen due to API guidelines and convention in std.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 93.94%. Comparing base (f7f2960) to head (77fdbe0). Report is 39 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #273 +/- ## ========================================== + Coverage 93.91% 93.94% +0.02% ========================================== Files 53 53 Lines 12269 12316 +47 ========================================== + Hits 11523 11570 +47 Misses 746 746 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

YeungOnion commented 2 months ago

I think this is a good start, but allocations for decompositions (if random sampling) and decomp's inverse are necessary to fully address #180 in regards to reallocation. into_... a tuple of all parameters and precomputed values and new_with_precompute may be a little messy, definitely leaks implementation, and requires logic errors to be upheld by the caller. I realize the last of these is not really an issue.

Can you think of an approach that would keep the buffers for the precomputed values so they can be overwritten on update? I think nalgebra has some support for this without directly using MaybeUninit. I'll look into it as well.

FreezyLemon commented 2 months ago

Can you think of an approach that would keep the buffers for the precomputed values so they can be overwritten on update?

Well, I'd generally assume that distributions are immutable. So anything we want to re-use (both values and/or allocations) would need to be passed back into some API.

It's a bit convoluted but one way that would almost definitely work would be adding something like:

impl<D> MultivariateNormal<D>
/* nalgebra trait bounds */
{
    pub fn new_from_nalgebra(mean: OVector, cov: OMatrix) -> Result<Self, Error> {
        Self::new_from_nalgebra_with_buffers(mean, cov, empty_matrix(), empty_matrix())
    }

    pub fn new_from_nalgebra_with_buffers(
        mean: OVector,
        cov: OMatrix,
        precision: OMatrix,
        cov_chol_decomp: OMatrix,
    ) -> Result<Self, Error>
    {
        // Parameter checking etc. ...

        // Real construction logic, but using the provided buffers (precision & cov_chol_decomp)
        // that are then stored inside this struct
        Ok(Self {
            ...
        })
    }
}

Then into_params() would need to be modified. Or maybe a new into_params_and_buffers() function would be added or something like that

FreezyLemon commented 2 months ago

Maybe some builder pattern with some back-and-forth possibilities.. Tried implementing my idea from the last comment, and it's not very nice. Returning such a large tuple with repeated types makes the API confusing.

E.g. the signature of one of these methods would be

pub fn into_params_and_buffers(self) -> (OVector<f64, D>, OMatrix<f64, D, D>, OMatrix<f64, D, D>, OMatrix<f64, D, D>)
FreezyLemon commented 2 months ago

I'll think of a better solution for this..