statrs-dev / statrs

Statistical computation library for Rust
https://docs.rs/statrs/latest/statrs/
MIT License
546 stars 78 forks source link

Consider implementing `Continuous<&f64, f64>` in addition to `Continuous<f64, f64>` #197

Closed rihardsk closed 2 weeks ago

rihardsk commented 7 months ago

The issue

I ran into this when I needed a generic function that both samples a distribution and calculates the PDF for returned values. Initially I needed this to work with statrs::distribution::Dirichlet, so I came up with something like this:

pub fn sample_and_calculate_pdf<D, R, T>(
    dist: &D,
    random: R,
) -> Vec<f64>
where
    D: Distribution<T> + for<'a> Continuous<&'a T, f64>,
    R: Rng,
{
  // do stuff
}

These trait bounds work fine for Dirichlet because it implements

impl Distribution<Matrix<f64, Dynamic, Const<1>, VecStorage<f64, Dynamic, Const<1>>>> for Dirichlet
impl<'a> Continuous<&'a Matrix<f64, Dynamic, Const<1>, VecStorage<f64, Dynamic, Const<1>>>, f64> for Dirichlet

that is, you can sample a Matrix<...> from Dirichlet and to calculate the PDF, you need to pass a &Matrix<...>.

My trait bounds break down, however, if I want to use the sample_and_calculate_pdf function on statrs::distribution::Normal, because it implements only

impl Distribution<f64> for Normal
impl Continuous<f64, f64> for Normal

that is, you can sample a f64 but, unlike previously, you pass the value itself when calculating the PDF, not its reference.

Proposed solution

A simple blanket impl would solve my issues:

impl<D> Continuous<&f64, f64> for D
where
    D: Continuous<f64, f64>,
{
    fn pdf(&self, x: &f64) -> f64 {
        self.pdf(*x)
    }

    fn ln_pdf(&self, x: &f64) -> f64 {
        self.ln_pdf(*x)
    }
}

Alternatives?

It seems that my only alternatives are to come up with some additional trait bounds that would essentially either allow me to produce a T from a &T, which would imply cloning, which is wasteful, or to produce a &T from a T which seems impossible because it would amount to creating a dangling reference.

YeungOnion commented 3 months ago

I expect that most impls of Continuous wouldn't consume arguments in those functions. Maybe it makes more sense to have implementations that explicitly reflect that and so that the caller can choose to use it either way

And if I recall correctly, types that impl Copy will move with memcpy, which is cheap, but perhaps not semantic for most pdf's; perhaps we could have the base implementation be a borrow and for those that impl Continuous<&K,T> we can impl Continuous<K,T> calling with the borrow so that existing code still works, seems very unergonomic to remove even if still 0.x version.


I think it may be a good alternative if feasible, however, I need someone more familiar with the borrow checker to figure that out I'm having trouble with my sample implementation for that.

I'm thinking the K impl is checked for before the explicit &K impl and that makes it recurse. Note: this implementation compiles with both expected calling syntaxes for explicit types, instead of generic type K in the move type, this instead will work.

impl<D,T> Continuous<f64, T> for D
where
    D: for <'a> Continuous<&'a f64, T>, // having recursion issue here
{
    fn pdf(&self, x: f64) -> T {
        self.pdf(&x)
    }
}

@henryjac would you expect that this is good for the 0.17 release?

YeungOnion commented 2 months ago

@rihardsk noticing that this would not work similarly for ContinuousCDF since we wouldn't want to return references from inverse_cdf, have you found a way around this?

YeungOnion commented 2 weeks ago

Gave this some more thought, until we have generic types, it may be better to rely on primitives as Copy than the potential indirection of reference. But it does identify a weak point in the API's for traits returning generics when associate types may be more appropriate.