statrs-dev / statrs

Statistical computation library for Rust
https://docs.rs/statrs/latest/statrs/
MIT License
543 stars 78 forks source link

multivariate API and dependency on nalgebra #225

Open YeungOnion opened 2 months ago

YeungOnion commented 2 months ago

We support multivariate distributions and nalgebra is part of our API. We'll be merging #209 and this will be a breaking change. The new methods clearly rely on nalgebra, but it might be more implicit with other methods. I think we can either make it clear that args and outputs for/from these distributions will need the dependency or find some way to opt out of using the same or any nalgebra.

Options I can think (in order of what I think is increasing complexity)

See #199 #209

FreezyLemon commented 2 months ago

non-default feature for multivariate, would also generally address not compiling a linear algebra library

FYI: Dirichlet currently depends on nalgebra::DVector and Multinomial has nalgebra types in its public API (e.g. Multinomial::mean). So those would need to be addressed too.

One more issue I'd like to raise is one of developer ergonomics. There are a few functions which take nalgebra types as parameters (e.g. new_from_nalgebra), but library users cannot easily use these without both

It might make sense to reexport some nalgebra types from statrs to make this less of a problem.

write API to support different numeric algebra backends, nalgebra is broadly used, but there are others

If it's just for the public API, this could be done with extension traits, hidden behind a feature flag or maybe in a different crate.

YeungOnion commented 2 months ago

It might make sense to reexport some nalgebra types from statrs to make this less of a problem.

This doesn't sound like a heavy lift, but I don't know how much we'd need to reexport. Would be good to look into. A question I have: would that mean the syntax in user code would look like this?

use statrs::distribution::MultivariateNormal;
use statrs::appropriate_module_name::{...}; // whichever may be needed
use nalgebra as na;

let mvn = MultivariateNormal::new_from_nalgebra(statrs::appropriate_module_name::vec!(_), statrs::appropriate_module_name::matrix!(_));

Do you have ideas for what you'd choose for traits we'd create to connect to a general linear algebra library type? If not, I think we start small and see what mileage we get with re-exporting. Hopefully most users (and us) will be using an updated nalgebra.

FreezyLemon commented 2 months ago

This doesn't sound like a heavy lift, but I don't know how much we'd need to reexport. Would be good to look into.

Well, it could be a minimal reexport (vector/matrix types plus maybe convenience macros) or just a re-export of the entire nalgebra library. This depends on the other API decisions too IMHO.

A question I have: would that mean the syntax in user code would look like this? ...

Generally, yes. I don't think there's a "right" way of doing these re-exports, but I could think of

use statrs::distribution::MultivariateNormal;
use statrs::nalgebra::{DMatrix, dvector}; // or whatever type it may be in the end

let mvn = MultivariateNormal::new_from_nalgebra(dvector!(...), DMatrix::from_column_slice(...));

or (if you're already importing types from another nalgebra)

use statrs::distribution::MultivariateNormal;
use statrs::nalgebra as statrs_na;

let mvn = MultivariateNormal::new_from_nalgebra(statrs_na::dvector!(...), statrs_na::DMatrix::from_column_slice(...));

Do you have ideas for what you'd choose for traits we'd create to connect to a general linear algebra library type?

The way I've seen this in the wild is often just a 1:1 extension trait per type. And in this case, I guess one trait per type per linalg library. So roughly:

// mod nalgebra_ext
pub trait MultivariateNormalNalgebraExt {
  pub fn new_from_nalgebra(...) -> MultivariateNormal;
  // ...
}

// ...
// mod ndarray_ext
pub trait MultivariateNormalNdarrayExt {
  pub fn new_from_ndarray(...) -> MultivariateNormal;
  // ...
}

// ...

and then their respective implementations. With some structuring and/or a prelude, this could allow users to

use statrs::ndarray_ext::*;
use statrs::distribution::MultivariateNormal;

MultivariateNormal::new_from_ndarray(...);

This would still mean that there's a dependency on nalgebra/etc. at some point, but I'm not sure I see a better way at the moment.