statrs-dev / statrs

Statistical computation library for Rust
https://docs.rs/statrs/latest/statrs/
MIT License
544 stars 78 forks source link

Make `MultivariateNormal` generic over dimension #209

Open riverbl opened 3 months ago

riverbl commented 3 months ago

Allow MultivariateNormal to have either a runtime known dynamic dimension (as per current) or a compile time known constant dimension.

MultivariateNormal::new has been changed to take the mean and covariance as OVector and OMatrix rather than Vec.

Having the dimension be compile time known allows, for example, pattern matching on the output of sampling the distribution:

let mut rng = StdRng::from_entropy();
let dist = MultivariateNormal::new(vector![1.0, 2.0], matrix![1.0, 0.5; 0.5, 2.0]).unwrap();

let [[x, y]] = dist.sample(&mut rng).data.0;

println!("[{x}, {y}]");

See also #168, #177.

henryjac commented 3 months ago

It is a reasonable thing that would preferably have been done before I suppose, since this would be a breaking change. However since the multivariate cases only include this and Dirichlet now it could be worth to do this now instead of later.

Since all the usage of multivariate distributions use nalgebra vectors in the end except for creating it using new it would be much nicer to have a unified interface like this. The change is also relatively minor in that the user would only need to change things like vec![1., 2.] to dvector![1., 2.] or vector![1., 2.] with the corresponding use nalgebra::dvector / nalgebra::vector.

We should of course strive to prevent breaking changes but not be completely averse to it. If this change is accepted then Dirichlet should also be updated to match this.

@YeungOnion whats your input on this?

YeungOnion commented 3 months ago

I really like the ergonomics of pattern matching and I also think there's a great advantage for use cases where you know the sizes of everything you're using it'll ensure one can't make mistakes, and I could see a great reason to have the static version for performance reasons.

I think it's worth the breaking change, but would it not be feasible to impl rand::distributions::Distribution<OVector<_>> for Multivariate... and identify the dynamic as deprecated/or keep it to have both syntaxes?

riverbl commented 3 months ago

I think it's worth the breaking change, but would it not be feasible to impl rand::distributions::Distribution<OVector<_>> for Multivariate... and identify the dynamic as deprecated/or keep it to have both syntaxes?

DVector is a subtype of OVector - specifically, DVector<f64> is the same type as OVector<f64, Dynamic>. With this PR, MultivariateNormal<D> implements Distribution<OVector<f64, D>>, therefore MultivariateNormal<Dynamic> implements Distribution<DVector<f64>>.

As a result, it wouldn't be possible to add an implementation of Distribution<DVector<f64>> for MultivariateNormal<D> because this would conflict with the implementation of Distribution<OVector<f64, D>> in the case that D = Dynamic.

It would be possible to add an implementation of Distribution<DVector<f64>> for MultivariateNormal<Const<N>>, but I'm not sure this would help with backward compatibility, as no existing user will have a MultivariateNormal<Const<N>> - they will all have the equivalent of a MultivariateNormal<Dynamic>.

I might have misunderstood what you were asking - please let me know if my response makes sense!

YeungOnion commented 3 months ago

@riverbl that was what I was asking about and your explanation makes sense to me! Thanks for clarifying.


I like the benefits of this, my one caveat is that I think this would establish our API for other multivariate distributions, as I'm also thinking about #208.

For multiple variables from the same distribution, their sample space will be the same type, so vectors will work. For multiple variables that are not from the same distribution, their sample space may not be the same type, which would require tuples. And I wish to avoid being close, but not consistent.

let [[x, y]] = dist1.sample(&mut rng); // homogeneous types - not valid syntax with this pr, but to demonstrate
let (a, b) = dist2.sample(&mut rng); // heterogeneous types

Now that I've written it all down, I like it more as is, but I'll leave this here for posterity or if this sparks other thoughts.

My last question is potentially just because I'm shy in this new role. Would it be frowned on if our first release reviving the crate has a breaking change?

YeungOnion commented 2 months ago

@FreezyLemon mentioned that we should be able to drop the macros feature, and we can keep it as a dev dependency.

Can I suggest disabling the macros feature in nalgebra?

nalgebra = { version = "0.29", default-features = false, features = ["std", "rand"] }

# or:

[dependencies.nalgebra]
version = "0.29"
default-features = false
features = ["std", "rand"]

It looks like it is unnecessary for statrs and reduces the amount of dependencies & compile time a bit

I figured since this is related to use of nalgebra, it makes sense to tack it onto this one as a small change.

Patch

From 0da762cced81d2ab5fcaf9128bea7c073e97958a Mon Sep 17 00:00:00 2001
From: Orion Yeung <11580988+orionyeung001@users.noreply.github.com>
Date: Mon, 15 Apr 2024 09:04:41 -0500
Subject: [PATCH] feat: suggestion to drop nalgebra macros from lib
 dependencies

---
 Cargo.toml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Cargo.toml b/Cargo.toml
index c287477..a6efc61 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -21,13 +21,14 @@ nightly = []

 [dependencies]
 rand = "0.8"
-nalgebra = { version = "0.29", features = ["rand"] }
+nalgebra = { version = "0.29", default_features = false, features = ["std", "rand"] }
 approx = "0.5.0"
 num-traits = "0.2.14"
 lazy_static = "1.4.0"

 [dev-dependencies]
 criterion = "0.3.3"
+nalgebra = { version = "0.29", features = ["rand"] }

 [[bench]]
 name = "order_statistics"
-- 
2.34.1
riverbl commented 2 months ago

I've rebased and moved the nalgebra macros feature to dev dependencies.

The default dependency resolver for edition 2018 includes features that are specified in dev dependencies when building if the same crate is specified in dependencies without the feature - see https://doc.rust-lang.org/cargo/reference/resolver.html#resolver-versions.

If an nalgebra macro were accidentally used in statrs, this would lead to it building when build on its own, but failing to build when built as a dependency of an edition 2021 crate using the new dependency resolver.

I've bumped the crate edition to 2021 to avoid this.

YeungOnion commented 2 months ago

Thanks for looking at the behavior with the 2021 dependency resolver.


Quoting myself,

My last question is potentially just because I'm shy in this new role. Would it be frowned on if our first release reviving the crate has a breaking change?

I got over it; I think it's worth changing. @henryjac could you merge this one?

I'll add issues to Dirichlet as well as the public dependency on nalgebra (discussed briefly here #199)

FreezyLemon commented 1 month ago

I've bumped the crate edition to 2021 to avoid this.

For completeness' sake, I'll note two things about this:

  1. This technically requires doing a migration (cargo fix --edition) but no changes are needed for the current statrs code.
  2. This will effectively break compatibility with any compiler pre-1.56.0 (the 2021 Edition was stabilized with 1.56.0). statrs currently does not have an MSRV (minimum supported Rust version) policy, so this is not a "problem" because there were never any guarantees. But updating to 2021 edition might be an opportunity to add the MSRV concept to the project. Definitely out of scope for this PR though, just something to maybe think about