seatonullberg / kernel-density-estimation

Kernel density estimation in Rust.
https://crates.io/crates/kernel-density-estimation
MIT License

Add support for multivariate KDE #2

Open seatonullberg opened 2 years ago

seatonullberg commented 2 years ago

Currently only univariate distributions are supported. A complete implementation would include seamless support for multivariate distributions. The only type that should be changed is the KernelDensityEstimator struct. Currently, the data structure is as follows:

pub struct KernelDensityEstimator<B, K> {
    observations: Vec<Float>,
    bandwidth: B,
    kernel: K,
}

The observations field will need to be converted to a nalgebra::DMatrix to support a multivariate distribution. The type should be hidden behind an alias so that end-users do not need to add nalgebra as a dependency in their own projects.

pub type Matrix2D = nalgebra::DMatrix<Float>;

pub struct KernelDensityEstimator<B, K> {
    observations: Matrix2D,
    bandwidth: B,
    kernel: K,
}
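
For context on why a matrix is needed: the univariate estimator sums a kernel over n scalar observations, whereas a multivariate estimator sums over n observations that are each d-dimensional, i.e. the rows of an n × d matrix. The second formula below assumes a product kernel with one bandwidth h_j per dimension. That is one common choice and only an assumption here, since this issue does not fix which multivariate form would be implemented.

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
\qquad \text{(univariate)}

\hat{f}(\mathbf{x}) = \frac{1}{n} \sum_{i=1}^{n} \prod_{j=1}^{d}
  \frac{1}{h_j} K\!\left(\frac{x_j - x_{ij}}{h_j}\right)
\qquad \text{(multivariate, product kernel)}
```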

To prevent needless conversions for users working with univariate data, the data structure could instead add a generic parameter T representing the type of observations. For univariate data, T would be Vec<Float>; for multivariate data, T would be Matrix2D. However, this would require introducing two new traits, UnivariateKDE and MultivariateKDE, to mimic overloading of the method names pdf, cdf, and sample.

pub struct KernelDensityEstimator<T, B, K> {
    observations: T,
    bandwidth: B,
    kernel: K,
}

pub trait UnivariateKDE {
    // Unimplemented.
}

pub trait MultivariateKDE {
    // Unimplemented.
}

// Univariate case.
impl<B, K> UnivariateKDE for KernelDensityEstimator<Vec<Float>, B, K>
where
    B: Bandwidth,
    K: Kernel,
{
    // Unimplemented.
}

// Multivariate case.
impl<B, K> MultivariateKDE for KernelDensityEstimator<Matrix2D, B, K>
where
    B: Bandwidth,
    K: Kernel,
{
    // Unimplemented.
}
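
To make the idea concrete, here is a minimal sketch of how the two traits might expose a `pdf` method with different argument types. Everything in it is an assumption for illustration: `Matrix2D` is a hand-rolled stand-in for `nalgebra::DMatrix<Float>` so the example has no dependencies, the `Kernel` and `Bandwidth` traits are simplified and may not match the crate's real definitions, `Normal` and `Fixed` are hypothetical helpers, and the multivariate case uses a product kernel with one bandwidth per column, which is just one possible design.

```rust
use std::f64::consts::PI;

// Simplified stand-ins for illustration; not the crate's actual definitions.
pub type Float = f64;

/// Hypothetical stand-in for `nalgebra::DMatrix<Float>`: row-major, one observation per row.
pub struct Matrix2D {
    nrows: usize,
    ncols: usize,
    data: Vec<Float>,
}

impl Matrix2D {
    pub fn from_rows(nrows: usize, ncols: usize, data: Vec<Float>) -> Self {
        assert_eq!(data.len(), nrows * ncols, "data must hold nrows * ncols values");
        Self { nrows, ncols, data }
    }

    fn get(&self, row: usize, col: usize) -> Float {
        self.data[row * self.ncols + col]
    }
}

pub trait Kernel {
    fn pdf(&self, x: Float) -> Float;
}

pub trait Bandwidth {
    fn bandwidth(&self, data: &[Float]) -> Float;
}

/// Standard normal kernel.
pub struct Normal;
impl Kernel for Normal {
    fn pdf(&self, x: Float) -> Float {
        (-0.5 * x * x).exp() / (2.0 * PI).sqrt()
    }
}

/// Hypothetical fixed bandwidth that ignores the data.
pub struct Fixed(pub Float);
impl Bandwidth for Fixed {
    fn bandwidth(&self, _data: &[Float]) -> Float {
        self.0
    }
}

pub struct KernelDensityEstimator<T, B, K> {
    observations: T,
    bandwidth: B,
    kernel: K,
}

impl<T, B, K> KernelDensityEstimator<T, B, K> {
    pub fn new(observations: T, bandwidth: B, kernel: K) -> Self {
        Self { observations, bandwidth, kernel }
    }
}

pub trait UnivariateKDE {
    fn pdf(&self, x: Float) -> Float;
}

pub trait MultivariateKDE {
    fn pdf(&self, x: &[Float]) -> Float;
}

impl<B: Bandwidth, K: Kernel> UnivariateKDE for KernelDensityEstimator<Vec<Float>, B, K> {
    fn pdf(&self, x: Float) -> Float {
        let h = self.bandwidth.bandwidth(&self.observations);
        let n = self.observations.len() as Float;
        let sum: Float = self.observations.iter().map(|&xi| self.kernel.pdf((x - xi) / h)).sum();
        sum / (n * h)
    }
}

impl<B: Bandwidth, K: Kernel> MultivariateKDE for KernelDensityEstimator<Matrix2D, B, K> {
    fn pdf(&self, x: &[Float]) -> Float {
        let obs = &self.observations;
        assert_eq!(x.len(), obs.ncols, "query point has the wrong dimension");
        // One bandwidth per dimension, computed from that column of observations.
        let hs: Vec<Float> = (0..obs.ncols)
            .map(|j| {
                let column: Vec<Float> = (0..obs.nrows).map(|i| obs.get(i, j)).collect();
                self.bandwidth.bandwidth(&column)
            })
            .collect();
        // Product kernel: multiply the scaled per-dimension kernel values.
        let sum: Float = (0..obs.nrows)
            .map(|i| {
                (0..obs.ncols)
                    .map(|j| self.kernel.pdf((x[j] - obs.get(i, j)) / hs[j]) / hs[j])
                    .product::<Float>()
            })
            .sum();
        sum / obs.nrows as Float
    }
}
```

Because each trait is implemented for a different concrete `T`, a call such as `kde.pdf(0.5)` on a `Vec<Float>` estimator and `kde.pdf(&[0.5, 1.0])` on a `Matrix2D` estimator should both resolve without ambiguity, which is the overloading-like behavior described above. The sketch does not handle empty observation sets.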

Lastly, the traits UnivariateKDE and MultivariateKDE should be sealed to prevent end-user implementations.
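
Sealing could follow the usual Rust pattern of a public supertrait inside a private module. The sketch below is an assumption about how that might look here, not code from the crate: the module name `private`, the trait name `Sealed`, and the placeholder method `n_observations` are all hypothetical, and only the univariate trait is shown (the multivariate one would be sealed the same way).

```rust
pub type Float = f64;

pub struct KernelDensityEstimator<T, B, K> {
    observations: T,
    bandwidth: B,
    kernel: K,
}

impl<T, B, K> KernelDensityEstimator<T, B, K> {
    pub fn new(observations: T, bandwidth: B, kernel: K) -> Self {
        Self { observations, bandwidth, kernel }
    }
}

// `Sealed` is `pub` so it can be used as a supertrait bound, but the module
// is private, so code outside this crate cannot name or implement it.
mod private {
    pub trait Sealed {}
}

/// Public trait that can only be implemented inside this crate.
pub trait UnivariateKDE: private::Sealed {
    /// Hypothetical placeholder standing in for `pdf`, `cdf`, and `sample`.
    fn n_observations(&self) -> usize;
}

// Only types given a `Sealed` impl here are able to implement `UnivariateKDE`.
impl<B, K> private::Sealed for KernelDensityEstimator<Vec<Float>, B, K> {}

impl<B, K> UnivariateKDE for KernelDensityEstimator<Vec<Float>, B, K> {
    fn n_observations(&self) -> usize {
        self.observations.len()
    }
}
```

End users can still call the trait methods and use the trait in bounds. What they cannot do is write their own `impl UnivariateKDE for SomeType`, because that would require implementing `private::Sealed`, which they cannot reach.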

humphreylee commented 1 year ago

Good piece of work. Would you mind if I ask if there is any progress on this? Thanks.

seatonullberg commented 1 year ago

Hi @humphreylee, I'm currently preoccupied with writing my dissertation, but I do intend to get back to this as soon as I am able.

emyr666 commented 7 months ago

Bump! I am using seaborn in Python to do 2D KDE contour plots, but it's very slow. I am hoping that by using Rust I can use WebAssembly to do this quickly on the client side, instead of generating plots (extremely slowly) on the server to send to a web client.

rob-p commented 4 months ago

Also a bump on this. We have an application, related to gene expression analysis, where we need a 2D density estimator. Currently, there are no crates for this in Rust, and we are calling out to a Python library, which is both slow and much uglier than we'd like (the rest of the code is pure Rust). It would be great to be able to do the density estimation in Rust, and this seems to be the only crate I can find where this is even on the roadmap.