tom-whitehead / hdbscan

15 stars 4 forks source link

Allow different distance metrics parametrizations for different coordinate systems #15

Closed codehippo closed 1 month ago

codehippo commented 2 months ago

Hello! Let me start by expressing gratitude for this great crate!

However, I'd love to use this crate for a custom color palette generation via image color quantization. I'd love to make this using the HDBSCAN algorithm in the OKLrCH space but the space uses cylindrical coordinates.

I could probably convert these back to Cartesian coordinates before going to clustering. This would work but I'd love to play around different non-euclidean metrics so as to discriminate against dark tones etc. which is easiest when having the possibility to set the distance metric yourself by (for example) supplying a closure.

Maybe adding a fourth option to DistanceMetric enum for Custom and then letting the user supply the implementation themselves?

tom-whitehead commented 2 months ago

Hey @codehippo, thank you for raising an issue! To be honest with you I hadn't even heard of cylindrical coordinates, everything I cluster is firmly in cartesian spaces. All that is required to make this work is a distance function. I like the idea you suggest of having a means of providing a custom distance function as this makes the crate more flexible. I'm quite busy with work until the latter half of the week, but I will try and take a look at this before the weekend :)

tom-whitehead commented 2 months ago

Hey @codehippo, I was just having a look at this. I think I've tied myself in a bit of a knot with the DistanceMetric enum. It seems quite complicated to also implement a Custom variant where the user can pass a function. As such, I'm just going to add a Cylindrical variant for you to use, which I am planning on implementing as below (reference):

pub(crate) fn cylindrical_distance<T: Float>(a: &[T], b: &[T]) -> T {
    let (r1, theta1, z1) = (a[0], a[1], a[2]);
    let (r2, theta2, z2) = (b[0], b[1], b[2]);

    let theta_diff = theta2 - theta1;
    let radial_component = r1 * r1 + r2 * r2 - T::from(2.0).unwrap() * r1 * r2 * theta_diff.cos();
    let vertical_component = (z2 - z1) * (z2 - z1);

    (radial_component + vertical_component).sqrt()
}

Let me know what you think. The one annoying part is that the three components are always in the order r, theta, z, however this appears to be the convention.

When/if I do a major version bump, I will change how I implement the distance metrics and export a trait that users can implement.

tom-whitehead commented 1 month ago

Hey @codehippo, I've added a cylindrical distance option for you to use. Hopefully it's quite straight forward but you can refer to this test to see how to use it. Note that the coordinates need to be in the order radial distance, angular coordinate, height, so HSV colours would need to be reordered to SHV. Hope you find this useful.