scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Expose Distance Information in KMedoids #168

Open skfaysal opened 6 months ago

skfaysal commented 6 months ago

Description

This pull request enhances the KMedoids implementation by exposing the distances of data points to their respective medoids. Previously, this information was internally computed during the clustering process but not exposed to users. The addition of the distances_ attribute allows users to access these distances without the need for additional pairwise distance calculations, which can be computationally expensive.

Changes Made

Addition of distances_ Attribute:

A new attribute, distances_, has been introduced to store the distances of each data point to its assigned medoid.

Modification of fit Method:

The distances are now computed using the existing transform method and stored in the distances_ attribute. The inertia_ attribute is updated to use the distances directly, avoiding redundant pairwise distance calculations.
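As a rough illustration of the idea (not the actual diff in this PR), the sketch below uses plain Python to mimic a transform-style distance matrix and shows how labels, per-point distances, and the inertia can all be read from it without a second distance computation. The helper names distance_matrix and summarize are hypothetical, Euclidean distance is assumed, and the inertia is taken to be the sum of distances to the closest medoid.

```python
import math

def distance_matrix(points, medoids):
    """Return an n_points x n_medoids matrix of Euclidean distances."""
    return [[math.dist(p, m) for m in medoids] for p in points]

def summarize(dist):
    """Derive labels, distance to the assigned medoid, and inertia from one matrix."""
    # Index of the nearest medoid for each point.
    labels = [min(range(len(row)), key=row.__getitem__) for row in dist]
    # Distance from each point to its assigned (nearest) medoid.
    nearest = [row[k] for row, k in zip(dist, labels)]
    # Inertia reuses the same distances instead of recomputing them.
    inertia = sum(nearest)
    return labels, nearest, inertia

# Toy data: hypothetical points, with two of them acting as medoids.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 8.0)]
medoids = [(0.0, 0.0), (5.0, 5.0)]

dist = distance_matrix(points, medoids)
labels, nearest, inertia = summarize(dist)
```

The point of the design is that the matrix is computed once and then shared by every quantity derived from it.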

Motivation

The motivation behind this enhancement is to provide users with direct access to the distances between data points and their respective medoids. This information can be valuable for users who wish to perform additional statistical analyses, such as identifying the closest data points to medoids, without incurring the cost of recomputing pairwise distances.

Example Usage

Users can now access the distances using the distances_ attribute after fitting the model:

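from sklearn_extra.cluster import KMedoids

# `data` is an array-like of shape (n_samples, n_features)
kmedoids_model = KMedoids(n_clusters=3)
kmedoids_model.fit(data)
distances_to_medoids = kmedoids_model.distances_  # attribute added by this PR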

This information can be utilized for various purposes, enhancing the flexibility and utility of the KMedoids implementation.
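One use mentioned in the motivation, identifying the data points closest to each medoid, might look roughly like the sketch below. It is plain Python and makes assumptions: labels stands in for the fitted labels_, nearest stands in for each point's distance to its assigned medoid (the exact shape of distances_ in this PR is not confirmed here), and closest_points_per_cluster is a hypothetical helper.

```python
from collections import defaultdict

def closest_points_per_cluster(labels, nearest, k=2):
    """Return, for each cluster, the indices of the k points nearest its medoid."""
    by_cluster = defaultdict(list)
    for idx, (label, d) in enumerate(zip(labels, nearest)):
        by_cluster[label].append((d, idx))
    # Sort each cluster's (distance, index) pairs and keep the k smallest.
    return {c: [i for _, i in sorted(pairs)[:k]] for c, pairs in by_cluster.items()}

# Hypothetical values for illustration only.
labels = [0, 0, 1, 1, 1]
nearest = [0.0, 1.0, 2.5, 0.0, 0.7]

result = closest_points_per_cluster(labels, nearest, k=2)
```

A point at distance 0.0 is the medoid itself, so it always appears first in its cluster's list.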

TimotheeMathieu commented 6 months ago

Thanks, this looks good. The tests are failing for now, but this should be fixed by PR #167. Once PR #167 is merged, we can merge this one and check that everything is OK.