scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License

For sklearn_extra.cluster.KMedoids - allow the parameter 'init' to take a numpy array to be on par with sklearn.cluster.KMeans #139

Closed raymondj-pace closed 2 years ago

raymondj-pace commented 2 years ago

The sklearn.cluster.KMeans class has an 'init' parameter as does the sklearn_extra.cluster.KMedoids class.

sklearn.cluster.KMeans' init parameter allows a numpy array to be passed as the initial values for the centroids.

It would be useful if sklearn_extra.cluster.KMedoids could allow its 'init' parameter to also accept a numpy array as the initial centroids (medoids).

KMeans example:

import numpy as np
from sklearn.cluster import KMeans

# Initial centroids and data
c = np.array([[2, 2], [3, 4], [6, 2]])
X = np.array([[1, 2], [2, 1], [1, 3], [5, 4], [6, 3], [7, 2], [6, 1]])

kmeans = KMeans(n_clusters=3, init=c, random_state=0, verbose=0).fit(X)

# Print the cluster label assigned to each sample
for i in range(len(X)):
    print('x' + str(i+1) + ' = ' + str(kmeans.labels_[i]))
print('\n')

Output: x1 = 0 x2 = 0 x3 = 0 x4 = 1 x5 = 2 x6 = 2 x7 = 2

I would like to be able to do the same with KMedoids and specify the initial medoids:

import numpy as np
from sklearn_extra.cluster import KMedoids

c = np.array([[1, 2], [2, 1]])  # These initial medoids are in the set X below
X = np.array([[1, 2], [2, 1], [1, 3], [5, 4], [6, 3], [7, 2], [6, 1]])

kmedoids = KMedoids(n_clusters=2, init=c, random_state=0).fit(X)

# Print the cluster label assigned to each sample
for i in range(len(X)):
    print('x' + str(i+1) + ' = ' + str(kmedoids.labels_[i]))
print('\n')

Output: ValueError: init needs to be one of the following: ['random', 'heuristic', 'k-medoids++', 'build']

Desired output: x1 = 0 x2 = 0 x3 = 0 x4 = 1 x5 = 1 x6 = 1 x7 = 1
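
For illustration, here is a minimal NumPy sketch of the requested behaviour. It assumes a plain alternating k-medoids update (assign each point to its nearest medoid, then re-pick each cluster's medoid), which is not necessarily what scikit-learn-extra does internally. The function name `kmedoids_from_init` and its parameters are hypothetical, and each row of `init` is assumed to be a row of `X`.

```python
import numpy as np


def kmedoids_from_init(X, init, max_iter=100):
    """Alternating k-medoids started from user-supplied medoids.

    Hypothetical helper for illustration; not the scikit-learn-extra API.
    """
    X = np.asarray(X, dtype=float)
    init = np.asarray(init, dtype=float)
    # Pairwise Euclidean distances between all samples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Map each initial medoid to the index of the closest row of X
    # (an exact match when the medoid is itself a row of X).
    medoids = np.array(
        [int(np.argmin(np.linalg.norm(X - m, axis=1))) for m in init]
    )
    for _ in range(max_iter):
        # Assignment step: nearest medoid for every sample.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: member with the smallest total distance
            # to the other members of its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


X = np.array([[1, 2], [2, 1], [1, 3], [5, 4], [6, 3], [7, 2], [6, 1]])
c = np.array([[1, 2], [2, 1]])
medoid_idx, labels = kmedoids_from_init(X, c)
print(labels.tolist())
```

On the data above, this sketch ends with the labels 0 0 0 1 1 1 1, which matches the desired output.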

TimotheeMathieu commented 2 years ago

Please check that you are using the latest version of scikit-learn-extra; this feature was implemented in PR #137.
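
One way to check which version is installed is sketched below, using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is illustrative and not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints the installed version string, or None if the package is absent.
print(installed_version("scikit-learn-extra"))
```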

raymondj-pace commented 2 years ago

Hmmm, OK. I am using: scikit-learn-extra 0.2.0 py38ha53d530_1 conda-forge. I'll see if PyPI has a later version. Thanks.

raymondj-pace commented 2 years ago

I'm looking at: https://scikit-learn-extra.readthedocs.io/en/stable/generated/sklearn_extra.cluster.KMedoids.html

And I don't see it.

TimotheeMathieu commented 2 years ago

The PR is very new, so you will not see this feature in the latest stable release. Instead, it is in the "latest" docs: https://scikit-learn-extra.readthedocs.io/en/latest/generated/sklearn_extra.cluster.KMedoids.html and you can install the associated version of scikit-learn-extra with

pip install git+https://github.com/scikit-learn-contrib/scikit-learn-extra

raymondj-pace commented 2 years ago

Got it, thank you.