Closed: NimaSarajpoor closed this 1 year ago
Hello,
Thank you this looks interesting.
I think, given the subject, you could study the outline of the KMedoids code and follow the same structure; the idea is to stay in line with scikit-learn code.
To help you, here is the skeleton I would use if I were to turn your notebook into code, in a file sklearn_extra/cluster/agglomerative_clustering_minmax.py for instance:
class AgglomerativeClusteringMinMax(BaseEstimator, ClusterMixin, TransformerMixin):
    """
    # Add a short description here

    Parameters
    ----------
    n_clusters : int, default=8
        The number of clusters to find.
    metric : "precomputed" or "euclidean" (default)
        Metric used to compute the distance between any two samples.
    max_iter : int, default=300
        Maximum number of iterations.

    Attributes
    ----------
    # Add attributes here. For ClusterMixin you must at least have labels_
    # and n_iter_. I don't think there are any centers or inertia in your
    # definition of clustering, so at first just labels_ and n_iter_ is
    # alright.

    Examples
    --------
    # Add a short example of use

    References
    ----------
    # Add the article
    """

    def __init__(
        self,
        n_clusters=8,
        metric="euclidean",
        max_iter=300,
    ):
        self.n_clusters = n_clusters
        # Instead of affinity I would use metric, to be homogeneous with KMedoids.
        self.metric = metric
        # max_iter is a bound on the number of iterations, so that there is no
        # infinite loop if there is some bug.
        self.max_iter = max_iter
    def fit(self, X, y=None):
        """Fit agglomerative min-max clustering to the provided data.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = (n_samples, n_features), \
                or (n_samples, n_samples) if metric == 'precomputed'
            Dataset to cluster.
        y : Ignored

        Returns
        -------
        self
        """
        # Put your code here to compute labels_ and n_iter_.
        # labels_ is what you called "clusters", i.e. the output of your
        # algorithm, and n_iter_ is the number of iterations used.
        return self
    def transform(self, X):
        """Transform X to cluster-distance space.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape (n_query, n_features), \
                or (n_query, n_indexed) if metric == 'precomputed'
            Data to transform.

        Returns
        -------
        X_new : {array-like, sparse matrix}, shape=(n_query, n_clusters)
        """
        # I am not certain that it is alright to do it like this, but here is
        # what I would do: output the result of the r function for each point,
        # i.e. min_{x in C} d_max(x, C).
        return r(X)
    def predict(self, X):
        """Predict the closest cluster for each sample in X.

        Parameters
        ----------
        X : {array-like, sparse matrix}, shape (n_query, n_features), \
                or (n_query, n_indexed) if metric == 'precomputed'
            New data to predict.

        Returns
        -------
        labels : array, shape = (n_query,)
            Index of the cluster each sample belongs to.
        """
        # Add the cluster computation here; it can be used on new points,
        # not only on the training data.
        return clusters
It would also be good to add some helper functions to the class, for example a function that updates the distance matrix after each merge.
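To make the idea concrete, here is a rough, standalone sketch of the naive min-max (minimax) linkage computation that such helpers would build on. All names here are hypothetical (not from any existing package), and this is just one straightforward reading of the linkage: the distance between two clusters is the smallest "covering radius" min over candidate centers x in G ∪ H of max_{y in G ∪ H} d(x, y).

```python
import numpy as np
from scipy.spatial.distance import cdist


def minimax_linkage_distance(D, members_g, members_h):
    """Min-max linkage between clusters G and H.

    D is the full pairwise distance matrix; members_g / members_h are
    index lists of the points in each cluster.
    """
    union = np.concatenate([members_g, members_h])
    # radii[i] = max distance from candidate center union[i] to the union
    radii = D[np.ix_(union, union)].max(axis=1)
    return radii.min()


def naive_minmax_agglomerative(X, n_clusters=2):
    """Naive agglomerative clustering with min-max linkage.

    Quadratic in memory and worse in time; meant only to mirror a
    notebook-style reference implementation, not production code.
    """
    n = len(X)
    D = cdist(X, X)
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest min-max linkage.
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = minimax_linkage_distance(D, clusters[a], clusters[b])
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

A smarter version would cache and incrementally update the cluster-to-cluster distances after each merge instead of recomputing them, which is exactly the kind of helper mentioned above.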
Once the code is done, you would also have to update:
sklearn-extra/cluster/__init__.py
doc/modules/cluster.rst
doc/api.rst
sklearn-extra/cluster/tests
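For the tests directory, a pytest-style test might look like the sketch below. The helper inside is a hypothetical standalone min-max linkage so the sketch is self-contained; a real test would instead import the new estimator from sklearn_extra.cluster.

```python
import numpy as np
from scipy.spatial.distance import cdist


def minmax_linkage(D, g, h):
    # Hypothetical helper: smallest covering radius over the union of
    # the two clusters (min over centers of the max distance).
    union = np.concatenate([g, h])
    return D[np.ix_(union, union)].max(axis=1).min()


def test_minmax_linkage_singletons():
    # For two singleton clusters the min-max linkage reduces to the
    # plain pairwise distance between the two points.
    X = np.array([[0.0, 0.0], [3.0, 4.0]])
    D = cdist(X, X)
    assert np.isclose(minmax_linkage(D, [0], [1]), 5.0)
```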
Don't hesitate to ask if you have any questions. I wrote part of the KMedoids code, so I understand it pretty well.
EDIT : most of the code is a suggestion not an official guideline, feel free to improve the basic ideas I gave.
@TimotheeMathieu Thanks a lot for your input. You are right. I might take a look at AgglomerativeClustering in scikit-learn as well and see how the clusters are stored at each cut.
I'll start with your guideline. Thanks!!
Create Tutorial Notebook for Min-Max linkage and provide its naive implementation, corresponding to #93
Hello,
This is my first time trying to contribute to scikit-learn-extra. It was hard for me to understand the source code, so as a first step I created a notebook and provided a naive implementation. I believe this can help us with the unit tests.
I was wondering if someone could guide me through reading the source code and help me implement this for the scikit-learn-extra package.