gsel9 / dgufs

A Python implementation of the Dependence Guided Unsupervised Feature Selection (DGUFS) algorithm developed by Jun Guo and Wenwu Zhu.
MIT License

If I don't know the number of clusters #1

Closed: shichenhu closed this issue 1 week ago

shichenhu commented 7 months ago

Is it possible to use a metric like SSE to determine the number of clusters? My data has no labels.

gsel9 commented 7 months ago

Hi!

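You can try different numbers of clusters, measure the SSE for each run, and compare the results. Note that the total SSE usually keeps decreasing as clusters are added, so rather than simply taking the smallest error, look for the number of clusters after which the SSE stops dropping sharply (the "elbow"). Maybe it could look something like this: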

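import numpy as np
from sklearn.datasets import load_iris

from dgufs import DGUFS  # assumed import path; adjust to where DGUFS lives in your install

iris = load_iris(return_X_y=False)
X, y = iris.data, iris.target  # y is not used: the selection is unsupervised

clusters = [3, 6, 9]
results = {}

for num_clusters in clusters:

    dgufs = DGUFS(num_clusters=num_clusters)
    dgufs.fit(X)

    # Centroid of each cluster, computed from the samples assigned to it.
    cluster_centers = [X[dgufs.memberships == i].mean(axis=0)
                       for i in range(num_clusters)]

    # Start from zeros: accumulating into NaN would leave every entry NaN.
    clusterwise_sse = np.zeros(num_clusters)
    for datapoint, label in zip(X, dgufs.memberships):
        clusterwise_sse[label] += np.square(datapoint - cluster_centers[label]).sum()

    results[f"num_clusters_{num_clusters}"] = clusterwise_sse

# Total SSE for each setting, to compare across numbers of clusters.
total_sse = {key: sse.sum() for key, sse in results.items()}
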
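To turn the per-run errors into a choice of cluster count, the SSE can be summed per run and an elbow picked automatically. The sketch below is a hedged example, not part of the dgufs package: `total_sse` and `pick_elbow` are hypothetical helper names, and the elbow rule used here (the candidate farthest from the straight line joining the first and last (k, SSE) points) is just one simple heuristic among several. It assumes the SSE values are listed in order of increasing k and that at least two distinct k values are tried.

```python
import numpy as np


def total_sse(X, labels):
    """Hypothetical helper: sum of squared distances from each sample
    to the mean of the cluster it is assigned to."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    sse = 0.0
    for label in np.unique(labels):
        members = X[labels == label]
        sse += np.square(members - members.mean(axis=0)).sum()
    return sse


def pick_elbow(ks, sses):
    """Hypothetical helper: return the k whose (k, SSE) point lies farthest
    from the straight line joining the first and last points."""
    ks = np.asarray(ks, dtype=float)
    sses = np.asarray(sses, dtype=float)
    if sses.max() == sses.min():
        # Flat curve: no elbow, so fall back to the smallest k.
        return int(ks[0])
    # Rescale both axes to [0, 1] so the distance does not depend on units.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (sses - sses.min()) / (sses.max() - sses.min())
    x0, y0, x1, y1 = x[0], y[0], x[-1], y[-1]
    # Perpendicular distance of every point to the chord (x0, y0)-(x1, y1).
    dist = np.abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)
    dist /= np.hypot(x1 - x0, y1 - y0)
    return int(ks[np.argmax(dist)])


print(pick_elbow([1, 2, 3, 4, 5, 6], [100, 40, 15, 12, 10, 9]))  # -> 3
```

In the loop above, `total_sse(X, dgufs.memberships)` should match `clusterwise_sse.sum()` whenever no cluster is empty, and the totals for `clusters = [3, 6, 9]` could then be passed to `pick_elbow`. With only three candidates the choice is coarse, so trying a denser range of k gives the heuristic more to work with.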
shichenhu commented 7 months ago

Thank you so much, it's very useful!

gsel9 commented 1 week ago

Case closed