nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License

how to control iterations #154

Closed dfolch closed 3 years ago

dfolch commented 3 years ago

I was trying to run KPrototypes for exactly one iteration by setting max_iter=1. However, this seems to result in the algorithm running two iterations.

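import numpy as np
from kmodes.kprototypes import KPrototypes

K = 20        # number of clusters
N = int(1e5)  # number of rows
M = 10        # total number of columns
MN = 5        # the last MN columns are passed as categorical
data = np.random.randint(1, 1000, (N, M))

KPrototypes(n_clusters=K, init='Huang', n_init=1, max_iter=1, verbose=2, random_state=9999)\
    .fit_predict(data, categorical=list(range(M - MN, M)))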

Output:

Init: initializing centroids
Init: initializing clusters
Starting iterations...
Run: 1, iteration: 1/1, moves: 35350, ncost: 14365614457.095963
Run: 1, iteration: 2/1, moves: 18059, ncost: 13751981936.187408

When I set max_iter=0 I appear to get the one iteration I'm looking for.

KPrototypes(n_clusters=K, init='Huang', n_init=1, max_iter=0, verbose=2, random_state=9999)\
    .fit_predict(data, categorical=list(range(M - MN, M)))

Output:

Init: initializing centroids
Init: initializing clusters
Starting iterations...
Run: 1, iteration: 1/0, moves: 35350, ncost: 14365614457.095963

I think this is a bug, but I'm not certain. Is there a better way to get exactly one iteration?

nicodv commented 3 years ago

Yes, that looks like a small bug, @dfolch. Just use max_iter=0 for now.
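
The two logs above are what an inclusive loop bound would produce. The following is a minimal sketch of that off-by-one pattern, not the actual kmodes source: `run_iterations` and its `inclusive_bound` flag are hypothetical names used only for illustration, and the loop never converges so that only the bound stops it.

```python
def run_iterations(max_iter, inclusive_bound):
    """Return the 'iteration: i/max_iter' labels a verbose loop would print.

    Hypothetical illustration of an off-by-one loop bound; this is not
    the kmodes implementation.
    """
    labels = []
    itr = 0
    converged = False  # never set to True here, so only the bound ends the loop
    while not converged:
        # Inclusive bound (itr <= max_iter) allows one extra pass.
        within_bound = itr <= max_iter if inclusive_bound else itr < max_iter
        if not within_bound:
            break
        itr += 1
        labels.append(f"{itr}/{max_iter}")
    return labels


print(run_iterations(1, inclusive_bound=True))   # ['1/1', '2/1']
print(run_iterations(0, inclusive_bound=True))   # ['1/0']
print(run_iterations(1, inclusive_bound=False))  # ['1/1']
```

With the inclusive bound, max_iter=1 gives the labels 1/1 and 2/1, and max_iter=0 gives 1/0, matching the reported output. With a strict bound, max_iter=1 gives a single pass.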

nicodv commented 3 years ago

Fixed with https://github.com/nicodv/kmodes/pull/160