nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License

Different clusters when K-Prototypes trained on same data in numpy array and pandas dataframe #183

Closed by RoddyJaques 1 year ago

RoddyJaques commented 2 years ago

I'm getting very different results when calling fit_predict() on a KPrototypes model with the same dataset passed once as a pandas DataFrame and once as a NumPy array.

The resulting clusters are very different. I've kept random_state constant, so the only difference is the format of the input data. I have checked the dtypes and they are all consistent.

nicodv commented 2 years ago

I added tests for this scenario, but I can't reproduce this: https://github.com/nicodv/kmodes/commit/f5532e0064207aab4edcb53be509153aa2cf00ac

Please provide a fully reproducible example, @RoddyJaques.
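For anyone hitting the same symptom, a reproducible example could look roughly like the sketch below. It is not the test from the linked commit and not the reporter's data. The synthetic dataset and the helper names `make_mixed_data`, `same_partition`, and `run_comparison` are made up for illustration. It assumes a kmodes version that accepts a pandas DataFrame directly. The comparison ignores cluster label names, because two runs can produce the same grouping under different label numbers.

```python
import numpy as np
import pandas as pd


def make_mixed_data(n_rows=200, seed=0):
    """Build a small synthetic frame: two numeric and two categorical columns."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "num_a": rng.normal(size=n_rows),
        "num_b": rng.normal(loc=5.0, size=n_rows),
        "cat_a": rng.choice(["x", "y", "z"], size=n_rows),
        "cat_b": rng.choice(["p", "q"], size=n_rows),
    })


def same_partition(labels_a, labels_b):
    """Return True if both labelings group the rows identically, ignoring label names."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    if labels_a.shape != labels_b.shape:
        return False
    pairs = set(zip(labels_a.tolist(), labels_b.tolist()))
    # The labelings match up to renaming exactly when the observed (a, b) pairs
    # form a one-to-one mapping between the two label sets.
    return len(pairs) == len(set(labels_a.tolist())) == len(set(labels_b.tolist()))


def run_comparison(n_clusters=3, random_state=42):
    """Fit KPrototypes on the same data as a DataFrame and as a NumPy array."""
    # Imported here so the helpers above can be used without kmodes installed.
    from kmodes.kprototypes import KPrototypes

    df = make_mixed_data()
    arr = df.to_numpy()   # mixed columns give a single object-dtype array
    categorical = [2, 3]  # positional indices of cat_a and cat_b

    # Print what each input looks like, since dtype differences are a likely suspect.
    print(df.dtypes)
    print(arr.dtype)

    def fit(data):
        model = KPrototypes(n_clusters=n_clusters, init="Cao", n_init=1,
                            random_state=random_state)
        return model.fit_predict(data, categorical=categorical)

    labels_df = fit(df)
    labels_np = fit(arr)
    print("same partition:", same_partition(labels_df, labels_np))
    return labels_df, labels_np
```

Calling `run_comparison()` prints the dtypes of both inputs and whether the two clusterings agree. Posting that output together with the installed kmodes, NumPy, and pandas versions should give the maintainer enough to investigate.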