regorsmitz closed 5 years ago
Further experimenting showed that this issue occurs when you pass a pandas DataFrame instead of a NumPy array as input to fit_predict. The documentation says to pass a NumPy array, so this is my mistake, but I'd imagine other people will try passing a DataFrame since they are used to other prediction functions handling DataFrames properly. As an improvement, it would be nice to have logic that checks whether a DataFrame has been passed in and short-circuits, rather than running and appearing to be broken internally.
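For anyone hitting the same thing before a fix lands, here is a minimal sketch of the kind of check described above. It assumes duck typing on `to_numpy` is good enough to recognize pandas objects; `as_array_input` is a hypothetical helper name for illustration, not part of kmodes.

```python
def as_array_input(X):
    """Return X as a plain array if it looks like a pandas object.

    Hypothetical helper, not part of kmodes. It relies on duck typing:
    pandas DataFrame and Series expose a .to_numpy() method, while
    NumPy arrays and plain lists do not.
    """
    to_numpy = getattr(X, "to_numpy", None)
    if callable(to_numpy):
        # Looks like a pandas object: hand back the underlying array.
        return to_numpy()
    # Anything else is passed through unchanged.
    return X
```

With this in place you would call something like `model.fit_predict(as_array_input(df), ...)` instead of passing `df` directly. Converting is one option; raising a clear error at that point instead would also match the "short circuit" idea.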
This one feels similar to: https://github.com/nicodv/kmodes/issues/67
Have you tried this with the latest GitHub version?
This should be fixed now on master after merging https://github.com/nicodv/kmodes/pull/117, courtesy of @Genie-Liu.
I'm considering making a 0.10.1 patch release for this, since it seems to be a common problem.
I'm running the following code:
Since I have verbose mode on, I can see the moves per iteration, and I have noticed that with the configuration above, training fails as soon as an iteration has 0 moves. To reproduce this issue, I recommend using a small dataset with a high number of clusters, which makes an iteration with 0 moves more likely.
Output / stack trace:
Initialization method and algorithm are deterministic. Setting n_init to 1.
Init: initializing centroids
Init: initializing clusters
Init: initializing centroids
Init: initializing clusters
Starting iterations...
Run: 1, iteration: 1/100, moves: 2640, ncost: 906024701442.8253
Run: 1, iteration: 2/100, moves: 935, ncost: 863644943798.5979
Run: 1, iteration: 3/100, moves: 557, ncost: 844366144404.3018
Run: 1, iteration: 4/100, moves: 398, ncost: 829619773050.4286
Run: 1, iteration: 5/100, moves: 325, ncost: 818463604224.1627
Run: 1, iteration: 6/100, moves: 256, ncost: 813235837011.3778
Run: 1, iteration: 7/100, moves: 165, ncost: 811553263961.7179
Run: 1, iteration: 8/100, moves: 130, ncost: 810452778360.2623
Run: 1, iteration: 9/100, moves: 126, ncost: 809493708178.163
Run: 1, iteration: 10/100, moves: 81, ncost: 808941359440.6614
Run: 1, iteration: 11/100, moves: 62, ncost: 808673546931.4755
Run: 1, iteration: 12/100, moves: 45, ncost: 808447845407.1216
Run: 1, iteration: 13/100, moves: 38, ncost: 808307752250.539
Run: 1, iteration: 14/100, moves: 24, ncost: 808243120277.072
Run: 1, iteration: 15/100, moves: 22, ncost: 808210883455.4402
Run: 1, iteration: 16/100, moves: 9, ncost: 808201300381.1038
Run: 1, iteration: 17/100, moves: 11, ncost: 808189508679.0436
Run: 1, iteration: 18/100, moves: 12, ncost: 808171886835.0874
Run: 1, iteration: 19/100, moves: 25, ncost: 808121825481.3004
Run: 1, iteration: 20/100, moves: 38, ncost: 808020403165.9956
Run: 1, iteration: 21/100, moves: 35, ncost: 807951740463.9619
Run: 1, iteration: 22/100, moves: 25, ncost: 807914200232.4612
Run: 1, iteration: 23/100, moves: 19, ncost: 807840929538.2213
Run: 1, iteration: 24/100, moves: 16, ncost: 807795774926.7335
Run: 1, iteration: 25/100, moves: 29, ncost: 807755854677.4387
Run: 1, iteration: 26/100, moves: 7, ncost: 807752426872.5327
Run: 1, iteration: 27/100, moves: 1, ncost: 807752358570.4669
Run: 1, iteration: 28/100, moves: 0, ncost: 807752358570.4669
TypeError Traceback (most recent call last)