nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License
1.23k stars 416 forks

issues with kprototype predict function parameters when used in sklearn pipeline #60

Open soufianee opened 6 years ago

soufianee commented 6 years ago

Hi,

I appreciate your work on k-prototypes. I have a dataset containing numerical and categorical variables, and I want to use it with other methods inside an sklearn Pipeline. Fitting works when I pass the categorical parameter like this: "Mymodel__categorical". But when I want to use the predict function, the pipeline does not allow any parameters other than the inputs. Accordingly, I think the k-prototypes class does not persist the categorical parameter during fitting, and it lacks a predict function that takes only the inputs, which it would need to work well with an sklearn Pipeline.

Thank you, sincerely.

nicodv commented 6 years ago

The API of the kmodes.predict method indeed needs a categorical argument, but sklearn does not allow for extra arguments to the predict method. This causes kmodes to be incompatible with some higher-level functionality of sklearn, such as Pipelines.

So, this is a known issue, and it cannot be resolved without changes to either API.

The only solution that seems somewhat acceptable to me is to move the categorical argument to the __init__ of KModes/KPrototypes, but I don't like it conceptually.
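A minimal sketch of that idea, written as a wrapper rather than a change to KModes/KPrototypes themselves. Everything here is illustrative: `StandInPrototypes` is a hypothetical stand-in that only mimics an estimator whose `fit` and `predict` take a `categorical` keyword, and `CategoricalAtInit` is a made-up name. To use it in a real sklearn Pipeline, the wrapper would probably also need to inherit from `sklearn.base.BaseEstimator` so that `get_params`/`set_params` are available.

```python
class StandInPrototypes:
    """Hypothetical stand-in for an estimator whose fit/predict need `categorical`."""

    def fit(self, X, y=None, categorical=None):
        if categorical is None:
            raise ValueError("categorical column indices are required")
        # Remember what was passed, only so the example can be inspected.
        self.seen_categorical_ = list(categorical)
        return self

    def predict(self, X, categorical=None):
        if categorical is None:
            raise ValueError("categorical column indices are required")
        # Dummy labelling: every row is assigned to cluster 0.
        return [0 for _ in X]


class CategoricalAtInit:
    """Stores `categorical` at construction and forwards it on fit/predict."""

    def __init__(self, estimator, categorical):
        self.estimator = estimator
        self.categorical = categorical

    def fit(self, X, y=None):
        # Only X (and y) are needed from the caller, as a pipeline expects.
        self.estimator.fit(X, y, categorical=self.categorical)
        return self

    def predict(self, X):
        # predict takes the inputs alone; the stored indices are reused.
        return self.estimator.predict(X, categorical=self.categorical)


rows = [[1.0, "a"], [2.5, "b"], [0.3, "a"]]
model = CategoricalAtInit(StandInPrototypes(), categorical=[1]).fit(rows)
labels = model.predict(rows)
```

With this shape, the caller supplies the categorical column indices once, at construction, and both `fit(X)` and `predict(X)` have the single-input signature that pipelines rely on.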

Suggestions are welcome.

soufianee commented 6 years ago

Thank you, nicodv.

Yes, moving the categorical argument to __init__ and storing it there, so that it is used when fit is called, is the solution.

Eventually, I tweaked my program to work with the current configuration. But I advise you to think about making it compatible with pipelines, because they are widely used in the data science community and in Spark programming.

Sincerely.

aiborra11 commented 2 years ago

Hi, just wondering if there is any solution for this? I'm trying to create an sklearn pipeline for my KPrototypes model but can't see how to pass the categorical index list as an argument when fitting/predicting the model...