nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License

Question about k-prototypes ordinal variables #102

Closed jiyelee14 closed 5 years ago

jiyelee14 commented 5 years ago

How can I use an ordinal variable in k-prototypes?

For example, I have a categorical income-level variable with values from 1 to 7, where a higher value means a higher income. The difference between income levels 1 and 2 is not the same as the difference between levels 1 and 7, so I want the clustering to take that ordering into account.

Help. Thank You :D

nicodv commented 5 years ago

It's up to you to reason appropriately about your data. k-prototypes is no magic bullet.

Personally, I would say ordinal data is more like numerical data than categorical data, so I would treat it that way. In your example, I would take the mean income of each income level (if you have that data) and use that as a numerical feature. So transform 1, 2, 3, ... 7 to 20k, 30k, 50k, ... 1M, for example.
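A minimal sketch of that transformation, under some assumptions: the incomes for levels 4 to 6 are made-up placeholders (only 20k, 30k, 50k and 1M come from the example above), `encode_income` and `cluster` are hypothetical helper names, and the rows are toy data. It maps each level to a representative income and passes the result to `KPrototypes` as a numerical column, with the remaining column marked as categorical.

```python
# Representative income per level. Levels 1-3 and 7 follow the example above;
# levels 4-6 are hypothetical placeholders, so substitute your own means.
LEVEL_TO_INCOME = {
    1: 20_000,
    2: 30_000,
    3: 50_000,
    4: 75_000,     # assumed
    5: 120_000,    # assumed
    6: 250_000,    # assumed
    7: 1_000_000,
}


def encode_income(levels):
    """Replace each ordinal level (1-7) with its representative income."""
    return [float(LEVEL_TO_INCOME[level]) for level in levels]


def cluster(rows, categorical_columns, n_clusters=2):
    """Cluster mixed-type rows with k-prototypes and return the labels."""
    # Imported here so encode_income works even without kmodes installed.
    import numpy as np
    from kmodes.kprototypes import KPrototypes

    X = np.array(rows, dtype=object)
    model = KPrototypes(n_clusters=n_clusters, init="Cao", random_state=0)
    return model.fit_predict(X, categorical=categorical_columns)


if __name__ == "__main__":
    levels = [1, 2, 7, 6, 1, 7]                  # toy ordinal column
    regions = ["north", "north", "south", "south", "north", "south"]
    incomes = encode_income(levels)
    rows = [[income, region] for income, region in zip(incomes, regions)]
    # Column 0 is now numerical; column 1 is the only categorical column.
    print(cluster(rows, categorical_columns=[1]))
```

With this encoding, levels 1 and 2 end up 10,000 apart while levels 1 and 7 end up 980,000 apart, so the numerical distance reflects the unequal gaps between levels.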

jiyelee14 commented 5 years ago

Thank you for your kind answer : )