nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License
1.24k stars, 417 forks

Draft implementation of `sample_weight` for kmodes. #174

Closed by kklein 2 years ago

coveralls commented 2 years ago

Coverage increased (+0.02%) to 97.925% when pulling 03a92c8b6adb5e2e13ced53664a8d57646475f3e on kklein:kmodes_sample_weight into 4de7bf7f26359c91e9b21cf6fa7c18c8e2833b0d on nicodv:master.

kklein commented 2 years ago

Hi @nicodv! Unfortunately I'm not able to reproduce the test failure locally with Python 3.10.4. Locally, the two values in question come out as 242.05714285714285 and 242.05714285714288, respectively.

Do you have a hunch on what could be going on here?
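
For reference, the two numbers above differ only in the last digit, which is the kind of gap that floating-point rounding can produce. A minimal sketch of a tolerance-based comparison, using only the standard library; the variable names and the `rel_tol` value are illustrative assumptions and are not taken from the kmodes test suite:

```python
import math

# The two values quoted in the comment above.
cost_a = 242.05714285714285
cost_b = 242.05714285714288

# Exact equality fails because the two floats differ in their last bits.
exactly_equal = cost_a == cost_b

# A relative tolerance treats them as equal (rel_tol is an assumed value).
close_enough = math.isclose(cost_a, cost_b, rel_tol=1e-9)

print(exactly_equal, close_enough)
```

A test that asserts on `close_enough` rather than `exactly_equal` would not be sensitive to a last-digit difference like this one.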

nicodv commented 2 years ago

@kklein, I think it would be wise to use a fixed random seed in all your tests (random_state=42), just like here: https://github.com/nicodv/kmodes/blob/master/kmodes/tests/test_kprototypes.py#L68

I'm hoping it will eliminate any test failures due to random variations in the algorithms.
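
A rough sketch of why a fixed seed helps, using Python's standard `random` module as a stand-in. `pick_initial_centroids` is a hypothetical helper written for illustration only; it is not part of kmodes and does not reproduce its initialization logic:

```python
import random


def pick_initial_centroids(n_points, n_clusters, random_state=None):
    """Hypothetical stand-in: choose which points seed the clusters."""
    # A dedicated generator seeded with random_state gives repeatable draws;
    # random_state=None seeds from system entropy, so draws vary per run.
    rng = random.Random(random_state)
    return sorted(rng.sample(range(n_points), n_clusters))


# Same seed, same starting points, so downstream results are reproducible.
first = pick_initial_centroids(100, 3, random_state=42)
second = pick_initial_centroids(100, 3, random_state=42)
print(first == second)
```

With the seed pinned, two runs start from the same points, so an assertion on the resulting cost depends only on the data and the algorithm, not on the draw.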

nicodv commented 2 years ago

Please merge if you think it's ready, @kklein.

FYI, I'll be making a 0.12.0 release of kmodes after these additions.

kklein commented 2 years ago

Thanks for the fast review! :) I don't think I have permission to merge this, but from my side it's good to go!