Closed by Galy88 2 years ago.
This appears to be caused by the jaccard_dissim_binary dissimilarity function in combination with some specifics of the data set. Specifically, this distance metric does not appear to support binary data in which all rows have the same value for a column.
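The error message suggests that the union term of the Jaccard dissimilarity is zero. For two binary vectors, the union counts the positions where either vector is 1, so it is zero only when both vectors are all zeros, and the ratio is then undefined. The minimal sketch below illustrates this with a hypothetical pure-Python helper, `jaccard_dissim`. It follows the textbook definition 1 - |a AND b| / |a OR b| and is not the kmodes implementation.

```python
def jaccard_dissim(a, b):
    """Jaccard dissimilarity between two equal-length 0/1 sequences.

    Hypothetical helper for illustration only; not the kmodes code.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Positions where both vectors are 1.
    intersection = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    # Positions where at least one vector is 1.
    union = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if union == 0:
        # Both vectors are all zeros, so the ratio below would divide by zero.
        raise ValueError("union is 0: Jaccard dissimilarity is undefined")
    return 1 - intersection / union


print(jaccard_dissim([1, 0, 1, 0], [1, 1, 0, 0]))  # 1 - 1/3

try:
    jaccard_dissim([0, 0, 0, 0], [0, 0, 0, 0])
except ValueError as err:
    print(err)
```

In k-modes, each row is compared against each cluster centroid, so an all-zero row paired with an all-zero centroid would hit this undefined case.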
You'll either need to:
Expected Behavior
Given a binary matrix, for example one of size 3x4, the KModes algorithm should run and assign each row of the matrix to a cluster.
Actual Behavior
We have a 3x4 binary matrix. When we run the KModes algorithm on it, we immediately get the following error: "Insufficient Number of data since union is 0". With other 3x4 binary matrices, the algorithm sometimes works.
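Because the union term can only be zero when both compared vectors are all zeros, and the reply above points to columns that take the same value in every row, it may help to inspect the input before fitting. The sketch below uses a hypothetical helper, `diagnose_binary_matrix`, that is not part of kmodes; the example matrix is also hypothetical and is not the matrix from this report.

```python
def diagnose_binary_matrix(rows):
    """Report all-zero rows and constant columns in a 0/1 matrix.

    Hypothetical diagnostic for illustration only; not part of kmodes.
    """
    # Indices of rows that contain no 1s.
    all_zero_rows = [i for i, row in enumerate(rows) if not any(row)]
    n_cols = len(rows[0]) if rows else 0
    # Indices of columns that hold a single value across every row.
    constant_cols = [
        j for j in range(n_cols)
        if len({row[j] for row in rows}) == 1
    ]
    return all_zero_rows, constant_cols


# Hypothetical 3x4 example matrix.
m = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
zero_rows, constant_cols = diagnose_binary_matrix(m)
print("all-zero rows:", zero_rows)        # [1]
print("constant columns:", constant_cols)  # [1]
```

An all-zero row is the case most directly tied to a zero union; constant columns are reported as well because they match the condition described in the reply.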
Steps to Reproduce the Problem
Import libraries:
from kmodes.kmodes import KModes
from kmodes.util.dissim import jaccard_dissim_binary
Configure KModes:
km = KModes(n_clusters=2, init='cao', random_state=0, n_jobs=-1, cat_dissim=jaccard_dissim_binary)
Fit the model and predict cluster labels:
clusters = km.fit_predict(m)
With this matrix, the algorithm works:
Specifications