nicodv / kmodes

Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
MIT License

Euclidean distance definition lacks a square root #180

Closed patryk-kowalski95 closed 2 years ago

patryk-kowalski95 commented 2 years ago

The current function definition is as follows:

    """Euclidean distance dissimilarity function"""
    if np.isnan(a).any() or np.isnan(b).any():
        raise ValueError("Missing values detected in numerical columns.")
    return np.sum((a - b) ** 2, axis=1)

whereas the Euclidean distance requires a square root. It should be:

    """Euclidean distance dissimilarity function"""
    if np.isnan(a).any() or np.isnan(b).any():
        raise ValueError("Missing values detected in numerical columns.")
    return np.sqrt(np.sum((a - b) ** 2, axis=1))

nicodv commented 2 years ago

This is a deliberate choice, see:

patryk-kowalski95 commented 2 years ago

Thank you for clarifying
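
A minimal sketch of why omitting the square root can be harmless for cluster assignment, assuming the distance is only used to pick the nearest centroid. The names `squared_euclidean` and `rooted_euclidean` and the sample data are hypothetical and not part of kmodes, and this is not necessarily the reasoning behind the maintainer's choice.

```python
import numpy as np


def squared_euclidean(a, b):
    """Squared Euclidean distance, mirroring the current definition quoted above."""
    return np.sum((a - b) ** 2, axis=1)


def rooted_euclidean(a, b):
    """Euclidean distance with the square root, as proposed in this issue."""
    return np.sqrt(squared_euclidean(a, b))


# Hypothetical data: three centroids and one point to assign.
centroids = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
point = np.array([2.0, 3.0])

squared = squared_euclidean(centroids, point)
rooted = rooted_euclidean(centroids, point)

# The square root is monotonic on non-negative values, so the ordering
# of the centroids by distance, and hence the nearest one, is unchanged.
nearest_by_squared = int(np.argmin(squared))
nearest_by_rooted = int(np.argmin(rooted))
```

Both variants select the same centroid here. They are not interchangeable in general, though: when the numerical distance is added to a categorical cost, as k-prototypes does, taking the root would change the relative weight of the two terms and could change the resulting clusters.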