quaquel / EMAworkbench

workbench for performing exploratory modeling and analysis
BSD 3-Clause "New" or "Revised" License
127 stars 90 forks

Print or plot performance metrics for `clusterer` result #262

Open mikhailsirenko opened 1 year ago

mikhailsirenko commented 1 year ago

Maybe it would be a good addition to allow printing/plotting performance metrics, e.g. silhouette_score or any other score that is more applicable in the case of AgglomerativeClustering. Or would you prefer to keep those separate?

quaquel commented 1 year ago

Can you give a slightly more elaborate example showing both the current and the desired behavior?

mikhailsirenko commented 1 year ago

E.g.

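# Imports the snippet relies on (omitted in the original comment).
import matplotlib.pyplot as plt
import pandas as pd
from ema_workbench.analysis import clusterer
from sklearn.metrics import calinski_harabasz_score, davies_bouldin_score, silhouette_score


def plot_score(data: pd.DataFrame, metric: str, linkage: str, max_clusters: int, score: str = 'silhouette'):
    """Plot clustering performance score for different numbers of clusters.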

    Args:
        data (pd.DataFrame): Data to cluster.
        metric (str): Metric to use for clustering.
        linkage (str): Linkage method to use for clustering.
        max_clusters (int): Maximum number of clusters to try.
        score (str, optional): Score to use. Defaults to 'silhouette'.

    Raises:
        ValueError: If the score is unknown.

    Returns:
        None
    """
    if score == 'silhouette':
        score_function = silhouette_score
    elif score == 'calinski_harabasz':
        score_function = calinski_harabasz_score
    elif score == 'davies_bouldin':
        score_function = davies_bouldin_score
    else:
        raise ValueError(f'Unknown score: {score}')

    scores = {}
    for i in range(2, max_clusters + 1):
        labels = clusterer.apply_agglomerative_clustering(data, i, metric=metric, linkage=linkage)
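        # Only silhouette_score accepts a distance metric; the other two scores take just (X, labels).
        score_kwargs = {'metric': metric} if score == 'silhouette' else {}
        scores[i] = score_function(data, labels, **score_kwargs)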
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(list(scores.keys()), list(scores.values()))
    ax.set_xlabel('Number of clusters')
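    # Replace underscores so e.g. 'davies_bouldin' reads 'Davies bouldin score' on the axis.
    ax.set_ylabel(f"{score.replace('_', ' ').capitalize()} score")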
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
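
For context on the default score used above, here is a minimal pure-Python sketch of what the mean silhouette coefficient computes when it is given a precomputed distance matrix. The function name `silhouette_from_distances` and the toy data are illustrative assumptions, not part of the workbench or scikit-learn. In practice `sklearn.metrics.silhouette_score(distances, labels, metric='precomputed')` would be the call to use.

```python
def silhouette_from_distances(distances, labels):
    """Mean silhouette coefficient from a square, precomputed distance matrix.

    Hypothetical helper for illustration only.
    """
    n = len(labels)
    # Group sample indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n_samples - 1 clusters")

    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            # Convention: a sample alone in its cluster contributes 0.
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(distances[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(distances[i][j] for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n


# Toy data (assumed): four points on a line, pairwise absolute distances.
points = [0.0, 1.0, 10.0, 11.0]
distances = [[abs(p - q) for q in points] for p in points]

good = silhouette_from_distances(distances, [0, 0, 1, 1])  # well-separated grouping
bad = silhouette_from_distances(distances, [0, 1, 0, 1])   # mixed-up grouping
print(f"good labelling: {good:.3f}, bad labelling: {bad:.3f}")
```

A higher value indicates tighter, better-separated clusters, which is why plotting the score against the number of clusters can help in picking that number.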