The package evaluates clustering results through the semantic relationships among the significant frequent patterns identified in the cluster items. It is an internal validation technique that does not rely on distance-related metrics. The algorithm does, however, require the data to be organized in categorical form.
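Because the functions documented here expect binary features, categorical data usually has to be one-hot encoded first. The following is a minimal sketch of that step; `one_hot_encode` is a hypothetical helper written for illustration and is not part of the package.

```python
import numpy as np

def one_hot_encode(rows):
    """Turn rows of categorical values into a binary matrix.

    Hypothetical helper (not part of the package): every distinct
    (column index, value) pair becomes one binary feature.
    """
    rows = [list(row) for row in rows]
    n_cols = len(rows[0])
    # One feature per distinct value of each column, in sorted order.
    feature_names = [
        (col, value)
        for col in range(n_cols)
        for value in sorted({row[col] for row in rows})
    ]
    # A cell is 1 when the sample takes that value in that column.
    X = np.array(
        [[int(row[col] == value) for col, value in feature_names] for row in rows]
    )
    return X, feature_names

rows = [("red", "small"), ("blue", "small"), ("red", "large")]
X, feature_names = one_hot_encode(rows)
```

The resulting `X` has one row per sample and one binary column per category value, which matches the shape the functions below describe for their `X` argument.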
particularize_descriptors(descriptors, particular_threshold=1.0)
Particularization of descriptors based on support. This function particularizes descriptors by applying a threshold to the range (maximum support minus minimum support) of each feature across the clusters.
particular_threshold
float: Particularization threshold, expressed relative to the support range. 0.0 means that the entire range of relative support is used, 0.5 means that half of it is used, and 1.0 means that only the maximum support is kept.
descriptors
array-like of shape (n_clusters, n_features): Matrix with the support of features in each cluster.
descriptors
array-like of shape (n_clusters, n_features): Matrix with the computed particularized support of features in each cluster.
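The thresholding described above can be sketched as follows. This is an illustration only, not the package's implementation: `particularize_sketch` is a hypothetical name, and the cutoff rule (minimum + threshold × range, computed per feature) is an assumption inferred from the parameter description.

```python
import numpy as np

def particularize_sketch(descriptors, particular_threshold=1.0):
    """Zero the supports that fall below a per-feature cutoff (illustrative)."""
    descriptors = np.asarray(descriptors, dtype=float)
    minimum = descriptors.min(axis=0)  # lowest support of each feature
    maximum = descriptors.max(axis=0)  # highest support of each feature
    # Assumed rule: cutoff = minimum + threshold * (maximum - minimum).
    cutoff = minimum + particular_threshold * (maximum - minimum)
    # Keep supports at or above the cutoff, zero everything else.
    return np.where(descriptors >= cutoff, descriptors, 0.0)

# Rows are clusters, columns are features.
support = np.array([[0.75, 0.25, 0.5],
                    [0.25, 1.00, 0.5]])
kept_max_only = particularize_sketch(support, particular_threshold=1.0)
kept_all = particularize_sketch(support, particular_threshold=0.0)
```

Under this assumed rule, a threshold of 1.0 keeps a feature only in the cluster or clusters where its support is highest, while 0.0 leaves the matrix unchanged.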
semantic_descriptors(X, labels, particular_threshold=None)
Semantic descriptors based on feature support. This function computes the support of the present features (1-itemsets composed of the features with value 1) among the samples in each cluster. Features in a cluster that do not meet the particularization criterion have their support set to zero.
X
array-like of shape (n_samples, n_features): Feature array of each sample. All features must be binary.
labels
array-like of shape (n_samples,): Cluster label of each sample, starting at 0.
particular_threshold
{None, float}: Particularization threshold. None means no particularization strategy.
descriptors
array-like of shape (n_clusters, n_features): Matrix with the computed particularized support of features in each cluster.
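For binary data, the support of a feature in a cluster is the fraction of that cluster's samples in which the feature equals 1. A minimal sketch of this computation without particularization is shown here; `support_per_cluster` is a hypothetical name, and the sketch assumes that every label from 0 to the largest label occurs at least once.

```python
import numpy as np

def support_per_cluster(X, labels):
    """Fraction of samples with value 1, per cluster and feature (illustrative)."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1  # labels are assumed to start at 0
    # Row k is the column-wise mean of the samples assigned to cluster k.
    return np.vstack([X[labels == k].mean(axis=0) for k in range(n_clusters)])

X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 1]])
labels = np.array([0, 0, 1, 1])
support = support_per_cluster(X, labels)
```

In this toy example the first feature appears in every sample of cluster 0 and in no sample of cluster 1, so its support is 1.0 and 0.0 respectively.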
sledge_score_clusters(X, labels, particular_threshold=None, aggregation='harmonic')
SLEDge score for each cluster. This function computes the SLEDge score of each cluster. If aggregation is None, it returns a matrix with the values of S, L, E, and D for each cluster instead.
X
array-like of shape (n_samples, n_features): Feature array of each sample. All features must be binary.
labels
array-like of shape (n_samples,): Cluster label of each sample, starting at 0.
particular_threshold
{None, float}: Particularization threshold. None means no particularization strategy.
aggregation
{'harmonic', 'geometric', 'median', None}: Strategy to aggregate values of S, L, E, and D.
scores
array-like of shape (n_clusters,): SLEDge score for each cluster.
score_matrix
array-like of shape (n_clusters, 4), returned if aggregation is None: S, L, E, and D scores for each cluster.
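The three aggregation strategies can be illustrated on a score matrix of shape (n_clusters, 4). This is a sketch using the standard definitions of the harmonic mean, geometric mean, and median; `aggregate_sled` is a hypothetical name, and the sketch assumes strictly positive scores (how the package treats zeros is not stated here).

```python
import numpy as np

def aggregate_sled(score_matrix, aggregation="harmonic"):
    """Collapse the S, L, E, D columns into one score per cluster (illustrative)."""
    scores = np.asarray(score_matrix, dtype=float)
    if aggregation is None:
        return scores  # keep the raw (n_clusters, 4) matrix
    if aggregation == "harmonic":
        # Harmonic mean: n / sum(1 / x); assumes no zero entries.
        return scores.shape[1] / (1.0 / scores).sum(axis=1)
    if aggregation == "geometric":
        # Geometric mean via logs; assumes strictly positive entries.
        return np.exp(np.log(scores).mean(axis=1))
    if aggregation == "median":
        return np.median(scores, axis=1)
    raise ValueError(f"unknown aggregation: {aggregation!r}")

# One cluster with S, L, E, D values chosen for easy arithmetic.
matrix = np.array([[1.0, 0.5, 0.5, 0.25]])
harmonic = aggregate_sled(matrix, "harmonic")
geometric = aggregate_sled(matrix, "geometric")
median = aggregate_sled(matrix, "median")
```

The harmonic mean is the most sensitive of the three to a single low component, which makes it the strictest choice when all four indicators should be high at once.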
sledge_score(X, labels, particular_threshold=None, aggregation='harmonic')
The SLEDge score. This function computes the average SLEDge score of all clusters.
X
array-like of shape (n_samples, n_features): Feature array of each sample. All features must be binary.
labels
array-like of shape (n_samples,): Cluster label of each sample, starting at 0.
particular_threshold
{None, float}: Particularization threshold. None means no particularization strategy.
aggregation
{'harmonic', 'geometric', 'median'}: Strategy to aggregate values of S, L, E, and D for each cluster.
score
float: Average SLEDge score.
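The final averaging step can be sketched as a plain mean over per-cluster scores. This assumes an unweighted average, in which every cluster counts equally regardless of its size; `average_score` is a hypothetical name, and the input values are made up for illustration.

```python
import numpy as np

def average_score(cluster_scores):
    """Unweighted mean of per-cluster scores (assumed; illustrative only)."""
    return float(np.mean(np.asarray(cluster_scores, dtype=float)))

# Hypothetical per-cluster scores, one entry per cluster.
cluster_scores = [1.0, 0.5, 0.0]
overall = average_score(cluster_scores)
```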