tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

How to scale cluster centers back in the original scale #512

Closed praveenpiisc closed 7 months ago

praveenpiisc commented 8 months ago

I was using the k-Shape algorithm from tslearn for energy meter clustering. I used the following code:

from tslearn.clustering import KShape
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

scaler = TimeSeriesScalerMeanVariance()
X_normalized = scaler.fit_transform(X)
n_clusters = 3
ks = KShape(n_clusters=n_clusters, verbose=True, random_state=0)
ks_cluster_assignments = ks.fit_predict(X_normalized)
ks.cluster_centers_.shape
ks_cluster_centers = ks.cluster_centers_

The clustering worked and I received the cluster centres in ks_cluster_centers, but they are in the scaled format (mean 0 and standard deviation 1).

How can I rescale the cluster centres back to the original scale?

Any help would be of tremendous importance to me, Sir.

YannCabanes commented 7 months ago

Hello @praveenpiisc, let's recall the transformation performed when a time series is normalized using:

scaler = TimeSeriesScalerMeanVariance()
X_normalized = scaler.fit_transform(X)

This transformation is:

mean_t = np.nanmean(X_, axis=1, keepdims=True)
std_t = np.nanstd(X_, axis=1, keepdims=True)
std_t[std_t == 0.] = 1.
X_ = (X_ - mean_t) * self.std / std_t + self.mu
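To make the per-series nature of this transformation concrete, here is a minimal NumPy re-implementation of the lines above. It is an illustrative sketch, not tslearn's own code: the function names `mean_variance_scale` and `inverse_mean_variance_scale` are hypothetical, and the input is assumed to already be a 3D array of shape (n_ts, sz, d). Because the sketch returns `mean_t` and `std_t`, each individual series can be mapped back exactly to its own original scale.

```python
import numpy as np


def mean_variance_scale(X, mu=0.0, std=1.0):
    """Hypothetical re-implementation of the quoted transformation.

    X is assumed to have shape (n_ts, sz, d). Returns the scaled data
    together with the per-series, per-dimension statistics.
    """
    X = np.asarray(X, dtype=float)
    mean_t = np.nanmean(X, axis=1, keepdims=True)  # shape (n_ts, 1, d)
    std_t = np.nanstd(X, axis=1, keepdims=True)    # shape (n_ts, 1, d)
    std_t[std_t == 0.] = 1.                        # constant series: avoid division by zero
    X_scaled = (X - mean_t) * std / std_t + mu
    return X_scaled, mean_t, std_t


def inverse_mean_variance_scale(X_scaled, mean_t, std_t, mu=0.0, std=1.0):
    # Algebraic inverse of the line above: X = (X_scaled - mu) * std_t / std + mean_t
    return (np.asarray(X_scaled, dtype=float) - mu) * std_t / std + mean_t


rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(4, 24, 1))  # 4 synthetic series, 24 steps, 1 dimension
X_scaled, mean_t, std_t = mean_variance_scale(X)
X_back = inverse_mean_variance_scale(X_scaled, mean_t, std_t)
print(mean_t.shape)             # (4, 1, 1): one mean per series and dimension
print(np.allclose(X_back, X))   # True
```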

See: https://github.com/tslearn-team/tslearn/blob/9937946/tslearn/preprocessing/preprocessing.py#L204-L298. As its docstring says, this transformation "Scales time series so that their mean (resp. standard deviation) in each dimension is mu (resp. std)."

In this transformation, the mean and standard deviation are computed along axis=1, which is the time axis (its length is the length of the time series). One mean and one standard deviation are therefore computed for each time series and each dimension.
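A small NumPy illustration of the consequence, under assumed synthetic data: two hypothetical meters with the same daily profile but very different consumption levels get different (mean, std) pairs, yet become identical after this per-series normalization. The helper name `normalize` and the sine-shaped profile are illustrative assumptions, not taken from tslearn or from the original question.

```python
import numpy as np


def normalize(X):
    # Same per-series, per-dimension z-normalization as quoted above (mu=0, std=1)
    mean_t = X.mean(axis=1, keepdims=True)
    std_t = X.std(axis=1, keepdims=True)
    std_t[std_t == 0.] = 1.
    return (X - mean_t) / std_t, mean_t, std_t


profile = np.sin(np.linspace(0, 2 * np.pi, 24))  # illustrative daily consumption shape
small_meter = 2.0 + 0.5 * profile                 # hypothetical low-consumption meter
large_meter = 300.0 + 80.0 * profile              # hypothetical high-consumption meter
X = np.stack([small_meter, large_meter])[:, :, np.newaxis]  # shape (2, 24, 1)

X_norm, mean_t, std_t = normalize(X)
print(mean_t.ravel(), std_t.ravel())       # two different (mean, std) pairs
print(np.allclose(X_norm[0], X_norm[1]))   # True: same series after normalization
```

Once normalized, nothing in the data records whether a shape came from the small or the large meter.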

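Therefore, there is no global mean and standard deviation that could be used to rescale the cluster centers: each center summarizes several normalized series, each of which had its own mean and standard deviation, so there is no unique "original scale".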
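If an approximate view in physical units is still wanted (for example, to plot the centres in kWh), one possible heuristic, which is neither part of tslearn nor proposed in this thread, is to pick a convention per cluster: rescale each center with the average per-series mean and standard deviation of the series assigned to it. The helper `rescale_centers_by_cluster_stats` below is hypothetical; it assumes `X` is the original (unscaled) dataset of shape (n_ts, sz, d), `labels` are the cluster assignments and `centers` has shape (n_clusters, sz, d). The result is a "typical" scale for the cluster, not a recovered original.

```python
import numpy as np


def rescale_centers_by_cluster_stats(X, labels, centers):
    """Hypothetical helper (not a tslearn API): rescale each normalized center
    using the average per-series mean and std of its cluster members."""
    X = np.asarray(X, dtype=float)               # (n_ts, sz, d), original units
    centers = np.asarray(centers, dtype=float)   # (n_clusters, sz, d), normalized
    labels = np.asarray(labels)
    mean_t = np.nanmean(X, axis=1, keepdims=True)  # (n_ts, 1, d)
    std_t = np.nanstd(X, axis=1, keepdims=True)    # (n_ts, 1, d)
    rescaled = np.empty_like(centers)
    for k in range(centers.shape[0]):
        members = labels == k
        if not members.any():
            rescaled[k] = np.nan  # empty cluster: no statistics to borrow
            continue
        avg_mean = mean_t[members].mean(axis=0)  # (1, d)
        avg_std = std_t[members].mean(axis=0)    # (1, d)
        rescaled[k] = centers[k] * avg_std + avg_mean
    return rescaled


# Usage on synthetic data: all-zero centers map to each cluster's average level.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 24, 1)) * 5 + 100
labels = np.array([0, 0, 0, 1, 1, 1])
centers = np.zeros((2, 24, 1))
rescaled = rescale_centers_by_cluster_stats(X, labels, centers)
```

With the code from the question, `labels` would be `ks_cluster_assignments` and `centers` would be `ks.cluster_centers_`. Using the median instead of the mean, or plotting the members' range around the center, are equally valid conventions.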