tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

Cluster Centers are not updating after assigning init #484

Open Barathwaja opened 11 months ago

Barathwaja commented 11 months ago

Describe the bug

Hi, I'm trying to set the cluster centers through the init argument, but after calling fit they are recomputed for that dataset. How can I tell whether the model really uses my init values as its starting point?

To Reproduce Code

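import numpy as np
from tslearn.clustering import TimeSeriesKMeans

# Initial cluster centers, shape (n_clusters, sz, d) = (1, 4, 1)
init_data = np.array([[[1040.9555],
                       [1037.463],
                       [1034.8087],
                       [1031.3035]]])

model = TimeSeriesKMeans(n_clusters=1,
                         verbose=False,
                         metric='euclidean',
                         random_state=2,
                         init=init_data)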

print(model.__dict__)

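print("After FIT")
# X is the reporter's dataset (not included in the report); per the warning
# in the results, it is 2-D and is treated as 8 one-dimensional time series.
model.fit(X)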

print(model.__dict__)

Results

{'n_clusters': 1, 'max_iter': 50, 'tol': 1e-06, 'n_init': 1, 'metric': 'euclidean', 'max_iter_barycenter': 100, 'metric_params': None, 'n_jobs': None, 'dtw_inertia': False, 'verbose': False, 'random_state': 2, 'init': array([[[1040.9555],
        [1037.463 ],
        [1034.8087],
        [1031.3035]]])}
After FIT
/Users/beast/opt/anaconda3/lib/python3.9/site-packages/tslearn/utils/utils.py:90: UserWarning: 2-Dimensional data passed. Assuming these are 8 1-dimensional timeseries
  warnings.warn(
{'n_clusters': 1, 'max_iter': 50, 'tol': 1e-06, 'n_init': 1, 'metric': 'euclidean', 'max_iter_barycenter': 100, 'metric_params': None, 'n_jobs': None, 'dtw_inertia': False, 'verbose': False, 'random_state': 2, 'init': array([[[1040.9555],
        [1037.463 ],
        [1034.8087],
        [1031.3035]]]), 'labels_': array([0, 0, 0, 0, 0, 0, 0, 0]), 'inertia_': 37706.06962299204, 'cluster_centers_': array([[[1033.4625   ],
        [1007.5545   ],
        [ 966.0016875],
        [ 926.6316875]]]),

YannCabanes commented 7 months ago

Hello @Barathwaja,

When you initialize the TimeSeriesKMeans class with an init parameter equal to an ndarray, this parameter is stored and is accessible via the init attribute (in your case, model.init). Calling the fit method on a dataset leaves the init parameter unchanged: the k-means algorithm starts from the init ndarray, and once it has run, the final positions of the cluster centers are stored in the cluster_centers_ attribute (in your case, model.cluster_centers_).

If you want to predict the label of a new point, the cluster_centers_ attribute is used. If you want to fit your model on a new dataset, the init attribute is used.

I am not sure I understand what you are trying to do. If you want to update your init parameter with the final cluster center positions, you can use: model.init = model.cluster_centers_. If you want to control the values of the cluster centers directly, you can use: model.cluster_centers_ = cluster_centers, where cluster_centers is an ndarray of shape (n_clusters, sz, d).

I hope this helps!