tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

Feature Importance/Influence in Multivariate Time Series Clustering #412

Open ajanadj opened 2 years ago

ajanadj commented 2 years ago

Is there a way to determine how important each feature of a multivariate time series is to the clustering decision? For example, that feature x has the most influence on cluster y.

My time series dataset has shape (n_ts, ts_length, n_dim), where n_dim is the number of features.

NimaSarajpoor commented 2 years ago

@ajanadj Hi, I am interested in what you mentioned. Do you know how this can be done for simple tabular data with n samples and p features? I am trying to simplify your problem. Could you please provide a reference? (If you do not have one or cannot find one, please read on.)


By the way, here is what I think. Say we have 10 samples with only 2 features (X and Y) and two clusters. One cluster has centroid (1, 0), the other has centroid (1, 10), and the members of each cluster lie very close to their centroid. (Feel free to plot it on a 2D XY-plane.) Which feature is more important? I think Y is the one that forms the clusters, since both centroids share the same X value. So one simple idea is to run the clustering on each dimension separately and see which one gives the highest silhouette score.

(I haven't looked into this, so it may be wrong. It was just an idea!)
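
A minimal sketch of the per-feature silhouette idea, on the toy tabular case described above. The sample values are made up for illustration, and the helpers `two_means_1d` and `silhouette_1d` are hypothetical names written for this sketch (standard library only), not tslearn functions:

```python
from statistics import mean

# Made-up toy data: 10 samples, 2 features (X, Y).
# X stays near 1 everywhere; Y is near 0 for one group and near 10 for the other,
# matching centroids of roughly (1, 0) and (1, 10).
samples = [
    (0.9, 0.1), (1.1, -0.2), (1.0, 0.0), (0.8, 0.2), (1.2, -0.1),
    (1.0, 10.1), (0.9, 9.8), (1.1, 10.0), (1.2, 10.2), (0.8, 9.9),
]


def two_means_1d(values):
    """Split 1-D values into two clusters at the cut that minimises
    the within-cluster sum of squared errors."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_sse, best_cut = float("inf"), 1
    for cut in range(1, len(order)):
        left = [values[i] for i in order[:cut]]
        right = [values[i] for i in order[cut:]]
        m_left, m_right = mean(left), mean(right)
        sse = (sum((v - m_left) ** 2 for v in left)
               + sum((v - m_right) ** 2 for v in right))
        if sse < best_sse:
            best_sse, best_cut = sse, cut
    labels = [0] * len(values)
    for i in order[best_cut:]:
        labels[i] = 1
    return labels


def silhouette_1d(values, labels):
    """Mean silhouette coefficient for 1-D values split into two clusters."""
    scores = []
    for i, v in enumerate(values):
        same = [abs(v - w) for j, w in enumerate(values)
                if labels[j] == labels[i] and j != i]
        other = [abs(v - w) for j, w in enumerate(values)
                 if labels[j] != labels[i]]
        if not same:  # singleton cluster: silhouette is defined as 0
            scores.append(0.0)
            continue
        a, b = mean(same), mean(other)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


# Cluster each feature on its own and compare silhouette scores.
feature_names = ["X", "Y"]
scores = {}
for d, name in enumerate(feature_names):
    values = [s[d] for s in samples]
    scores[name] = silhouette_1d(values, two_means_1d(values))

best = max(scores, key=scores.get)
print(scores, "-> feature with the highest silhouette score:", best)
```

For data of shape (n_ts, ts_length, n_dim), the same loop could presumably slice one dimension at a time (e.g. `X[:, :, d:d+1]`), cluster it with `TimeSeriesKMeans`, and score it with `tslearn.clustering.silhouette_score`; that variant is untested here. Note that scoring each dimension in isolation ignores interactions between features.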