Open StefanR44 opened 2 years ago
That is most definitely possible. Please check our user guide: https://tslearn.readthedocs.io/en/stable/user_guide/dtw.html
The things you are looking for are called constraints. We support two types: itakura
and sakoe_chiba
That is most definitely possible. Please check our user guide: https://tslearn.readthedocs.io/en/stable/user_guide/dtw.html
The things you are looking for are called constraints. We support two types:
itakura
andsakoe_chiba
Ah, thanks a bunch @GillesVandewiele! I did not realize this was still k-shapes related. I will read into it!
Most welcome. I am not sure about how to use it for K-shape myself at the moment. It might require some patching. In case you manage to make it work, please comment here on how you did it!
Will do! I will do some fiddling with K-shapes in some ways. The biggest issue with K-shape for me is, that it requires a dataset of equal-sized time series. For my use-case I am clustering time series of energy consumption and sometimes I do not have them for the whole year but only for a few months. Do you know by chance, if there is a way to do the clustering based on actual dates or months (e.g. summer series only matched with other summer series)?
Not immediately from the top of my head... You could cut the longer timeseries into smaller windows as a preprocessing step or pad the shorter ones with noise perhaps? These are rather rudimentary and probably not the ideal solution though.
Yes, those are possible workarounds if everything else fails, I am actually on it right now 👍
Did not really succeed on that, as I realized that K-Shape does not use DTW but SBD. Now the questions is, how can that be adjusted? I was checking the functions for SBD usage, I guess this is where it is used / created. I tried to backtrack y_shifted_sbd_vec
. It is a function from tslearn: from tslearn.metrics import cdist_normalized_cc, y_shifted_sbd_vec
Is there a way to find out more about it? When i wanted to go to the definition I just got this:
from .cycc import cdist_normalized_cc, y_shifted_sbd_vec
What does it exactly do?
def _shape_extraction(self, X, k):
sz = X.shape[1]
Xp = y_shifted_sbd_vec(self.cluster_centers_[k], X[self.labels_ == k],
norm_ref=-1,
norms_dataset=self.norms_[self.labels_ == k])
S = numpy.dot(Xp[:, :, 0].T, Xp[:, :, 0])
Q = numpy.eye(sz) - numpy.ones((sz, sz)) / sz
M = numpy.dot(Q.T, numpy.dot(S, Q))
_, vec = numpy.linalg.eigh(M)
mu_k = vec[:, -1].reshape((sz, 1))
# The way the optimization problem is (ill-)formulated, both mu_k and
# -mu_k are candidates for barycenters
# In the following, we check which one is best candidate
dist_plus_mu = numpy.sum(numpy.linalg.norm(Xp - mu_k, axis=(1, 2)))
dist_minus_mu = numpy.sum(numpy.linalg.norm(Xp + mu_k, axis=(1, 2)))
if dist_minus_mu < dist_plus_mu:
mu_k *= -1
return mu_k
I would like to have DTW only move within a certain range, so that a point will not be matched to another point that is more than x steps away.
I hope this visualization helps: https://i.imgur.com/hlw9iOF.png
Thanks in advance
Picture edited from Paparrizos and Gravano, "k-Shape: Efficient and Accurate Clustering of Time Series" (2016)
!