tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

QUESTION - How does tslearn.svm.TimeSeriesSVC work? #336

Open andre-don opened 3 years ago

andre-don commented 3 years ago

Hello experts,

I have a small question about classifying multivariate time series with an SVM. tslearn can classify univariate or multivariate time series using tslearn.svm.TimeSeriesSVC, but the documentation does not explain in detail how this SVM builds a hyperplane with the "gak" kernel and classifies multivariate time series. Is there an article that describes how the SVM performs this classification on multivariate time series, or could someone explain it briefly?

Thanks in advance

GillesVandewiele commented 3 years ago

Hi @andre-don,

Not really an expert on SVMs either, but AFAIK it maps the data instances into a similarity space using a "kernel" function. In tslearn, this is a kernel specifically designed for time series ("gak" is the Global Alignment Kernel). So a dataset of N time series is transformed into an NxN matrix of pairwise similarities. Then a linear decision boundary is learned in that space that maximizes the margin, i.e. the distance between the boundary and the closest training instances of each class (the "support vectors"). This is typically done by solving a quadratic programming problem (a quadratic objective with linear constraints).

Hope this helps. Do not hesitate to question further if need be.

rtavenar commented 3 years ago

If the question is about the kernel itself, I guess both papers by Cuturi are of interest (see for example the dedicated page here: https://marcocuturi.net/GA.html)

And if the question is more about the specific multivariate case, then the only thing we do when features are p-dimensional is use the Euclidean distance in $\mathbb{R}^p$ wherever a norm of the form $\|x - y\|$ is needed in the computations.
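To make the multivariate point concrete, here is a minimal NumPy sketch of a global alignment kernel between two series of shape (length, p). It is written from the description of the kernel on Cuturi's page as I understand it, and tslearn's actual implementation may differ in details. The function names `local_kernel`, `unnormalized_gak` and `gak` are hypothetical helpers for this example. The only place where the number of features p matters is the squared Euclidean distance between time steps.

```python
import numpy as np

def local_kernel(x, y, sigma=1.0):
    """Similarity between every time step of x (n, p) and of y (m, p)."""
    # Squared Euclidean distance in R^p between each pair of time steps.
    # This is the only line that depends on the feature dimension p.
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    gauss = np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Rescaled Gaussian kernel k / (2 - k) (assumed local kernel).
    return gauss / (2.0 - gauss)

def unnormalized_gak(x, y, sigma=1.0):
    """Sum over all alignments of the product of local similarities."""
    k = local_kernel(x, y, sigma)
    n, m = k.shape
    acc = np.zeros((n + 1, m + 1))
    acc[0, 0] = 1.0
    for i in range(n):
        for j in range(m):
            # An alignment reaches (i, j) by a diagonal, vertical or
            # horizontal step, as in dynamic time warping.
            acc[i + 1, j + 1] = k[i, j] * (acc[i, j] + acc[i, j + 1] + acc[i + 1, j])
    return acc[n, m]

def gak(x, y, sigma=1.0):
    """Normalized kernel, so that gak(x, x) equals 1."""
    kxy = unnormalized_gak(x, y, sigma)
    kxx = unnormalized_gak(x, x, sigma)
    kyy = unnormalized_gak(y, y, sigma)
    return kxy / np.sqrt(kxx * kyy)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 time steps, p = 3 features
y = rng.normal(size=(7, 3))  # 7 time steps, p = 3 features
print(gak(x, y), gak(x, x))
```

A univariate series is just the case p = 1, i.e. an array of shape (length, 1), and the two series do not need to have the same length.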