tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

Different DTW similarity score #346

Closed Okroshiashvili closed 3 years ago

Okroshiashvili commented 3 years ago

tslearn.metrics.dtw_path_from_metric() and tslearn.metrics.dtw() with the default parameters give me different similarity scores for the same time series. Is this expected?

To reproduce my results:

import numpy as np
import pandas as pd
import tslearn.metrics as tsm

x = np.linspace(0, 50, 100)

ts1 = pd.Series(3 * np.sin(x / .5))

ts2 = pd.Series(2 * np.sin(x))

Results:

tsm.dtw(ts1, ts2) -> 16.578554103357583

tsm.dtw_path_from_metric(ts1, ts2, metric="euclidean")[1] -> 148.2423124144105

Is it a bug or intended behavior? If it is not a bug, how are these two results related, and how can I compare them?

rtavenar commented 3 years ago

Hi @Okroshiashvili

In fact, the DTW score is the square root of the sum of the squared Euclidean distances along the path. So you should have:

tsm.dtw(ts1, ts2) == np.sqrt(tsm.dtw_path_from_metric(ts1, ts2, metric="sqeuclidean")[1])

Let me know if it is not the case.

Best, Romain
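To make the relationship above concrete, here is a minimal pure-Python sketch of the accumulated-cost recurrence. It is not tslearn's implementation: the helper name `dtw_accumulated_cost` is hypothetical, and the signals are rebuilt with the standard library instead of NumPy and pandas. It assumes no global constraint and one-dimensional series, so the "euclidean" local cost reduces to an absolute difference.

```python
import math

def dtw_accumulated_cost(a, b, local_cost):
    """Accumulated cost of the best warping path, using
    C[i, j] = cost + min(C[i-1, j], C[i, j-1], C[i-1, j-1])."""
    n, m = len(a), len(b)
    inf = float("inf")
    # Extra border row/column so the recurrence needs no special cases.
    C = [[inf] * (m + 1) for _ in range(n + 1)]
    C[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(a[i - 1], b[j - 1])
            C[i][j] = cost + min(C[i - 1][j], C[i][j - 1], C[i - 1][j - 1])
    return C[n][m]

# Same signals as in the question, built without NumPy.
x = [50 * k / 99 for k in range(100)]  # same grid as np.linspace(0, 50, 100)
ts1 = [3 * math.sin(v / 0.5) for v in x]
ts2 = [2 * math.sin(v) for v in x]

# Squared local cost, then a square root at the end (the convention described for dtw()).
score_sqrt = math.sqrt(dtw_accumulated_cost(ts1, ts2, lambda u, v: (u - v) ** 2))
# Absolute-difference local cost, summed along the path with no final square root.
score_abs = dtw_accumulated_cost(ts1, ts2, lambda u, v: abs(u - v))

print(score_sqrt, score_abs)
```

If the sketch matches the library's conventions, the two printed values should land close to the two scores reported in the question. The point is that the two numbers come from different local costs and different final transforms, so they are not directly comparable.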

Okroshiashvili commented 3 years ago

Hi @rtavenar

Indeed, that is the case. Thanks a lot for the clarification :)

One more question, please. I was trying to compare tslearn and dtw-python, and I am getting different DTW results.

The setup is the same as in my tslearn example above. For dtw-python, the basic call is the following:

    dtw_result = dtw.dtw(x=ts1,
                         y=ts2,
                         dist_method="euclidean",
                         step_pattern="symmetric2")

This gives a DTW score of around 174.2896.

Why is there a difference? Is it due to the step_pattern parameter, which is set to "symmetric2"? I'm confused and would appreciate any help.

rtavenar commented 3 years ago

I don't know about dtw-python, so I suggest you ask them :)

Okroshiashvili commented 3 years ago

> I don't know about dtw-python, so I suggest you ask them :)

Thanks a lot :)

CelesteN87 commented 3 years ago

dtw-python uses a different step pattern by default than tslearn.

The tslearn step pattern/cost matrix is defined as C[i, j] = dist + min(C[i-1, j], C[i, j-1], C[i-1, j-1])

whereas dtw-python uses C[i, j] = min(C[i-1, j] + dist, C[i, j-1] + dist, C[i-1, j-1] + 2*dist)

I believe there are also some other subtle differences regarding how the total distance is computed.
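The effect of the two recurrences above can be seen on a toy example. The sketch below is neither library's code: `dtw_cost` and its `diagonal_weight` parameter are hypothetical names, the local cost is an absolute difference, and the handling of the first cell is an assumption that may differ from what dtw-python actually does.

```python
def dtw_cost(a, b, diagonal_weight):
    """Accumulated DTW cost with an absolute-difference local cost.

    diagonal_weight=1: C[i, j] = d + min(C[i-1, j], C[i, j-1], C[i-1, j-1])
    diagonal_weight=2: C[i, j] = min(C[i-1, j] + d, C[i, j-1] + d, C[i-1, j-1] + 2*d)
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # Border row/column of infinities; only C[0][0] is a valid starting point.
    C = [[inf] * (m + 1) for _ in range(n + 1)]
    C[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            C[i][j] = min(C[i - 1][j] + d,
                          C[i][j - 1] + d,
                          C[i - 1][j - 1] + diagonal_weight * d)
    return C[n][m]

a = [0.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0]

unit_diag = dtw_cost(a, b, diagonal_weight=1)    # first recurrence
double_diag = dtw_cost(a, b, diagonal_weight=2)  # second recurrence
print(unit_diag, double_diag)  # 2.0 3.0
```

Because diagonal steps are charged twice in the second variant, its accumulated cost can never be lower than the first for the same local cost. That is consistent with the larger score reported for dtw-python in this thread, though it does not by itself account for the exact figures.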