pollen-robotics / dtw

DTW (Dynamic Time Warping) python module
GNU General Public License v3.0
1.16k stars 233 forks source link

Differences in the distance for the same data (maybe just question) #34

Closed flekschas closed 5 years ago

flekschas commented 5 years ago

Thanks a lot for providing the library!

I have a question regarding the comparability of the different versions. The results for the same data and the same distance metric seems to differ between dtw and accelerated_dtw as well as the way the distance metric is specified.

In the following example, I would expect that all three results are equal given that the data and distance metric are the same but that's clearly not the case.

Is this expected? If so, do you know why?

import numpy as np
from dtw import dtw, accelerated_dtw

x = np.array([1.0, 0.9, 1.2, 2.3, 3.8, 3.3, 4.2, 1.9, 0.5, 0.3, 0.3])
y = np.array([0.5, 1.0, 0.9, 1.2, 2.3, 3.8, 3.3, 4.2, 1.9, 0.5, 0.3])

l2_norm = lambda x, y: (x - y) ** 2

d1, _, _, _ = accelerated_dtw(x, y, 'euclidean')
d2, _, _, _ = accelerated_dtw(x, y, dist=l2_norm)
d3, _, _, _ = dtw(x, y, dist=l2_norm)

print(d1, d2, d3)

# -> 0.031818181818181815 0.013181818181818183 0.011363636363636364
pierre-rouanet commented 5 years ago

Hi @flekschas

I think your confusion comes from the fact that here the distance defined is used to compute a matrix of pairwise distances and not directly from x to y. In dim=1 the euclidian distance actually corresponds to the absolute value.

flekschas commented 5 years ago

Yes, the L2 norm of scalars simply describes their difference. I've fixed the issue with the l2-norm but the two versions of dtw still return different values and I am not sure why that should be the case.

import math
import numpy as np
from dtw import dtw, accelerated_dtw

x = np.array([1.0, 0.9, 1.2, 2.3, 3.8, 3.3, 4.2, 1.9, 0.5, 0.3, 0.3])
y = np.array([0.5, 1.0, 0.9, 1.2, 2.3, 3.8, 3.3, 4.2, 1.9, 0.5, 0.3])

l2_norm = lambda x, y: math.sqrt((x - y) ** 2)

d1, _, _, _ = accelerated_dtw(x, y, 'euclidean')
d2, _, _, _ = accelerated_dtw(x, y, dist=l2_norm)
d3, _, _, _ = dtw(x, y, dist=l2_norm)

print(d1, d2, d3)
# -> 0.031818181818181815 0.031818181818181815 0.022727272727272728```

d2 and d3 differ by about 0.01. Somehow the two algorithms must differ in more than just the speed or am I having another issue in my code? :)

PS: I just copy-pasted the l2-norm from the README and haven't double checked it. It might not make a difference for dtw but I'd probably still adjust it :)

pierre-rouanet commented 5 years ago

Thanks for reporting the problem @flekschas.

Indeed there was a difference when computing the accumulated cost matrix. It should be fixed by #35. I've updated the version to 1.3.3 with the fix.