nodrogluap / OpenDBA

GPU-accelerated Dynamic Time Warp (DTW) Barycenter Averaging

Different Result from tslearn #7

Closed shrimpceviche closed 3 years ago

shrimpceviche commented 4 years ago

I compared the pairwise distances from OpenDBA and tslearn side by side and found that a few datasets give very different results for the same input. For example, using the UCRArchive_2018 "Rock" dataset, for the first 5 arrays and without any constraints (i.e. without itakura_max_slope or sakoe_chiba_radius), tslearn returned these pairwise distances:

                0           1           2           3           4
    0    0.000000   49.989252  280.619679  704.749511  747.370318
    1   49.989252    0.000000  234.680443  656.096446  710.960112
    2  280.619679  234.680443    0.000000  396.202309  455.455888
    3  704.749511  656.096446  396.202309    0.000000  114.267849
    4  747.370318  710.960112  455.455888  114.267849    0.000000

but OpenDBA returned these distances:

         0       1        2        3         4
    0  0.0  10.151  31.1862  48.1643  57.02570
    1  NaN   0.000  33.6761  49.2297  58.92180
    2  NaN     NaN   0.0000  17.9237  17.97590
    3  NaN     NaN      NaN   0.0000   7.05957
    4  NaN     NaN      NaN      NaN   0.00000

I also compared the order in which the other arrays rank by closeness to a specific array (which is what matters for clustering), and the two libraries still return different results.

I am fairly new to GPU programming, so I hope you have some idea what is going on. I can share the code I used for the tslearn pairwise distances; please let me know.
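The neighbor-order comparison described above can be reproduced from the two posted matrices alone. The sketch below is illustrative only: the helper names `symmetrize` and `neighbor_order` are made up for this example, and it assumes the NaN cells in the OpenDBA output are simply the unprinted lower triangle of a symmetric matrix.

```python
# Distances copied from the tslearn matrix posted above.
TSLEARN = [
    [0.000000, 49.989252, 280.619679, 704.749511, 747.370318],
    [49.989252, 0.000000, 234.680443, 656.096446, 710.960112],
    [280.619679, 234.680443, 0.000000, 396.202309, 455.455888],
    [704.749511, 656.096446, 396.202309, 0.000000, 114.267849],
    [747.370318, 710.960112, 455.455888, 114.267849, 0.000000],
]

# Distances copied from the OpenDBA output; None stands in for the NaN cells.
OPENDBA_UPPER = [
    [0.0, 10.151, 31.1862, 48.1643, 57.0257],
    [None, 0.0, 33.6761, 49.2297, 58.9218],
    [None, None, 0.0, 17.9237, 17.9759],
    [None, None, None, 0.0, 7.05957],
    [None, None, None, None, 0.0],
]


def symmetrize(upper):
    """Fill the lower triangle by mirroring the upper one (assumed symmetric)."""
    n = len(upper)
    return [[upper[min(i, j)][max(i, j)] for j in range(n)] for i in range(n)]


def neighbor_order(matrix, i):
    """Indices of all other series, sorted from nearest to farthest from i."""
    others = [j for j in range(len(matrix)) if j != i]
    return sorted(others, key=lambda j: matrix[i][j])


OPENDBA = symmetrize(OPENDBA_UPPER)

for i in range(len(TSLEARN)):
    same = neighbor_order(TSLEARN, i) == neighbor_order(OPENDBA, i)
    print(i, neighbor_order(TSLEARN, i), neighbor_order(OPENDBA, i), same)
```

With these numbers the orderings agree for series 0 and 1 but differ for series 2, 3 and 4, so the disagreement is not just a matter of scale.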

nodrogluap commented 4 years ago

Hi, thanks for checking this out. Given the much larger distance values from tslearn, my first guess is that it is not z-normalizing the input data (OpenDBA does by default). I do not use tslearn; can you confirm that this is the case? If you z-normalize and still get different results, I would then check whether tslearn uses the White-Neely step pattern for DTW by default. If it does, I'd be happy to investigate further with your data and your example code for both approaches. Thanks!
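To make the z-normalization hypothesis concrete, here is a minimal pure-Python sketch, not code from either library. It assumes a population standard deviation for the normalization, a squared-difference local cost with a final square root, and a unit-weight recurrence over the three neighboring cells (diagonal, up, left), which is how the White-Neely pattern is usually described. Whether OpenDBA or tslearn match these choices exactly is not confirmed in this thread.

```python
import math


def z_normalize(series):
    """Shift to zero mean and scale to unit variance (population std, an assumption)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a constant series carries no shape information
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Unconstrained DTW with unit-weight steps: diagonal, up, left."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    return math.sqrt(cost[-1][-1])


# Two series with the same shape but a different offset (made-up values).
a = [0.0, 10.0, 20.0, 10.0, 0.0]
b = [100.0, 110.0, 120.0, 110.0, 100.0]

print("raw:", dtw_distance(a, b))
print("z-normalized:", dtw_distance(z_normalize(a), z_normalize(b)))
```

On raw values the offset dominates the distance, while after z-normalization the two series coincide. A difference in normalization defaults alone could therefore explain distances that differ by an order of magnitude.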

nodrogluap commented 3 years ago

Hi, there has been a fix applied to global DTW distance calculation, in case you would like to try again. If it's copacetic I will close this issue.

nodrogluap commented 3 years ago

OpenDBA now returns nearly identical values for the first 5 series of the UCR Rock training set with normalization turned off, so I will consider this issue closed:

    $ more rock_train5.pair_dists.txt
    1 0 49.9893 280.62 704.75 747.37
    1 0 234.68 656.096 710.96
    1 0 396.202 455.456
    1 0 114.268
    1 0
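The "nearly identical" claim can be checked against the tslearn values from the original report. The sketch below hard-codes both sets of numbers from this thread. It reads each OpenDBA row as a series label followed by the upper-triangle distances starting at the diagonal, which is an assumption about the file layout. The helper name `max_relative_difference` is made up for this example.

```python
# Upper-triangle distances from the tslearn matrix in the original report.
TSLEARN = {
    (0, 1): 49.989252, (0, 2): 280.619679, (0, 3): 704.749511, (0, 4): 747.370318,
    (1, 2): 234.680443, (1, 3): 656.096446, (1, 4): 710.960112,
    (2, 3): 396.202309, (2, 4): 455.455888,
    (3, 4): 114.267849,
}

# Matching values from rock_train5.pair_dists.txt after the fix
# (assumed layout: label, then distances from the diagonal onward).
OPENDBA_FIXED = {
    (0, 1): 49.9893, (0, 2): 280.62, (0, 3): 704.75, (0, 4): 747.37,
    (1, 2): 234.68, (1, 3): 656.096, (1, 4): 710.96,
    (2, 3): 396.202, (2, 4): 455.456,
    (3, 4): 114.268,
}


def max_relative_difference(reference, other):
    """Largest |reference - other| / |reference| over all shared pairs."""
    return max(
        abs(reference[pair] - other[pair]) / abs(reference[pair])
        for pair in reference
    )


print(max_relative_difference(TSLEARN, OPENDBA_FIXED))
```

The largest relative difference is below 1e-5, which is consistent with OpenDBA printing fewer significant digits than tslearn.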