wannesm / dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)
Other
1.09k stars 184 forks source link

DTW: A way to set the maximum number of query points that can be mapped to a reference point #149

Closed vancromy closed 2 years ago

vancromy commented 2 years ago

Hello,

Thanks for such an amazing tool and a lightning fast one at that! I've been playing with the warpingpaths and warp function to try to align two sequences (a query and a reference sequence) and have been able to constrain the solver with windowing but I see there are other ways of constraining it further (e.g. psi & penalty & max kwargs). I'm unfortunately not able to make much sense of these and therefore was hoping to get some guidance. Ultimately I'm looking for a way to prevent too many points from my query sequence from being mapped ('warped') to 1 reference point. Is this possible within the current available kwargs of the DTW function api.

Here's an example of the issue I'm having (this plot was created by already constraining the window size): image

wannesm commented 2 years ago

Interesting question! The window is the main tool to avoid this from within the algorithm. Penalty can help to reduce the number of compression/expansions. The penalty should be similar to the typical differences between the series (this is how you can balance both preferences, a very large penalty will lead to Euclidean distance). We've been thinking about this ourselves, but this would probably require some form of memory and change the algorithm's properties significantly (e.g. there are some DTW variants that keep track of multiple values).

For the example you show, this is also because they are both in different ranges. If you are mainly interested in warping similar shapes, I'd recommend preprocessing with z-normalisation or differencing. See the examples on the following pages: https://dtaidistance.readthedocs.io/en/latest/usage/dtw.html#dtw-based-on-shape-z-normalization https://dtaidistance.readthedocs.io/en/latest/usage/clustering.html#k-means-dba-clustering