slaypni / fastdtw

A Python implementation of FastDTW
MIT License
774 stars 122 forks source link

Added subsequence alignment in fastdtw.py #27

Open Sophie-Ha opened 5 years ago

Sophie-Ha commented 5 years ago

Modified fastdtw.py to allow subsequence alignment by creating a new version of __dtw with less restrictions

slaypni commented 5 years ago

@Sophie-Ha Thank you for sending PR. I would like to ask followings:

Sophie-Ha commented 5 years ago

@slaypni I added the tests. The algorithm works the same as the original FastDTW without the constraint that the path has to start in the corners of the grid.

DTW in original FastDTW: The path starts in the green cell and ends in the blue cell original_dtw

Subsequence version: The path can start anywhere in the green row and end anywhere in the blue row as long as the other path constraints are fulfilled (e.g. not going backwards) subsequence_dtw

I achieved this by setting the intermediate cost to zero for the complete 0th row instead of just the point (0,0). Then the costs for all cells in the current window are computed as normally. Instead of taking the path that ends in the top right corner, the algorithm takes the path with minimal cost across the complete top row.

slaypni commented 5 years ago

@Sophie-Ha Thank you for your explanation! I have tried to use the function, and got following outputs.

As far as I understand it properly, shouldn't the result distance be 0?

In [37]: x = [1, 2, 0, 1, 2]

In [38]: fastdtw_subsequence(x, [1, 3 ,1] + x + [3, 2, 1])
Out[38]: (2.0, [(0, 3), (1, 3), (2, 3), (3, 3), (4, 4)])
Sophie-Ha commented 5 years ago

@slaypni Yes, the distance should be 0. I looked into it and the suboptimal result is due to the fastdtw approximation. If you set the radius to 2 or higher the result is 0.

slaypni commented 5 years ago

@Sophie-Ha Does that mean it cannot actually start anywhere in the green row and end in the blue row unless radius is big enough? If so, is the behavior of this function desirable particulally when y is much longer than x?

Sophie-Ha commented 5 years ago

@slaypni It can start anywhere, regardless of the radius. If the radius is too small the algorithm just might miss the optimal solution, but this is also the case for regular fastdtw. Since the endpoints are not fixed in the subsequence version the resulting variability from the optimal path can be higher. If this is desirable or not comes down to a tradeoff between speed and accuracy which completely depends on the use case.