Best path issue using subsequence in C

henryzyx commented 1 year ago

I have two sequences with difference length for matching, I would expect the s1 should match the last three of s2. seq_t s1[] = { 2.1, 4.1, 5.1 }; seq_t s2[] = { 1.1, 2.1, 3.1, 4.1, 5.1 };

From the result, the distance 1.0 makes sense, but the best path doesn't make sense. This is the output using the benchmark12_subsequence based on my test. [(2,4)(1,3)(1,2)(0,1)(0,0)(0,0)(0,0)(0,0)] My expectation would be [(2,4)(1,3)(0,2)], would you pls help me figure this out or correct me if any misunderstanding?

snapshot

wannesm commented 1 year ago

Thanks for the detailed case and good observation. Both paths are optimal paths as you indicated (both have distance 1.0) but one would indeed expect that the path you mention is chosen because the diagonal is preferred. However, when stepping through the algorithm I noticed that floating point inaccuracies change this:

In theory, perfectly equal: Path 1: (0,2), (1,3), (2,4) = sqrt(1**2 + 0 + 0) = 1 Path 2: (0,1), (1,2), (1,3), (2,4) = sqrt(0 + 1**2 + 0 + 0) = 1 And path 1 should be chosen because the diagonal move has priority.

In practice, floating point inaccuracies: Path 1: (2.1-3.1) = 1.0 Path 2: (4.1-3.1) = 0.9999999999999996 (!= 1.0) The step in path 2 is thus slightly smaller and therefore preferred by the best path algorithm.

Not sure how or if this can be resolved easily.

If this is a blocking issue, a solution might be to use fixed points and represent numbers using int instead of float. The code is made such that this is easy. You only need to change the typedef in dd_globals.h (and translate to and from int, eg for 2 decimals myfp = (int)(myfloat*100)): https://github.com/wannesm/dtaidistance/blob/1ece7ac85d737c2617a5154d6cb0c57b35ded434/dtaidistance/lib/DTAIDistanceC/DTAIDistanceC/dd_globals.h#L19

wannesm commented 1 year ago

I added a version of best_path with isclose functionality to the C source in the master branch. You can now do:

dtw_best_path_isclose(wps, i1, i2, l1, l2, /*rtol=*/1e-05, /*atol=*/1e-08, &settings);
// returns [(2,4)(1,3)(0,2)(0,0)(0,0)(0,0)(0,0)(0,0)]

wannesm / dtaidistance

Best path issue using subsequence in C #194