wannesm / dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)
Other
1.08k stars 184 forks source link

Question about kbest_matches (python) #214

Closed Johnnyeee closed 2 months ago

Johnnyeee commented 2 months ago

Hi,

I tried to figure out the algorithm of finding matches, so I run:

query = np.array([1,2,3])
series = np.array([1,1,1,2,3,1])

sa = subsequence_alignment(query, series, penalty = 0)

matching = sa.matching_function()
print("Matching Array:", matching)

# Find and print all matches
print("\nAll Matches:")
for match in sa.kbest_matches(k=None, minlength=None, maxlength=None, overlap=0):  # Use None for no limit on the number of matches
    print(f"Match Index: {match.idx}")
    print(f"Match Distance: {match.distance}")
    print(f"Match Segment: {match.segment}")
    print(f"Match Path: {match.path}")
    print(f"Match Value (Normalized Distance): {match.value}")

I got result like:

 Matching Array: [0.74535599 0.74535599 0.74535599 0.33333333 0.         0.66666667]

All Matches:
Match Index: 4
Match Distance: 0.0
Match Segment: [2, 4]
Match Path: [(0, 2), (1, 3), (2, 4)]
Match Value (Normalized Distance): 0.0
Match Index: 0
Match Distance: 2.23606797749979
Match Segment: [0, 0]
Match Path: [(0, 0), (1, 0), (2, 0)]
Match Value (Normalized Distance): 0.7453559924999299
Match Index: 1
Match Distance: 2.23606797749979
Match Segment: [0, 1]
Match Path: [(0, 0), (1, 0), (2, 1)]
Match Value (Normalized Distance): 0.7453559924999299

My question is why segment [5,5] didn't get recorded? Based on your codes, the matching array after finding the best match will be like: [0.74535599 0.74535599 0.74535599 inf inf 0.66666667]. So why the next best index is 0 instead of 5 since 0.666<0.745?

Sorry that my question might be too basic but it confuses me so much. Thanks for your help!

wannesm commented 2 months ago

It is expected behavior. The method returns the matches that have no overlap (if overlap=0) without recomputing the paths. Thus if the path that leads to that number in the matching (in this case [2,5]) overlaps with an earlier path (in this case [2,4]), it does not retain that path. If you would set overlap higher, then the match ending at position 5 is also returned.

To get the segment [5,5] returned, we would have to recompute the possible paths each time after finding a segment. That would be too expensive.

Johnnyeee commented 2 months ago

Thanks for your quick response.