wannesm / dtaidistance

Time series distances: Dynamic Time Warping (fast DTW implementation in C)
Other
1.08k stars 184 forks source link

how to get warping_path for all sequence in a matrix? #218

Closed NathanWang7 closed 4 weeks ago

NathanWang7 commented 1 month ago

how to get warping_path just like dtaidistance.dtw.distance_matrix instead of pairs by pairs?

wannesm commented 1 month ago

Can you elaborate? Warping_path returns the optimal path that leads to the distance (between two series) and distance_matrix only returns the distance (between multiple series). The latter would be distance for just a pair of series.

NathanWang7 commented 1 month ago

Can you elaborate? Warping_path returns the optimal path that leads to the distance (between two series) and distance_matrix only returns the distance (between multiple series). The latter would be distance for just a pair of series.

For example, if I want to get all distance among multiple series, I just use 'dtw.distance_matrix'. It seems that there is no method to get all optimal paths among multiple series at once. I need to use a for loop to realize it, which is inefficient.

wannesm commented 1 month ago

That functionality is indeed not provided. This is because one typically wants to do some operation on the paths (eg a reduce operation for DBA) instead of storing all of them. There is thus no obvious C implementation that would be useful. Instead one can use the Python standard library itertools.product, multiprocessing.pool and map to compute all paths in parallel.

NathanWang7 commented 1 month ago

That functionality is indeed not provided. This is because one typically wants to do some operation on the paths (eg a reduce operation for DBA) instead of storing all of them. There is thus no obvious C implementation that would be useful. Instead one can use the Python standard library itertools.product, multiprocessing.pool and map to compute all paths in parallel.

Thank you for your explanation. As I understand it, if I can obtain the DTW distances between all time series through dtw.distance_matrix, this process should also require calculating the optimal warping paths of these time series, right? It's just that the current code doesn't have the functionality to save these paths?

wannesm commented 1 month ago

Correct, this is not stored as this would be unnecessary expensive. Also because to compute the DTW distance, you don't need to compute or keep track of the exact path. The methods distance and distance_matrix make use of this assumption.

You will thus need to implement this yourself. The naive way would be:

def compute_path(s, dtw_settings, idx):
    ri, ci = idx
    if ci <= ri:
        # DTW is symmetric, only compute one direction
        return None
    return dtw.warping_path(s[ci], s[ri], **dtw_settings)

s = np.array(
        [[0., 0, 1, 2, 1, 0, 1, 0, 0],
         [0., 1, 2, 0, 0, 0, 0, 0, 0],
         [1., 2, 0, 0, 0, 0, 0, 1, 1],
         [0., 0, 1, 2, 1, 0, 1, 0, 0],
         [0., 1, 2, 0, 0, 0, 0, 0, 0],
         [1., 2, 0, 0, 0, 0, 0, 1, 1]])

# compute and store all paths
with mp.Pool() as pool:
    dtw_settings = {}
    paths = pool.map(functools.partial(compute_path, s, dtw_settings),
                     itertools.product(range(len(s)), repeat=2))

The paths datastructure can grow quite large. So I don't recommend this. A map-reduce type of operation seems more in place here. But that's outside of the scope of this library is this depends on the exact function you are trying to apply.

You can also implement a reduce operation in a simple manner if it is without parallelization (naive example, more realistic would be something like DBA):

# compute paths one by one and immediately reduce to some target
avg_len, avg_len_cnt = 0, 0
for ri, ci in itertools.product(range(len(s)), repeat=2):
    if ci <= ri:
        continue
    path = dtw.warping_path(s[ci], s[ri], **dtw_settings)
    avg_len += len(path)
    avg_len_cnt += 1
avg_len /= avg_len_cnt

Note that, for this example it is quite easy to parallelize because it is associative and commutative, but this is not true in general.