Closed · NathanWang7 closed this 4 weeks ago
Can you elaborate? `warping_path` returns the optimal path that leads to the distance (between two series), whereas `distance_matrix` only returns the distances (between multiple series). The equivalent of the latter for just a pair of series would be `distance`.
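To make the distinction concrete, here is a minimal pure-Python DTW sketch (illustration only, not dtaidistance's optimized C implementation; all function names here are mine, not the library's): the distance is just the bottom-right cell of the accumulated-cost matrix, while the path additionally requires backtracking through the full matrix.

```python
def dtw_matrix(s1, s2):
    """Full accumulated-cost matrix for two 1-D sequences (squared costs)."""
    n, m = len(s1), len(s2)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (s1[i - 1] - s2[j - 1]) ** 2
            acc[i][j] = cost + min(acc[i - 1][j],      # insertion
                                   acc[i][j - 1],      # deletion
                                   acc[i - 1][j - 1])  # match
    return acc

def dtw_distance(s1, s2):
    """Only the distance: the bottom-right cell (square-rooted)."""
    return dtw_matrix(s1, s2)[len(s1)][len(s2)] ** 0.5

def dtw_path(s1, s2):
    """The optimal path requires backtracking through the full matrix."""
    acc = dtw_matrix(s1, s2)
    i, j = len(s1), len(s2)
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # step to whichever predecessor has the smallest accumulated cost
        steps = {(i - 1, j - 1): acc[i - 1][j - 1],
                 (i - 1, j): acc[i - 1][j],
                 (i, j - 1): acc[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path
```

For example, `dtw_path([0., 1, 2, 1, 0], [0., 2, 0])` starts at `(0, 0)` and ends at `(4, 2)`, aligning every index of one series with some index of the other.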
For example, if I want to get all distances among multiple series, I can just use `dtw.distance_matrix`. But there seems to be no method to get all optimal paths among multiple series at once; I need to use a for loop, which is inefficient.
That functionality is indeed not provided. This is because one typically wants to perform some operation on the paths (e.g. a reduce operation for DBA) instead of storing all of them, so there is no obvious C implementation that would be generally useful. Instead, one can use the Python standard library's `itertools.product`, `multiprocessing.Pool`, and `map` to compute all paths in parallel.
Thank you for your explanation. As I understand it, if I can obtain the DTW distances between all time series through `dtw.distance_matrix`, this process should also require calculating the optimal warping paths of these time series, right? It's just that the current code doesn't have the functionality to save these paths?
Correct, this is not stored, as that would be unnecessarily expensive. Also, to compute the DTW distance you don't need to compute or keep track of the exact path. The methods `distance` and `distance_matrix` make use of this property.
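To illustrate why no path can be saved as a by-product: the distance can be computed while keeping only two rows of the cost matrix in memory, so earlier rows (and with them the backtracking information) are discarded as the computation proceeds. A sketch of that idea (illustrative, not the library's actual code):

```python
def dtw_distance_two_rows(s1, s2):
    """DTW distance keeping only two rows of the accumulated-cost matrix."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(s2)
    for x in s1:
        curr = [inf] * (len(s2) + 1)
        for j, y in enumerate(s2, start=1):
            curr[j] = (x - y) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
        # the older row is overwritten here: backtracking is now impossible
        prev = curr
    return prev[-1] ** 0.5
```

This is why recovering the path is a genuinely separate (and more memory-hungry) computation, not something `distance_matrix` silently performs and throws away.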
You will thus need to implement this yourself. The naive way would be:
import functools
import itertools
import multiprocessing as mp

import numpy as np
from dtaidistance import dtw


def compute_path(s, dtw_settings, idx):
    ri, ci = idx
    if ci <= ri:
        # DTW is symmetric, only compute one direction
        return None
    return dtw.warping_path(s[ci], s[ri], **dtw_settings)


s = np.array(
    [[0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])

# compute and store all paths
dtw_settings = {}
with mp.Pool() as pool:
    paths = pool.map(functools.partial(compute_path, s, dtw_settings),
                     itertools.product(range(len(s)), repeat=2))
The `paths` data structure can grow quite large, so I don't recommend this. A map-reduce type of operation seems more in place here, but that's outside the scope of this library, as it depends on the exact function you are trying to apply.
You can also implement a reduce operation in a simple manner if it is without parallelization (naive example, more realistic would be something like DBA):
# compute paths one by one and immediately reduce to some target
avg_len, avg_len_cnt = 0, 0
for ri, ci in itertools.product(range(len(s)), repeat=2):
    if ci <= ri:
        continue
    path = dtw.warping_path(s[ci], s[ri], **dtw_settings)
    avg_len += len(path)
    avg_len_cnt += 1
avg_len /= avg_len_cnt
Note that this particular example is quite easy to parallelize because the reduction (a sum) is associative and commutative, but this is not true in general.
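A sketch of what that parallelized reduce could look like. Note that `pair_stat` here is a stand-in placeholder (a plain sum of squared differences) for a real per-pair computation such as `len(dtw.warping_path(s[ci], s[ri]))`, so that the sketch is self-contained; the pair generation and the order-insensitive summation are the parts that carry over.

```python
import itertools
import multiprocessing as mp

# toy data standing in for the series from the earlier example
SERIES = [[0., 0, 1, 2, 1, 0],
          [0., 1, 2, 0, 0, 0],
          [1., 2, 0, 0, 1, 1]]

def pair_stat(idx):
    ri, ci = idx
    a, b = SERIES[ri], SERIES[ci]
    # placeholder statistic; a real worker would compute a warping path here
    return sum((x - y) ** 2 for x, y in zip(a, b))

def upper_triangle(n):
    # DTW is symmetric, so only one direction per pair is needed
    return ((ri, ci) for ri, ci in itertools.product(range(n), repeat=2)
            if ci > ri)

if __name__ == "__main__":
    pairs = list(upper_triangle(len(SERIES)))
    with mp.Pool() as pool:
        # imap_unordered is fine because the sum is associative and commutative
        total = sum(pool.imap_unordered(pair_stat, pairs))
    print(total / len(pairs))
```

Because results are consumed as they arrive rather than collected, memory stays bounded regardless of how many pairs there are.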
How can I get warping paths for all series at once, just like `dtaidistance.dtw.distance_matrix`, instead of pair by pair?