Open rtavenar opened 7 years ago
In addition to the constraint and the LB_Keogh prune, I'd suggest adding in cascading pruning methods described by the UCR suite. It looks like it could be extended to KNN.
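For reference, the cascading idea boils down to trying cheap lower bounds before paying for the full DTW computation. Below is a minimal sketch of that idea for a 1-NN query, using only tslearn's lb_keogh and dtw; the function name is made up here, and this is not the full UCR suite cascade (which also uses LB_Kim and early-abandoning DTW).

import numpy as np
from tslearn.metrics import lb_keogh, dtw

def nn_cascading(query, candidates, r):
    # 1-NN search with cascading pruning: cheap bounds first, exact DTW last
    best_d, best_idx = np.inf, -1
    for idx, cand in enumerate(candidates):
        # first (cheap) filter: LB_Keogh(query, candidate) is a lower bound of DTW
        if lb_keogh(query, cand, radius=r) >= best_d:
            continue
        # second filter: the reversed bound LB_Keogh(candidate, query)
        if lb_keogh(cand, query, radius=r) >= best_d:
            continue
        # only now pay for the constrained DTW computation
        d = dtw(query, cand, global_constraint="sakoe_chiba", sakoe_chiba_radius=r)
        if d < best_d:
            best_d, best_idx = d, idx
    return best_idx, best_d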
@rtavenar (@GillesVandewiele: Would you mind sharing your thoughts on this?)
Do you mind if I work on this? I saw issue #407, and I had an idea. Then, I searched for LB_Keogh
and found this issue as well!
So, I think something like this can work:
import numpy as np

from tslearn.metrics import lb_keogh
from tslearn.metrics import dtw
from tslearn.metrics import cdist_dtw

# T: array of n equal-length time series, r: Sakoe-Chiba radius (assumed given)

# calculate LowerBound (LB)
LB = np.full((n, n), 0, dtype=np.float64)
for i in range(n - 1):
    for j in range(i + 1, n):
        d = lb_keogh(T[i], T[j], radius=r)
        LB[i, j] = d
        LB[j, i] = d  # redundant!
kmax = 20  # maximum number of neighbors

# D[i] holds the (sorted) distances to the current nearest neighbors of T[i],
# I[i] holds the corresponding indices
D = np.full((n, kmax), np.inf, dtype=np.float64)
I = np.full((n, kmax), -1, dtype=np.int64)
for i in range(n - 1):
    for j in range(i + 1, n):
        # compute the exact DTW only if the lower bound can still beat the
        # current k-th best distance of i or j
        if LB[i, j] < D[i, -1] or LB[i, j] < D[j, -1]:
            d = dtw(T[i], T[j], global_constraint="sakoe_chiba", sakoe_chiba_radius=r)
            # DTW is symmetric, so update the neighbor lists of both i and j
            for a, b in ((i, j), (j, i)):
                if d < D[a, -1]:
                    pos = np.searchsorted(D[a], d, side='right')
                    # shift the tails of D and I right and insert the new neighbor
                    D[a, pos + 1:] = D[a, pos:-1]
                    D[a, pos] = d
                    I[a, pos + 1:] = I[a, pos:-1]
                    I[a, pos] = b
# fill distance matrix (it is exact at the indices stored in `I`)
Dist_matrix = np.full((n, n), np.inf, dtype=np.float64)
for i in range(n):
    for j, nn_i in enumerate(I[i]):
        if nn_i >= 0:  # skip unused slots if fewer than kmax neighbors were found
            Dist_matrix[i, nn_i] = D[i, j]
np.fill_diagonal(Dist_matrix, 0)
We can use Dist_matrix for KNN (with k <= kmax).
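For instance (a small usage sketch, assuming I, D and Dist_matrix come from the snippet above), the k nearest neighbors of observation i can be read off directly:

k = 5  # any k <= kmax
neighbors_of_i = I[i, :k]   # indices of the k nearest neighbors of T[i]
distances_of_i = D[i, :k]   # the corresponding (exact) DTW distances

# equivalently, from the filled matrix (skipping i itself, which has distance 0)
neighbors_of_i = np.argsort(Dist_matrix[i])[1:k + 1]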
This is not bad. It shows some improvement. I think the paper mentioned by @unnamedplay-r might be useful as well.
@NimaSarajpoor that would be great!
Probably, something that could be useful if you start working on this would be to have some kind of benchmark to highlight the improvement in running times when using pruning, as well as tests that would ensure that the results are the same as those obtained when DTW with a Sakoe-Chiba band is used without pruning.
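Something like the rough sketch below could serve as a starting point (pruned_knn is a hypothetical wrapper around the LB_Keogh-pruned loop above, returning the D and I arrays; the dataset sizes are arbitrary):

import time
import numpy as np
from tslearn.generators import random_walks
from tslearn.metrics import cdist_dtw

T = random_walks(n_ts=200, sz=128)
r, kmax = 10, 20

# reference: full constrained-DTW distance matrix, no pruning
start = time.perf_counter()
full = cdist_dtw(T, global_constraint="sakoe_chiba", sakoe_chiba_radius=r)
t_full = time.perf_counter() - start

# pruned version (hypothetical wrapper around the snippet above)
start = time.perf_counter()
D, I = pruned_knn(T, r=r, kmax=kmax)
t_pruned = time.perf_counter() - start
print(f"no pruning: {t_full:.2f}s / with pruning: {t_pruned:.2f}s")

# the pruned k-NN distances must match the kmax smallest off-diagonal
# entries of the reference matrix, row by row
for i in range(len(T)):
    ref = np.sort(np.delete(full[i], i))[:kmax]
    np.testing.assert_allclose(np.sort(D[i]), ref)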
@rtavenar I got your point! Sure. I will keep you posted if I get to a good stage.
We may take advantage of the Upper Bound (UB, which is the Euclidean distance) as follows:
(1) For each observation i: LB[j] is a lower bound of the DTW distance between i and j.
(2) Skip j if LB[j] is more than the UB of at least kmax other candidates.
I think step 2, for all observations, has O(N^2 log N) time complexity. Maybe we should first see if it is worth it (and this may depend on kmax).
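A minimal sketch of that idea for a single observation i (assuming T is an array of n equal-length series, so that the Euclidean distance is a valid upper bound of the constrained DTW; function and variable names are illustrative):

import numpy as np
from tslearn.metrics import lb_keogh, dtw

def knn_with_ub_prune(T, i, r, kmax):
    others = [j for j in range(len(T)) if j != i]
    # upper bounds: Euclidean distance >= constrained DTW for equal-length series
    UB = np.array([np.linalg.norm(T[i] - T[j]) for j in others])
    # lower bounds
    LB = np.array([lb_keogh(T[i], T[j], radius=r) for j in others])
    # at least kmax candidates have a true DTW distance below this threshold
    threshold = np.partition(UB, kmax - 1)[kmax - 1]
    # a candidate whose lower bound exceeds the threshold can never be a k-NN of i
    keep = [j for j, lb in zip(others, LB) if lb <= threshold]
    dists = sorted(
        (dtw(T[i], T[j], global_constraint="sakoe_chiba", sakoe_chiba_radius=r), j)
        for j in keep
    )
    return dists[:kmax]

In this sketch np.partition does the selection in linear time per observation, so most of the extra cost is in computing the UB and LB values themselves.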
It would be nice to make kNN searches faster when Sakoe-Chiba constrained DTW is concerned, using LB_Keogh-based pre-filtering. This should be implemented in the kneighbors method of class KNeighborsTimeSeriesMixin from module neighbors.
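As a rough illustration (not tslearn's actual KNeighborsTimeSeriesMixin internals; the function name is made up), the pre-filtering for a single query could look like this: candidates are visited in increasing order of their LB_Keogh lower bound, and the loop stops as soon as the bound can no longer improve the current k-th best distance.

import numpy as np
from tslearn.metrics import lb_keogh, dtw

def kneighbors_one_query(x, fitted_ts, n_neighbors, r):
    # order candidates by their LB_Keogh lower bound, most promising first
    lbs = np.array([lb_keogh(x, ts, radius=r) for ts in fitted_ts])
    order = np.argsort(lbs)
    best = []  # list of (distance, index), kept sorted, length <= n_neighbors
    for j in order:
        # once the n_neighbors-th best distance is below the candidate's lower
        # bound, no remaining candidate (sorted by LB) can improve the result
        if len(best) == n_neighbors and lbs[j] >= best[-1][0]:
            break
        d = dtw(x, fitted_ts[j], global_constraint="sakoe_chiba", sakoe_chiba_radius=r)
        if len(best) < n_neighbors:
            best.append((d, j))
            best.sort()
        elif d < best[-1][0]:
            best[-1] = (d, j)
            best.sort()
    dist = np.array([d for d, _ in best])
    ind = np.array([j for _, j in best])
    return dist, ind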