Open lynnwong11 opened 6 years ago
@markdregan
It has been a while since I looked at the code. I believe it only takes one feature at a time.
On Fri, Sep 22, 2017, 6:28 PM Lynn Wong notifications@github.com wrote:
@markdregan https://github.com/markdregan
@markdregan thank you so much! I searched the internet and found a Python module: https://github.com/pierre-rouanet/dtw In its code, it says: :param array x: N1*M array :param array y: N2*M array
It deals with n_features by using dist: dist, cost, acc, path = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))
Is this the usual way of handling multiple features, or is there another way?
In my repo, you would need to update the function def _dist_matrix(self, x, y): so that x and y are of shape (num_features, num_samples, num_timesteps). The code in the function would need to be updated to iterate through the features and output a distance matrix. The function should return dm of shape (num_features, num_x_samples, num_y_samples).
The def predict(self, x): function would need to be updated too. Depending on how you want to factor in the DTW distance per feature, the implementation will be slightly different.
Or, you could use my code as is: iterate through your dataset per feature, saving the distance matrix for each comparison of x and y. Then write your own method to do arg_sort across the feature distance matrices.
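The per-feature approach could be sketched roughly as follows. This is a minimal pure-Python illustration, not the repository's code: `dtw_1d`, `feature_distance_matrix`, and `nearest_neighbors` are hypothetical names, the local cost is assumed to be the absolute difference, no warping window is applied, and summing the per-feature matrices is only one possible way to combine them.

```python
def dtw_1d(a, b):
    """Plain DTW distance between two 1-D sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def feature_distance_matrix(x, y):
    """Distance matrix for ONE feature: x and y are lists of 1-D series."""
    return [[dtw_1d(xs, ys) for ys in y] for xs in x]


def nearest_neighbors(x_by_feature, y_by_feature):
    """Sum the per-feature matrices, then arg-sort each row (closest y first)."""
    matrices = [feature_distance_matrix(x, y)
                for x, y in zip(x_by_feature, y_by_feature)]
    n_x, n_y = len(matrices[0]), len(matrices[0][0])
    total = [[sum(dm[i][j] for dm in matrices) for j in range(n_y)]
             for i in range(n_x)]
    order = [sorted(range(n_y), key=lambda j: row[j]) for row in total]
    return total, order


# Layout: [feature][sample][timestep]. Two features, one query, two training samples.
query = [[[0, 1, 2, 1, 0]], [[5, 5, 6, 5, 5]]]
train = [[[0, 0, 1, 2, 1, 0], [3, 3, 3, 3, 3]],
         [[5, 6, 5, 5], [0, 0, 0, 0]]]
total, order = nearest_neighbors(query, train)
print(order[0])  # indices of training samples, closest first
```

If some features matter more than others, the plain sum could be swapped for a weighted sum, or each feature could vote separately; which is better depends on the data.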
From Dr. Keogh and Dr. Mueen's talk on DTW last year: There are two main ways to use DTW to find a multi-dimensional distance.
The main difference is how tightly coupled you think the dimensions are (i.e., how much the position of one is likely to reflect the position of the other). If you are sure the data is tightly coupled, method 2 is slightly better. However, as soon as random lags appear in your data, you'll see the independent method (1) far outperform method 2.
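To make the distinction concrete, here is a small sketch. It assumes method 1 is the independent variant (run DTW on each dimension separately and sum the results) and method 2 is the dependent variant (one warping path shared by all dimensions, with a vector distance at each time step). The function names and toy data are hypothetical. The second dimension of `b` is lagged relative to `a`, which independent warping can absorb but a shared path cannot.

```python
def dtw(x, y, dist):
    """DTW distance between sequences x and y under a local cost `dist`."""
    inf = float("inf")
    n, m = len(x), len(y)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = dist(x[i - 1], y[j - 1]) + min(
                acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def dtw_independent(x, y):
    """Warp each dimension on its own, then add up the distances."""
    n_dims = len(x[0])
    return sum(
        dtw([p[d] for p in x], [q[d] for q in y], lambda u, v: abs(u - v))
        for d in range(n_dims))


def dtw_dependent(x, y):
    """One shared warping path; local cost is the L1 norm across dimensions."""
    return dtw(x, y, lambda p, q: sum(abs(u - v) for u, v in zip(p, q)))


# Each series is a list of time steps; each time step holds two dimensions.
a = [(0, 0), (1, 1), (0, 0), (0, 0), (0, 0)]
b = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]  # dimension 2 peaks two steps late

print(dtw_independent(a, b))  # 0.0: each dimension is aligned separately
print(dtw_dependent(a, b))    # 2.0: a single path cannot line up both peaks
```

The dependent variant with an L1 local cost is what passing dist=lambda x, y: norm(x - y, ord=1) to a DTW routine on N*M arrays amounts to.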
He goes on to share some interesting, but not really relevant here, notes on how to keep dimensionality low (usually <5) and choose the best dimensions. The most important part there is that, when using DTW, using too many dimensions is less accurate than using even one random dimension alone.
In short,
While it might be useful to bake this in as a convenience feature, it depends quite a bit on the use case.
I am new to Dynamic Time Warping and your note helps quite a lot. Thank you for sharing. My input data is 3-D, with shape (n_samples, n_timesteps, n_features). I am not sure how to transform it for use with the model. Thank you so much!
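One way to get from that layout to the per-feature layout discussed in this thread is to reorder the axes. The sketch below uses plain nested lists, and `to_feature_major` is a hypothetical helper name, not part of the repository.

```python
def to_feature_major(data):
    """Reorder (n_samples, n_timesteps, n_features) -> (n_features, n_samples, n_timesteps)."""
    n_features = len(data[0][0])
    return [
        [[step[f] for step in sample] for sample in data]
        for f in range(n_features)
    ]


# 2 samples, 3 time steps, 2 features
data = [
    [[1, 10], [2, 20], [3, 30]],
    [[4, 40], [5, 50], [6, 60]],
]
by_feature = to_feature_major(data)
print(by_feature[0])  # [[1, 2, 3], [4, 5, 6]]
print(by_feature[1])  # [[10, 20, 30], [40, 50, 60]]
```

With NumPy arrays, np.transpose(data, (2, 0, 1)) performs the same reordering. Each by_feature[f] is then a 2-D (n_samples, n_timesteps) block, which appears to be the shape the single-feature code expects.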