markdregan / K-Nearest-Neighbors-with-Dynamic-Time-Warping

Python implementation of KNN and DTW classification algorithm
774 stars 215 forks source link

can training data is 3-D form? #7

Open lynnwong11 opened 6 years ago

lynnwong11 commented 6 years ago

I am new to Dynamic Time Warping and your note helps quite a lot. Thank you for your sharing. My input data is 3-D, having shape of (n_samples, n_timesteps,n_features). I am not sure how to transfer it into using the model. Thank you so much!

lynnwong11 commented 6 years ago

@markdregan

markdregan commented 6 years ago

It has been a while since I looked at the code. I believe it only takes one feature at a time.

On Fri, Sep 22, 2017, 6:28 PM Lynn Wong notifications@github.com wrote:

@markdregan https://github.com/markdregan

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/markdregan/K-Nearest-Neighbors-with-Dynamic-Time-Warping/issues/7#issuecomment-331597864, or mute the thread https://github.com/notifications/unsubscribe-auth/AB3KFtOiG2oiuU5OOWdTAXAbOHwNOsTbks5slF7PgaJpZM4PgLDd .

lynnwong11 commented 6 years ago

@markdregan thank you so much~ I searched the internet and find there is a python module : https://github.com/pierre-rouanet/dtw In it's code, it says: :param array x: N1M array :param array y: N2M array

It deals with n_features by using dist: dist, cost, acc, path = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))

Is it the normal way of handling multi-features or other way?

markdregan commented 6 years ago

In my repo, you would need to update the function def _dist_matrix(self, x, y): so that x and y are of shape num_features, num_samples, num_timesteps. The code in the code in the function would need to be updated to iterate through the features and output a distance matrix. The function should return dm of the shape num_features, num_x_samples, num_y_samples.

The def predict(self, x): function would need to be updated too. Depending on how you want to factor in the dtw distance per feature - implementation will be slightly different.

Or, you could use my code as if. Iterate through your dataset per feature - saving the distance matrix for each comparison of x and y. Then write your own method to do arg_sort across the feature distance matrices.

Apathyman commented 6 years ago

From Dr. Keogh and Dr. Mueen's talk on DTW last year: There are two main ways to use DTW to find a multi-dimensional distance.

  1. Run DTW on each dimension and add up the distances.
  2. Find the distance of each time step (total multi-dimensional distance) and find a single warping path

The main difference is how tightly coupled you think the dimensions are (IE: how much is the position of one likely to reflect the position of the other). If you are sure the data is tightly coupled, method 2 is slightly better. However, as soon as random lags appear in your data you'll see the independent method (1) far outperform method 2.

He goes on to share some interesting, but not really relevant here, notes on how to keep dimensionality low (usually <5) and choose the best dimensions. The most important part there is that, when using DTW, using too many dimensions is less accurate than using even one random dimension alone.

In short,

  1. Consider picking a few features out of your set to reduce error.
  2. Unless you're really confident, assuming your total distance is the sum distance of each feature is fine.

While it might be useful to bake this in as a convinience feature, it depends quite a bit on use case.