Implementing constrained distance matrix

roya0045 commented 6 years ago

Greetings,

I don't want to reinvent the wheel but I was wondering something. Since computing the full distance matrix and all other derivative values for big time series I was wondering if an alternative could be implemented.

In cases where the window is small a simple constrained distance matrix could be generated the following way:

#Assuming the input are two pandas series
def constrained_distance(data_a,data_b,dist='cityblock',data_cols=('data','data'))
    suffixes=('_a','_b')
    output=np.zeros((len(data_a),len(data_a))
    og_a=deepcopy(data_a)
    for steps in range (-window,window+1):
        data_a=og_a#use the backup version for safety
        data_a.index=data_a.index.shift(steps) #shift index
        if dist=='cityblock':#shift values for cityblock
            data_a+steps
        datas=pd.merge(data_a,data_b, how='inner',left_index=True,right_index=True,suffixes=suffixes)
        distances= datas[data_cols[0]+suffixes[0]].subtract(datas[data_cols[1]+suffixes[1]])
        output+=np.diagflat(distances,k=steps)
   output[output==0]=np.nan#replace out of bounds values to ensure all is ok
return(output)

i'm not sure how this would affect the algorithm (I don't have the time to delve more into the depths for now)

This is just a quick draft but let me know what you think!

pierre-rouanet commented 6 years ago

Hi!

This seems like an interesting approach. Unfortunately I don't really have time anymore to follow and keep up to date with dtw literature so I don't know if this local method is used and how it affects the general algorithm. Sorry but I'm afraid I won't be much of a help here. I let this open and do not hesitate to share your results if you keep investigating this!

roya0045 commented 6 years ago

Greetings again,

Got around to implementing it properly and testing it today.

Here is the test code

Interestingly cdist gives the same things in both cases, something seems off with cdist since the metric parameter did not change the output on my end. But the output of my alg makes sense for this test.

From what I can remember seeing since DTW uses a limited window to obtain the optimal path, computing the whole matrix for big series is overkill (at least it was for my cases.)

If you have the time to run it (assuming you have scipy, numpy and pandas) tell me what you think if you or anyone reading this has the time.

roya0045 commented 6 years ago

Turns out cdist is far more optimized even when it comes to bigger dataset, so using cdist is still the best option and the slowness that I experienced using your implementation must have came from something else.

jlc-christie commented 6 years ago

I'm currently working on updating and expanding on this library in my fork and this is one thing I am concentrating on, more specifically the warping constraint as specified in http://www.cs.unm.edu/~mueen/DTW.pdf I'll make a PR when I'm finished implementing and testing it, if @pierre-rouanet is happy for me to do so.

pierre-rouanet commented 6 years ago

Yes definitely @jlc-christie! I'll gladly integrate your PR. Thanks a lot for contributing.

pierre-rouanet commented 5 years ago

Any news @jlc-christie?

jlc-christie commented 5 years ago

@pierre-rouanet Hi, I have a fork of this project that implements the constrained distance matrix and I've got examples of it working, but it breaks the ability to have asymmetrical distance matrices and thus I can't merge in to your branch.

I also no longer have time to fix these bugs, but if anyone has time to merge our two codebases that would be welcomed!

pierre-rouanet commented 5 years ago

Ok thanks for the quick answer @jlc-christie. I'll close the issue for the moment but feel free to re-open it if anyone has time to work on this.

pollen-robotics / dtw

Implementing constrained distance matrix #26