Restrict Motif discovery to subsequences starting at specific locations

matrix-profile-foundation / matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

https://matrixprofile.org

Apache License 2.0


Restrict Motif discovery to subsequences starting at specific locations #49

Open peterdhansen opened 4 years ago

peterdhansen commented 4 years ago

When I am analyzing data that has daily fluctuations, I create an annotation vector that is 1 at midnight of each day and 0 everywhere else. This helps prioritize subsequences that start at midnight, so that each set of motifs has the same 24-hour structure.

The issue is that applying an annotation vector does not prevent the motif algorithm from picking a motif pair where one starts at midnight and the other does not. A new mechanism would have to be defined to restrict these.
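A minimal sketch of the restriction described above, assuming a brute-force search is acceptable because only a few start positions are allowed. The helper names `znorm` and `restricted_motif_pair` are hypothetical and are not part of matrixprofile; the sketch only compares subsequences whose start indices are both in the allowed set, so a mixed pair can never be returned.

```python
import math


def znorm(values):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def restricted_motif_pair(ts, window, allowed_starts):
    """Return (distance, i, j) for the closest pair of subsequences whose
    start indices are BOTH in allowed_starts (hypothetical helper)."""
    last_start = len(ts) - window
    starts = sorted(s for s in set(allowed_starts) if 0 <= s <= last_start)
    subs = {s: znorm(ts[s:s + window]) for s in starts}
    best = (math.inf, None, None)
    for pos, i in enumerate(starts):
        for j in starts[pos + 1:]:
            if j - i < window:
                continue  # skip overlapping (trivial) matches
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best


# Toy hourly series: four "days" of 24 samples; days 0 and 2 share a shape.
day_a = [math.sin(2 * math.pi * h / 24) for h in range(24)]
day_b = [float(h % 6) for h in range(24)]
day_c = [float(h * h) for h in range(24)]
ts = day_a + day_b + day_a + day_c

midnights = range(0, len(ts), 24)  # allowed start indices
dist, i, j = restricted_motif_pair(ts, 24, midnights)
```

With one allowed start per day the pairwise loop stays small, which is why the sketch does not need a full matrix profile computation.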

Also, the distance profiles calculated inside the motif algorithm do not have the annotation vector applied. This could be added and triggered when use_cmp = True, without any new mechanism.
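As a rough illustration of that idea, the sketch below reuses the correction commonly applied to the matrix profile itself, `CMP = MP + (1 - AV) * max(MP)`, and applies it to a single distance profile. Applying it at this stage is an assumption made for illustration, not the library's current behavior, and `apply_av_to_distance_profile` is a hypothetical helper name.

```python
import math


def apply_av_to_distance_profile(distance_profile, av):
    """Penalize distances where the annotation vector is low.

    corrected[i] = dp[i] + (1 - av[i]) * max(dp), where the maximum is
    taken over finite entries only (hypothetical helper).
    """
    if len(distance_profile) != len(av):
        raise ValueError("distance profile and annotation vector differ in length")
    finite = [d for d in distance_profile if math.isfinite(d)]
    max_d = max(finite) if finite else 0.0
    return [d + (1.0 - a) * max_d for d, a in zip(distance_profile, av)]


dp = [0.5, 0.1, 2.0, 0.3]  # raw distances to some query subsequence
av = [1, 0, 0, 1]          # 1 = allowed start (e.g. midnight), 0 = not allowed

corrected = apply_av_to_distance_profile(dp, av)

# Index 1 is the raw nearest neighbor, but it is not an allowed start.
# After the correction, the closest allowed candidate is selected instead.
best = min(range(len(corrected)), key=corrected.__getitem__)
```

Note that this correction only deprioritizes disallowed starts; it does not strictly exclude them when every allowed candidate is farther away than the penalized ones.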

I can write custom motif-finding code that does this, but if others would like the functionality, I'd be happy to contribute.

vanbenschoten commented 4 years ago

@peterdhansen I think that'd be a great contribution! We're looking to grow out more of our utility functions that go beyond the core algorithms. Feel free to make a PR and we can collaborate.

tylerwmarrs commented 4 years ago

I think it would be good to have this functionality. As you mention, it seems fairly trivial to implement. The "harder" thing to do is to write a blog post explaining when the approach is useful. Are you interested in adding the code, unit tests, and a blog post? @peterdhansen

peterdhansen commented 4 years ago

Sounds good. I'll give it a shot. I should be able to contribute a blog post too. 😄

peterdhansen commented 4 years ago

So, I just had another idea. Would it be possible/useful to restrict the MP calculation to only consider certain indices? I'm not sure what you would return for the MP then for the other indices.

peterdhansen commented 4 years ago

Or is the snippets algorithm doing this (for regularly spaced index selection)

(Sorry for triple post)

vanbenschoten commented 4 years ago

I think that'd be useful. In my original Hacker News post for matrixprofile-ts I proposed doing something similar, and folks seemed really interested.

On Thu, Sep 17, 2020, 11:32 AM Peter Hansen wrote:

> Or is the snippets algorithm doing this (for regularly spaced index selection)
>
> (Sorry for triple post)

tylerwmarrs commented 4 years ago

> So, I just had another idea. Would it be possible/useful to restrict the MP calculation to only consider certain indices? I'm not sure what you would return for the MP then for the other indices.

We could use an approach similar to how missing data is handled. The stomp implementation handles this right now, and we are working on adding similar functionality to mpx. Essentially, the user provides a boolean array marking which indices to process and which to skip. I envision it working like the annotation vector. All other distances in the profile can simply be returned as nan.
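One possible reading of this proposal, sketched with a brute-force profile rather than stomp or mpx: indices where the boolean array is False are skipped and left as nan, and only processed indices are considered as neighbors. The function name `masked_matrix_profile` and the choice to also exclude skipped indices as neighbors are assumptions for illustration.

```python
import math


def _znorm(values):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


def masked_matrix_profile(ts, window, process):
    """Brute-force profile over the indices where process[i] is True.

    Skipped indices keep nan in the profile and -1 in the profile index
    (hypothetical helper, not the stomp/mpx implementation).
    """
    n_subs = len(ts) - window + 1
    if len(process) != n_subs:
        raise ValueError("mask must have one entry per subsequence start")
    idx = [i for i in range(n_subs) if process[i]]
    subs = {i: _znorm(ts[i:i + window]) for i in idx}
    profile = [math.nan] * n_subs
    profile_index = [-1] * n_subs
    for i in idx:
        for j in idx:
            if abs(i - j) < window:
                continue  # exclude self and overlapping matches
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(subs[i], subs[j])))
            if math.isnan(profile[i]) or d < profile[i]:
                profile[i] = d
                profile_index[i] = j
    return profile, profile_index


# Four blocks of length 4; blocks 0 and 2 are identical.
ts = [0, 1, 2, 1, 5, 3, 5, 3, 0, 1, 2, 1, 9, 9, 8, 0]
window = 4
process = [i % 4 == 0 for i in range(len(ts) - window + 1)]

profile, profile_index = masked_matrix_profile(ts, window, process)
```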

Another approach could be to require users to have valid time domains using a Pandas time series or something. This way we can have users specify intervals of interest.
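A small sketch of how intervals of interest could be turned into a boolean mask over subsequence starts. It uses the standard library `datetime` module in place of a pandas time series to stay self-contained, and `starts_in_intervals` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def starts_in_intervals(timestamps, intervals, window):
    """Return one boolean per valid subsequence start: True when the start
    timestamp falls inside any half-open [begin, end) interval."""
    n_subs = len(timestamps) - window + 1
    return [
        any(begin <= timestamps[i] < end for begin, end in intervals)
        for i in range(n_subs)
    ]


# Three days of hourly samples starting at midnight.
t0 = datetime(2020, 9, 1)
timestamps = [t0 + timedelta(hours=h) for h in range(72)]

# Intervals of interest: the first hour of each day.
intervals = [
    (datetime(2020, 9, day), datetime(2020, 9, day) + timedelta(hours=1))
    for day in (1, 2, 3)
]

mask = starts_in_intervals(timestamps, intervals, window=24)
allowed = [i for i, keep in enumerate(mask) if keep]
```

The resulting mask has the same shape as the boolean array described in the first approach, so the two ideas could share one code path.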

tylerwmarrs commented 4 years ago

> Or is the snippets algorithm doing this (for regularly spaced index selection)
>
> (Sorry for triple post)

Snippets does not do this. It identifies k representative snippets and n neighbors. It helps to answer what is common in the series of interest.

tylerwmarrs commented 3 years ago

@peterdhansen Any updates on this?

peterdhansen commented 3 years ago

Sorry, not yet. I'll take a look in the next week or so.

peterdhansen commented 3 years ago

Got my environment set up 😄

vanbenschoten commented 3 years ago

@peterdhansen just wanted to circle back and see if you're still interested in contributing.

Happy Holidays!