peterdhansen opened this issue 4 years ago.
@peterdhansen I think that'd be a great contribution! We're looking to grow out more of our utility functions that go beyond the core algorithms. Feel free to make a PR and we can collaborate.
I think it would be good to have this functionality. As you mention, it seems fairly trivial to implement. The "harder" thing to do is to write a blog post explaining when the approach is useful. Are you interested in adding the code, unit tests, and a blog post? @peterdhansen
Sounds good. I'll give it a shot. I should be able to contribute a blog post too. 😄
So, I just had another idea. Would it be possible/useful to restrict the matrix profile (MP) calculation to only consider certain indices? I'm not sure what the MP would contain at the other indices, though.
Or is the snippets algorithm doing this (for regularly spaced index selection)?
(Sorry for triple post)
I think that'd be useful. In my original Hacker News post for matrixprofile-ts I proposed doing something similar, and folks seemed really interested.
(Sent by email in reply to Peter Hansen's comment of Thu, Sep 17, 2020, 11:32 AM.)
> So, I just had another idea. Would it be possible/useful to restrict the MP calculation to only consider certain indices? I'm not sure what you would return for the MP then for the other indices.
We could use a similar approach to how missing data is handled. The `stomp` implementation handles this right now, and we are working on adding similar functionality to `mpx`. Essentially, the user provides a boolean array marking which indices to process or skip. I envision it working like the annotation vector. All other entries in the profile can simply be set to NaN.
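A minimal sketch of the boolean-mask idea, assuming a naive brute-force matrix profile rather than the library's `stomp` or `mpx` code. The function name `masked_matrix_profile`, the `process` argument, and the `m // 4` exclusion zone are illustrative assumptions. Only rows marked `True` are computed; skipped rows are NaN (index -1), and neighbors may still come from any start position.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def masked_matrix_profile(ts, m, process):
    """Naive matrix profile computed only where `process` is True (hypothetical API).

    `process` has one boolean per subsequence start (length len(ts) - m + 1).
    Skipped positions are NaN in the profile and -1 in the index.
    Assumes no constant (zero-variance) windows.
    """
    ts = np.asarray(ts, dtype=float)
    process = np.asarray(process, dtype=bool)
    windows = sliding_window_view(ts, m)
    # Z-normalize every length-m subsequence once, up front.
    z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)
    n_sub = len(z)
    if process.shape != (n_sub,):
        raise ValueError("process must have one entry per subsequence")
    mp = np.full(n_sub, np.nan)
    mpi = np.full(n_sub, -1, dtype=int)
    excl = max(1, m // 4)  # assumed trivial-match exclusion half-width
    for i in np.flatnonzero(process):
        # Distance from subsequence i to every subsequence (any start may be a neighbor).
        dp = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        dp[max(0, i - excl):i + excl + 1] = np.inf  # ignore trivial matches
        j = int(np.argmin(dp))
        mp[i], mpi[i] = dp[j], j
    return mp, mpi
```

For hourly data, `process[::24] = True` would compute the profile only for subsequences starting at the same hour each day.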
Another approach could be to require users to provide a valid time index, for example a pandas time series. This way users can specify intervals of interest.
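A minimal sketch of how a pandas time index could be turned into the same kind of boolean mask, assuming the user passes a `pd.Series` with a `DatetimeIndex`. The helper name `mask_from_time_window` is hypothetical; it relies on pandas' `DatetimeIndex.indexer_between_time` and marks subsequence starts whose timestamps fall inside a daily time window (inclusive).

```python
import numpy as np
import pandas as pd


def mask_from_time_window(series, m, start_time, end_time):
    """Boolean mask over subsequence starts whose timestamp is in [start_time, end_time]."""
    if not isinstance(series.index, pd.DatetimeIndex):
        raise TypeError("series must have a DatetimeIndex")
    mask = np.zeros(len(series), dtype=bool)
    # Integer positions whose time of day falls in the window, on any date.
    mask[series.index.indexer_between_time(start_time, end_time)] = True
    # Only the first len(series) - m + 1 positions can start a full length-m subsequence.
    return mask[: len(series) - m + 1]
```

The resulting array has one entry per subsequence start, so it could be passed directly as a process/skip mask.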
> Or is the snippets algorithm doing this (for regularly spaced index selection)?
Snippets does not do this. It identifies k representative snippets and n neighbors, which helps answer the question of what is common in the series of interest.
@peterdhansen Any updates on this?
Sorry, not yet. I'll take a look in the next week or so.
Got my environment setup 😄
@peterdhansen just wanted to circle back and see if you're still interested in contributing.
Happy Holidays!
When I am analyzing data that has daily fluctuations, I create an annotation vector that is 1 at midnight of each day and 0 everywhere else. This prioritizes subsequences that start at midnight, so each set of motifs has the same 24-hour structure.
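A minimal sketch of the midnight annotation vector described above, assuming a pandas `DatetimeIndex` and the correction from the annotation-vector literature, CMP = MP + (1 - AV) * max(MP), which pushes non-midnight starts above every midnight start. The helper names `midnight_av` and `apply_av` are hypothetical and are not the matrixprofile library's API.

```python
import numpy as np
import pandas as pd


def midnight_av(index, m):
    """Annotation vector over subsequence starts: 1.0 at midnight, 0.0 elsewhere."""
    starts = index[: len(index) - m + 1]
    hours = np.asarray(starts.hour)
    minutes = np.asarray(starts.minute)
    seconds = np.asarray(starts.second)
    return ((hours == 0) & (minutes == 0) & (seconds == 0)).astype(float)


def apply_av(mp, av):
    """Corrected matrix profile, assuming CMP = MP + (1 - AV) * max(MP)."""
    mp = np.asarray(mp, dtype=float)
    return mp + (1.0 - np.asarray(av, dtype=float)) * np.nanmax(mp)
```

With hourly data and m = 24, only starts 0, 24, 48, ... keep their original distances; every other start is penalized by max(MP).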
The issue is that applying an annotation vector does not prevent the motif algorithm from picking a motif pair where one subsequence starts at midnight and the other does not. A new mechanism would have to be defined to restrict such pairs.
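One possible shape for such a mechanism, sketched under the assumption of a brute-force search: both members of the pair must start at an allowed index, enforced by setting every disallowed partner to infinity before taking the minimum. The function `restricted_top_motif`, the `allowed` mask, and the `m // 4` exclusion zone are hypothetical, not existing library code.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def restricted_top_motif(ts, m, allowed):
    """Best motif pair (distance, i, j) where BOTH i and j are allowed starts.

    `allowed` is a hypothetical boolean mask with one entry per subsequence start.
    Returns (inf, -1, -1) if no valid pair exists. Assumes no constant windows.
    """
    ts = np.asarray(ts, dtype=float)
    allowed = np.asarray(allowed, dtype=bool)
    windows = sliding_window_view(ts, m)
    z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)
    excl = max(1, m // 4)  # assumed trivial-match exclusion half-width
    best = (np.inf, -1, -1)
    for i in np.flatnonzero(allowed):
        dp = np.sqrt(((z - z[i]) ** 2).sum(axis=1))
        dp[~allowed] = np.inf  # the partner must also start at an allowed index
        dp[max(0, i - excl):i + excl + 1] = np.inf  # ignore trivial matches
        j = int(np.argmin(dp))
        if dp[j] < best[0]:
            best = (float(dp[j]), int(i), j)
    return best
```

Unlike the annotation-vector correction, this is a hard constraint: a near-perfect match at a non-midnight start can never be paired with a midnight start.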
Also, the distance profiles that are calculated inside the motif algorithm do not apply the annotation vector. This could be added, triggered when `use_cmp=True`, without any new mechanisms. I can write custom motif-finding code that does this, but if others would like the functionality, I'd be happy to contribute.
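A minimal sketch of applying the annotation vector to a motif's distance profile when `use_cmp` is set, assuming the same CMP-style correction (distance + (1 - AV) * max finite distance) and a simple greedy neighbor search. `corrected_distance_profile` and `motif_neighbors` are hypothetical helpers, not the library's motif code.

```python
import numpy as np


def corrected_distance_profile(dp, av):
    """Distance profile penalized by (1 - AV) * max finite distance (assumed correction)."""
    dp = np.asarray(dp, dtype=float)
    finite = np.isfinite(dp)
    margin = dp[finite].max() if finite.any() else 0.0
    return dp + (1.0 - np.asarray(av, dtype=float)) * margin


def motif_neighbors(dp, av, k, exclusion, use_cmp=False):
    """Greedily pick up to k nearest neighbors, skipping trivial matches around each pick."""
    profile = corrected_distance_profile(dp, av) if use_cmp else np.asarray(dp, dtype=float).copy()
    # np.argmin would select a NaN, so treat NaN entries as "never a neighbor".
    profile = np.where(np.isnan(profile), np.inf, profile)
    neighbors = []
    for _ in range(k):
        j = int(np.argmin(profile))
        if not np.isfinite(profile[j]):
            break  # no valid candidates left
        neighbors.append(j)
        profile[max(0, j - exclusion):j + exclusion + 1] = np.inf
    return neighbors
```

With `use_cmp=False` the raw distances decide; with `use_cmp=True`, starts where the AV is 0 are pushed behind every start where it is 1.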