matrix-profile-foundation / matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
https://matrixprofile.org
Apache License 2.0
360 stars 62 forks

Compatibility with string data types #74

Open AndrewWilkins84 opened 3 years ago

AndrewWilkins84 commented 3 years ago

The proposed changes allow mp.compute() to work with string data types. mp.analyze() still needs work, however, because it calls mass2() when discovering motifs and/or discords. A new function could easily bypass this behavior and produce results comparable to https://www.cs.ucr.edu/~eamonn/PAN_SKIMP%20%28Matrix%20Profile%20XX%29.pdf (page 3).
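To illustrate how motifs and discords could be read off a character-based profile without calling mass2(), here is a minimal brute-force sketch. It is not the code in this PR: the function name `char_matrix_profile`, the use of Hamming distance (count of mismatched characters), and the exclusion-zone width are all assumptions made for illustration.

```python
import numpy as np


def char_matrix_profile(seq, w):
    """Brute-force matrix profile over characters (hypothetical helper).

    Assumes Hamming distance between subsequences; the PR's mpx_char.py
    may use a different distance.
    """
    # Map characters to integer code points so comparisons are plain integer ops.
    codes = np.array([ord(c) for c in seq])
    n = len(codes) - w + 1
    if w < 1 or n < 2:
        raise ValueError("window is too large for this sequence")

    # All length-w subsequences, shape (n, w). Requires NumPy >= 1.20.
    subs = np.lib.stride_tricks.sliding_window_view(codes, w)

    # Pairwise Hamming distance: number of positions that differ.
    dist = (subs[:, None, :] != subs[None, :, :]).sum(axis=2).astype(float)

    # Exclusion zone (assumed width) so a subsequence does not match itself
    # or its immediate neighbours.
    excl = max(1, w // 4)
    idx = np.arange(n)
    dist[np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf

    # Profile = distance to nearest neighbour; index = where that neighbour starts.
    return dist.min(axis=1), dist.argmin(axis=1)


profile, index = char_matrix_profile("abcdababcdddabcd", 4)
motif = int(np.argmin(profile))    # start of the best-conserved subsequence
discord = int(np.argmax(profile))  # start of the most unusual subsequence
```

In this sketch the motif is simply the position with the smallest profile value and the discord the position with the largest, so no additional distance-profile call is needed once the full profile exists. The pairwise comparison is O(n²·w) in time and memory, so it is only suitable for short sequences.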

Additionally, the documentation suggests the minimum window size should be 4. I have changed several of the functions I came across to reduce the minimum window size from 8 to 4, consistent with the documentation.
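As a rough sketch of the kind of check being relaxed, a window-size guard might look like the following. The name `check_window` and its parameters are hypothetical and do not come from the library; only the minimum of 4 is taken from the description above.

```python
def check_window(window, series_length, min_window=4):
    """Hypothetical window-size guard; not the library's actual check."""
    if isinstance(window, bool) or not isinstance(window, int):
        raise TypeError("window must be an integer")
    if window < min_window:
        raise ValueError(f"window must be at least {min_window}, got {window}")
    if window > series_length:
        raise ValueError("window cannot exceed the series length")
    return window


# A window of 4 passes with the lowered minimum; it would fail with min_window=8.
accepted = check_window(4, 16)
```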

lgtm-com[bot] commented 3 years ago

This pull request introduces 1 alert when merging 60dde27dce1ee0dbfcfc7b93cb0f31770bbea335 into 5ef23828b7594664e512575d20eea14a6272bf10 - view on LGTM.com

vanbenschoten commented 3 years ago

@AndrewWilkins84 thanks for the PR! I'll review shortly. In the meantime, can you take a look at why the tests are failing? The Travis CI logs indicate that it may be a relative importing issue.

vanbenschoten commented 3 years ago

Also, would you have any interest in creating a Jupyter notebook that highlights how to use Matrix Profile with string data types? I think the community would be very interested, and we could easily convert it into a blog post for the MPF website.

AndrewWilkins84 commented 3 years ago

Hi Andrew, I'm pretty sure the unused error is numba in mpx_char.py. I wanted to perform parallel processing on that function but never got around to it, since the solution I developed works for my immediate application. As for creating a Jupyter Notebook, I'll have to decline for now. I just don't have the bandwidth to take on any more tasks apart from my job workload. To run what I have, use the following in a Jupyter Notebook:

    import numpy as np
    import matrixprofile as mp

    a = np.array(list('abcdababcdddabcd'))
    p = mp.compute(a)
    p

That should do it.

Very respectfully,
Drew Wilkins

vanbenschoten commented 3 years ago

No worries - thanks for your contribution! Hopefully I can get this in shortly.
