matrix-profile-foundation / matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
https://matrixprofile.org
Apache License 2.0
362 stars 62 forks source link

Added in annotation vectors #16

Closed nikita-smyrnov closed 4 years ago

nikita-smyrnov commented 4 years ago

Added unit testing, documentation, citations, and functionality for annotation vectors in the matrix profile library as per specifications on the main website.

codecov[bot] commented 4 years ago

Codecov Report

Merging #16 into master will increase coverage by 0.63%. The diff coverage is 98.83%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #16      +/-   ##
==========================================
+ Coverage   90.97%   91.61%   +0.63%     
==========================================
  Files          25       26       +1     
  Lines        1906     2075     +169     
==========================================
+ Hits         1734     1901     +167     
- Misses        172      174       +2     
Impacted Files Coverage Δ
matrixprofile/algorithms/top_k_motifs.py 93.91% <88.88%> (-0.69%) :arrow_down:
matrixprofile/transform.py 99.09% <99.09%> (ø)
matrixprofile/__init__.py 100.00% <100.00%> (ø)
matrixprofile/io/protobuf/proto_messages_pb2.py 100.00% <100.00%> (ø)
matrixprofile/io/protobuf/protobuf_utils.py 66.34% <100.00%> (+3.84%) :arrow_up:
matrixprofile/utils.py 90.62% <100.00%> (ø)
matrixprofile/visualize.py 97.61% <100.00%> (+0.21%) :arrow_up:
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update b98f1c4...853d48e. Read the comment docs.

nikita-smyrnov commented 4 years ago

I've fixed most of the issues, but there's one problem: for some reason, the python 2.7 build is failing due to the av_type key not being saved in the IO MPF format. I'm not particularly sure if Python 2.7 (or protobuf) handles strings differently than Python 3x. Perhaps there is something else missing that needs to be added to ensure backwards compatibility? Would appreciate some help in debugging this.

nikita-smyrnov commented 4 years ago

I think it has something to do with basestring being the way to detect strings in Python 2x whereas str and bytes were used in Python 3x. I added in some boilerplate code to fix the issue but it still doesn't seem to fully resolve the issue.

https://github.com/matrix-profile-foundation/matrixprofile/blob/2e30406a5c2ef652b3e6434c5ae956d2d9f99c39/matrixprofile/io/protobuf/protobuf_utils.py#L8

tylerwmarrs commented 4 years ago

This seems to fix Python 2.7 compatibility. I also ran tests against 3.7.

py2-protobuf-av.patch.txt

nikita-smyrnov commented 4 years ago

The patch you posted needed a slight change (check if string is empty before adding it to the MP data structure), but everything should be working now. As a summary of what has been fixed:

All the TravisCI and code coverage tests have also passed. Let me know if there's anything else you'd like to me handle for the AVs.

tylerwmarrs commented 4 years ago

Thank you for all of the hard work on this feature. Everything is looking good. One thing that would be nice is to have a quick "tutorial/overview" on annotation vectors in the documentation. Not many people are aware of them and what they are used for. If you are up to it, adding an example in docs/examples as a jupyter notebook would be awesome.

Basically, include:

nikita-smyrnov commented 4 years ago

I've added in an example notebook where ECG time series data is analyzed. The first half contains a calibration signal motif that is better conserved that the heartbeat motif in the second half. The notebook then uses an AV to ignore the first half of the time series so that the important motif (the heartbeat signal) can be selected for.

I've also submitted a PR here to account for this additional dataset: https://github.com/matrix-profile-foundation/mpf-datasets/pull/3

Also note that I've updated the motifs and visualize functions to account for this extra CMP/AV in the data structure. This way the user can select whether they want to use the MP or CMP when finding motifs, and visualize will generate a series of extra plots containing the updated CMP and AV. Documentation has also been updated.

Feel free to look at the example notebook for a visual example of how this occurs.

Let me know if there's anything else you'd like added in or edited for this feature.

tylerwmarrs commented 4 years ago

This is good. A couple of modifications:

  1. a CMP is not applicable to a PMP, however, you edited the top k motifs for PMP here: https://github.com/StateFarmIns/matrixprofile/blob/master/matrixprofile/algorithms/top_k_motifs.py#L16
  2. The visualize function labels the AV as the CMP per your example (see screenshot). This may be fixed in your code already? I didn't see it there.

image

  1. The dataset pull request should have a "txt" extension as it is not formatted as "csv".
tylerwmarrs commented 4 years ago

Unrelated to this code being merged, if you want to create a blog post on our MPF website, you can cross-post your example there. You will be acknowledged etc for it.

Here is the github repo: https://github.com/matrix-profile-foundation/matrix-profile-foundation.github.io

Here is an example post: https://github.com/matrix-profile-foundation/matrix-profile-foundation.github.io/blob/src/website/posts/nyc-taxi-analysis-anomalies-python.meta

The .meta file is just metadata around the post. The .ipynb file is the actual post content.

You will want to add an author image per the README file. :)

nikita-smyrnov commented 4 years ago
  1. How would you want me to account for this since both PMP and MP motif functions use a general wrapper function to get called? I submitted a possible change, but I'm not sure if this is the best way to go about this. The only reason why I added it for the PMP motifs was to keep the original motif wrapper code, but I can see now why that would be confusing.

  2. Fixed (was using an older version of code).

  3. Fixed.

I'll definitely make sure to work on the blog post and will get that done once this pull request is completely finished and merged into the main repository.

tylerwmarrs commented 4 years ago

I think your approach is good. There really isn't a nice way around it.