zpzim / SCAMP

The fastest way to compute matrix profiles on CPU and GPU!
http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
MIT License


SCAMP: SCAlable Matrix Profile

Table of Contents

Overview \ Documentation \ Performance \ Python Module \ Run Using Docker \ Distributed Operation \ Reference

Overview

This is a GPU/CPU implementation of the SCAMP algorithm. SCAMP takes a time series as input and computes the matrix profile for a particular window size. You can read more at the Matrix Profile Homepage. SCAMP is a much-improved framework over GPU-STOMP, with a number of additional features.

Why use SCAMP?

Documentation

SCAMP's documentation can be found at readthedocs.

Python module

pyscamp is available through conda-forge:

# To install pyscamp with cpu/gpu support on Linux and Windows.
conda install -c conda-forge pyscamp-gpu

# To install pyscamp with cpu support only on Windows, Linux, or MacOS.
conda install -c conda-forge pyscamp-cpu

Note that pyscamp-gpu can be installed and used even if you don't have a GPU; it will simply fall back to using your CPU. However, pyscamp-cpu is preferable if you don't have a GPU because it builds with a newer compiler and does not require installing the cudatoolkit dependency.

If you run into problems using GPUs with pyscamp-gpu, make sure your NVIDIA drivers are up to date. Outdated drivers are the most common cause of issues.

Installing from source

If you prefer, you can build pyscamp from source, which will give better performance. A source distribution of the Python 3 module, built with pybind11, is available on pypi.org. To install it, run:

# Python 3 and a C/C++ compiler are required.
# cmake is also required (if you don't have it, you can pip install cmake).
pip install pyscamp

Once installed you can use SCAMP in Python as follows:

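import numpy as np
import pyscamp as mp

# Placeholder inputs for illustration: two random time series and a window length.
a = np.random.rand(10000)
b = np.random.rand(10000)
sublen = 100

# Check whether pyscamp was built with CUDA and has GPU support.
has_gpu_support = mp.gpu_supported()

# Self-join
profile, index = mp.selfjoin(a, sublen)

# AB-join using 4 threads, outputting Pearson correlation.
profile, index = mp.abjoin(a, b, sublen, pearson=True, threads=4)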

More information and the API documentation for pyscamp are available on readthedocs.
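To make it concrete what the profile and index values of a self-join represent, here is a naive pure-Python reference sketch. It is not how SCAMP computes the result and it is far too slow for real data; it only illustrates the definition on small inputs. The function names, the exclusion zone of `m // 4`, and the choice to map constant windows to zeros are assumptions made for this illustration, and pyscamp's defaults may differ. The last helper uses the standard identity between z-normalized Euclidean distance and Pearson correlation, which is what the `pearson=True` option refers to.

```python
import math
import random


def znorm(window):
    """Z-normalize a window using the population standard deviation."""
    m = len(window)
    mean = sum(window) / m
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / m)
    if std == 0.0:
        # Assumption for this sketch: constant windows become all zeros.
        return [0.0] * m
    return [(x - mean) / std for x in window]


def naive_selfjoin(ts, m, exclusion=None):
    """Brute-force matrix profile: nearest non-trivial neighbor per subsequence."""
    if exclusion is None:
        exclusion = max(1, m // 4)  # assumed exclusion zone, not pyscamp's setting
    n = len(ts) - m + 1
    windows = [znorm(ts[i:i + m]) for i in range(n)]
    profile = [math.inf] * n
    index = [-1] * n
    for i in range(n):
        for j in range(n):
            if abs(i - j) <= exclusion:
                continue  # skip trivial matches with overlapping neighbors
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(windows[i], windows[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


def distance_to_pearson(d, m):
    """Convert a z-normalized Euclidean distance to Pearson correlation."""
    return 1.0 - (d * d) / (2.0 * m)


# Small deterministic example: random noise with the same motif planted twice.
rng = random.Random(0)
ts = [rng.uniform(-1.0, 1.0) for _ in range(60)]
motif = [1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0, 2.0]
m = len(motif)
ts[5:5 + m] = motif
ts[30:30 + m] = motif

profile, index = naive_selfjoin(ts, m)
print(profile[5], index[5])  # the two planted copies should find each other
```

A low profile value marks a subsequence that has a close match elsewhere in the series (a motif), and a high value marks one that does not (a potential anomaly). A sketch like this can also serve as a sanity check against pyscamp output on tiny inputs, keeping in mind that the exclusion zone and edge-case handling here are assumptions.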

Run Using Docker

You can run SCAMP via nvidia-docker using the prebuilt image on dockerhub.

To expose the host GPUs, nvidia-docker must be installed correctly. Please follow the directions provided on the nvidia-docker GitHub page. The following example uses Docker 19.03 functionality:

docker pull zpzim/scamp:latest
docker run --gpus all \
   --volume /path/to/host/input/data/directory:/data \
   --volume /path/to/host/output/directory:/output \
   zpzim/scamp:latest /SCAMP/build/SCAMP \
   --window=<window_size> --input_a_file_name=/data/<filename> \
   --output_a_file_name=/output/<mp_filename> \
   --output_a_index_file_name=/output/<mp_index_filename>
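For scripted or batch runs, the `docker run` invocation above can be assembled from Python. The following is a minimal sketch that only builds the argument list and does not start a container. The helper name `scamp_docker_command`, its parameters, and the example paths and file names are hypothetical; the image name, binary path, and flags are the ones shown in the command above.

```python
import shlex


def scamp_docker_command(input_dir, output_dir, input_name, window,
                         profile_name, index_name,
                         image="zpzim/scamp:latest"):
    """Build the argv list for the docker run command shown above."""
    return [
        "docker", "run", "--gpus", "all",
        # Mount the host input and output directories into the container.
        "--volume", f"{input_dir}:/data",
        "--volume", f"{output_dir}:/output",
        image, "/SCAMP/build/SCAMP",
        f"--window={window}",
        f"--input_a_file_name=/data/{input_name}",
        f"--output_a_file_name=/output/{profile_name}",
        f"--output_a_index_file_name=/output/{index_name}",
    ]


# Hypothetical paths and file names, for illustration only.
cmd = scamp_docker_command(
    input_dir="/home/user/data",
    output_dir="/home/user/results",
    input_name="series.txt",
    window=100,
    profile_name="mp.txt",
    index_name="mp_index.txt",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a host where Docker and nvidia-docker are set up. Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.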

Distributed Operation

SCAMP has a client/server architecture built using gRPC. It has been tested on GKE, and it should be possible to get it working on Amazon EKS as well.

For more information on how to use the SCAMP client and server, please take a look at the documentation.

Reference

If you use SCAMP in your work, please reference the following paper:

Zimmerman, Zachary, et al. "Matrix Profile XIV: Scaling Time Series Motif Discovery with GPUs to Break a Quintillion Pairwise Comparisons a Day and Beyond." Proceedings of the ACM Symposium on Cloud Computing. 2019.