timescale / timescaledb-toolkit

Extension for more hyperfunctions, fully compatible with TimescaleDB and PostgreSQL 📈
https://www.timescale.com

Matrix Profile #438

Open rtwalker opened 2 years ago

rtwalker commented 2 years ago

The matrix profile is a relatively new development in time-series analysis that is useful for things like anomaly detection and identifying similar segments/patterns. I think the matrix profile is broadly useful but could fit especially well with Timescale's observability use cases.

Links:

rtwalker commented 2 years ago

Might have some overlap with #45

davidkohn88 commented 2 years ago

This is really cool. I just started looking at it, and it seems really interesting both in its approach and as something that could be innovative for our customers. I am having trouble finding an accessible explanation of what you can do with it. Do you think we could work on that? If we can understand how we would describe it to users, I think it would definitely be worthwhile to invest some time in working on it. But if we can't figure out how to describe it to users, it's going to be very difficult to get them to use it.

rrindels commented 1 year ago

This is the future of time series anomaly detection, in my opinion. It beats every brute-force method we have tried. The issue we always come back to with Matrix Profiles is the complexity of pulling our data out of Timescale tables, computing and coercing it, and then shoving the results back into sets of Matrix Profile tables, Anomaly tables, and Motif tables. The use cases are too numerous to mention, but normalizing the data, computing the Euclidean distances that make up the Matrix Profile, and then maintaining those results as data is added or removed would be a huge step toward providing value to any mining operations on the data for anomalies, motifs, etc. When we sell the concept to our clients as a hybrid of Machine Learning and Anomaly Detection for time series data, that is usually enough to get entry, even though the other useful features (Motifs, Chains, etc.) are just as easily accessible. It also adds a simple capability for quickly estimating seasonality and for automated threshold computations that naturally follow curves and seasonal spikes. So many uses! If we could compute all of this inside PG without having to export to external computation engines, it would deliver value closer to real time and, I think, put TimescaleDB in a unique class of tools that are ahead of the curve.
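
For anyone following along, here is a rough sketch of the normalize-then-Euclidean-distance computation mentioned above, in plain Python. It is a naive O(n^2 * m) version meant only to show the idea, not how the toolkit would or should implement it. The function names (znormalize, matrix_profile), the exclusion zone of m // 2, and the sample data are all assumptions made up for illustration; production libraries use much faster algorithms and support incremental updates.

```python
import math


def znormalize(window):
    """Scale a window to zero mean and unit standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        # A constant window has no shape; map it to all zeros (assumption).
        return [0.0] * n
    return [(x - mean) / std for x in window]


def matrix_profile(series, m):
    """Naive matrix profile for subsequence length m.

    Returns (profile, index). profile[i] is the z-normalized Euclidean
    distance from the subsequence starting at i to its nearest neighbour
    outside the exclusion zone; index[i] is where that neighbour starts.
    """
    count = len(series) - m + 1
    if m < 2 or count < 2:
        raise ValueError("series is too short for this window length")

    windows = [znormalize(series[i:i + m]) for i in range(count)]
    # Skip near-identical overlapping neighbours ("trivial matches").
    # m // 2 is an illustrative choice, not a prescribed value.
    exclusion = max(1, m // 2)

    profile = [math.inf] * count
    index = [-1] * count
    for i in range(count):
        for j in range(count):
            if abs(i - j) < exclusion:
                continue
            d = math.dist(windows[i], windows[j])
            if d < profile[i]:
                profile[i], index[i] = d, j
    return profile, index


# Hypothetical data: a repeating pattern with one injected spike.
series = [0.0, 1.0, 2.0, 1.0] * 6
series[13] = 9.0

profile, index = matrix_profile(series, m=4)

# Highest profile value = most unusual subsequence (a "discord").
discord = max(range(len(profile)), key=lambda i: profile[i])
# Lowest profile value = best repeated pattern (a "motif").
motif = min(range(len(profile)), key=lambda i: profile[i])
print("discord starts at", discord, "motif starts at", motif)
```

The large values in the profile are the anomaly candidates and the small values are the motif candidates, which is why one pass over the data can feed both the Anomaly and Motif tables described above.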