opendistro-for-elasticsearch / anomaly-detection

A machine learning plugin in Open Distro for real time anomaly detection on streaming data.
https://opendistro.github.io/for-elasticsearch-docs/docs/ad/
Apache License 2.0

RFC: Unified flow for realtime and historical anomaly detection #380

Open ylwu-amzn opened 3 years ago

ylwu-amzn commented 3 years ago

We are going to build a unified workflow for real-time and historical anomaly detection within the same detector. This way, users don't need to create a new historical detector when they want to run historical anomaly detection with an existing real-time detector's configuration, and vice versa.

ylwu-amzn commented 3 years ago

RFC: unified flow for realtime and historical anomaly detection

This request for comments (RFC) introduces a change to the anomaly detection process, a unified flow, and invites the community to share comments and suggestions.

Problem Statement

We currently have separate workflows for creating real-time and historical detectors (historical detectors were released in ODFE 1.13 and currently support only single-entity detection). Additionally, the workflow for creating real-time detectors requires users to configure the detector and the model on separate pages, which is unclear and can cause confusion. If users want to run detection on historical data with the same configuration as a real-time detector, they must create a new historical detector, and it is not easy to maintain two detectors with the same configuration. Typically, users check how the model performs on historical data and tune the detector configuration, then create a real-time detector with the same configuration. They also have to switch between real-time and historical detectors to review anomaly results.

Proposed Solution

We will build a unified flow to run both real-time and historical anomaly detection within the same detector. Through a clear, end-to-end, step-by-step process, users can click through a single workflow to perform all of the steps needed to create and run anomaly detection jobs quickly. These steps include:

(1) Defining the data source (index, filter query, interval, etc.)
(2) Configuring the model (features, model hyperparameters)
(3) Scheduling real-time anomaly detection jobs, running a historical analysis, or both.
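To make the three steps concrete, here is a minimal Python sketch of the data model the proposal implies: one detector owns the data source (step 1) and model configuration (step 2), and both a real-time job and a historical analysis (step 3) are started from it, so no second detector is needed. All class, field, and method names (DataSource, ModelConfig, Detector, start_realtime, run_historical) are hypothetical illustrations, not the AD plugin's REST API or internal classes; the index "server-metrics" and feature "avg_cpu" are made-up examples.

```python
# Hypothetical model of the unified flow; NOT the AD plugin's API or internals.
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataSource:
    """Step (1): where the data comes from and how it is bucketed."""
    indices: Tuple[str, ...]
    time_field: str
    interval: timedelta
    filter_query: Optional[str] = None  # illustrative; real filter format not modeled


@dataclass(frozen=True)
class ModelConfig:
    """Step (2): features and model hyperparameters."""
    features: Tuple[str, ...]
    hyperparameters: Tuple[Tuple[str, int], ...] = ()


@dataclass
class Job:
    """Step (3): a real-time job or a historical analysis run."""
    kind: str  # "realtime" or "historical"
    source: DataSource
    model: ModelConfig
    start: Optional[datetime] = None  # set only for historical runs
    end: Optional[datetime] = None


@dataclass
class Detector:
    """One detector holds the configuration that every job reuses."""
    name: str
    source: DataSource
    model: ModelConfig
    jobs: List[Job] = field(default_factory=list)

    def start_realtime(self) -> Job:
        # Streaming detection reuses the detector's own configuration objects.
        job = Job("realtime", self.source, self.model)
        self.jobs.append(job)
        return job

    def run_historical(self, start: datetime, end: datetime) -> Job:
        # Historical analysis over a time range, with the same configuration.
        if end <= start:
            raise ValueError("end must be after start")
        job = Job("historical", self.source, self.model, start, end)
        self.jobs.append(job)
        return job


# Usage: tune on historical data first, then go live without a second detector.
detector = Detector(
    name="cpu-detector",
    source=DataSource(("server-metrics",), "timestamp", timedelta(minutes=10)),
    model=ModelConfig(("avg_cpu",)),
)
history = detector.run_historical(datetime(2021, 1, 1), datetime(2021, 2, 1))
live = detector.start_realtime()
print(history.source is live.source)  # True: both jobs share one configuration
```

The design choice being illustrated is that configuration lives on the detector and jobs only reference it, so tuning the detector once applies to both real-time and historical runs.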

Additionally, we are going to support high-cardinality historical anomaly detection, which is currently supported only in the real-time use case.

New users can experiment with historical data first to learn how the AD plugin works and to tune the data source and model configuration. When the results look good, they can start a real-time job to detect anomalies in streaming data. Current users who already have a real-time detector running can start a new historical analysis within the same detector, and users who have a historical detector can start a real-time job with one click.

We will also provide more ways to visualize the anomaly results, like daily/weekly/monthly aggregated anomaly counts, anomaly comparison, and trend analysis.

Providing Feedback

If you have comments or feedback on our plans for Unified Flow, please comment below.

agarwalvijay commented 3 years ago

Thanks @ylwu-amzn! Is there an ETA for the release of this unified flow?

ylwu-amzn commented 3 years ago

Hi @agarwalvijay, thanks for your interest. It will be in the next ODFE release, so keep an eye out for that release.

mmguero commented 3 years ago

I am a little confused with the release of OpenSearch and how that affects or doesn't affect this project. Will there be another ODFE release? When and where will we see this feature released?