The purpose of this RFC (request for comments) is to gather community feedback on a new proposal for integrating anomaly detection (AD) with core Dashboards' Visualizations. Doc: see here.

Pasting doc details here so it can be discussed:

RFC - Integration with Visualizations

Problem

Users often start by viewing their ingested data through an OpenSearch Dashboards Visualization (viz), and then group several together to create a Dashboard. This helps them initially understand the data, find patterns, and monitor data as it's ingested into an OpenSearch cluster. But after users create these, there's no clear path to next steps if they want to look further into their data, such as setting up anomaly detection (AD). Some of the challenges of setting up AD include:

learning new AD-specific terms and ideas (anomaly detector, historical analysis, feature, interval, etc.)
detailed and lengthy setup with lots of user-required fields
lots of duplicate work (selecting source data, metrics to analyze) if the user has already created a viz
AD lives in a standalone plugin and is not visible on the Dashboards overview or home pages; it must be accessed from the list of plugins in the sidebar

Solution

Overview

To help address this disconnect between Visualizations and AD, we propose an integrated solution that lets users quickly create and run a detector, and view anomaly results, from a Visualization. For certain Visualizations, users can reuse the work they've already done to visualize the data they're interested in (source data, metrics to analyze), and create an anomaly detector on that same data with just a few extra clicks. The diagram shows the overall user workflow:

viz-integration-workflow

AD is only intended for use with streaming time series data (real-time or historical). Because of this, we focus on the following Visualization types frequently used for time series data: (1) Line, (2) Area, (3) Vertical bar, and (4) Time Series Visual Builder (TSVB).

Generating the detector configuration

Using inputs from a Visualization, the config fields for a detector can be automatically populated with suggested values. In a perfect scenario, every field can be populated and no user input is needed to create a detector. The most common cases where user input would be needed are the complex fields, like features and filter query. These can be partially auto-filled, with some extra user input needed (e.g., auto-filling the feature field, but needing user input to provide a valid aggregation).

We list the logic for generating each default detector field value below (tentative):

| Detector field | Logic |
| --- | --- |
| `name` | Some simple pattern, e.g. `"<viz>-detector"` |
| `description` | Some simple pattern, e.g. `"A detector based off of <viz>"` |
| `time_field` | Time field used in viz |
| `indices` | Selected index pattern of viz (if `Search` type is selected, then extract index pattern from that) |
| `features` | Corresponds to `metrics` section in viz. May need user input if invalid aggs are selected |
| `filter_query` | Combine any custom filters set in viz |
| `interval` | Use some default value (10 mins), or from length of bucket, if date histogram option is selected |
| `window_delay` | Use some default value (1 min) |
| `category_field` | Default to empty. If user has x-axis terms subaggregation set, could auto-fill with that field |
| `result_index` | Use some default value (none) |
| `shingle_size` | Use some default value (8) |
| real-time / historical jobs? | Default to real-time enabled, historical disabled |
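
To make the table concrete, here is a minimal sketch of how the suggested values might be derived. It is illustrative only: the flattened `viz` dict and its keys (`title`, `time_field`, `index_pattern`, `metrics`, `filters`, `x_axis`, `split_terms_field`) are hypothetical and do not reflect the real saved-object format, and the output keys follow the table above rather than the AD API's request schema.

```python
import re
from typing import Any, Dict, Optional

# Defaults taken from the table above (all tentative).
DEFAULT_INTERVAL_MINUTES = 10
DEFAULT_WINDOW_DELAY_MINUTES = 1
DEFAULT_SHINGLE_SIZE = 8

_UNIT_TO_MINUTES = {"m": 1, "h": 60, "d": 1440}


def bucket_length_minutes(interval: Optional[str]) -> Optional[int]:
    """Parse a fixed interval such as '30m' or '1h'.

    Returns None for 'auto', a missing value, or anything unrecognized.
    """
    match = re.fullmatch(r"(\d+)([mhd])", interval or "")
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_TO_MINUTES[match.group(2)]


def suggest_detector_config(viz: Dict[str, Any]) -> Dict[str, Any]:
    """Build suggested detector values from a hypothetical, flattened viz dict.

    Assumes every metric dict carries both an 'agg' and a 'field' key.
    """
    x_axis = viz.get("x_axis") or {}
    interval = None
    if x_axis.get("type") == "date_histogram":
        interval = bucket_length_minutes(x_axis.get("interval"))

    split_field = viz.get("split_terms_field")
    return {
        "name": f"{viz['title']}-detector",
        "description": f"A detector based off of {viz['title']}",
        "time_field": viz["time_field"],
        "indices": [viz["index_pattern"]],
        "features": [
            {"name": f"{m['agg']}_{m['field']}", "agg": m["agg"], "field": m["field"]}
            for m in viz.get("metrics", [])
        ],
        # Combine custom viz filters into one bool query.
        "filter_query": {"bool": {"filter": list(viz.get("filters", []))}},
        # Fall back to the default when the bucket length is 'auto' or absent.
        "interval_minutes": interval or DEFAULT_INTERVAL_MINUTES,
        "window_delay_minutes": DEFAULT_WINDOW_DELAY_MINUTES,
        "category_field": [split_field] if split_field else [],
        "result_index": None,
        "shingle_size": DEFAULT_SHINGLE_SIZE,
        "realtime_enabled": True,
        "historical_enabled": False,
    }
```

With a fixed `30m` date histogram the suggested interval follows the bucket length; with `auto` it falls back to the 10-minute default.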

Invalid inputs

Because Visualizations allow much more complex ways to display data on a chart than AD supports, parts of a viz configuration may not be valid when trying to set up an anomaly detector. Some examples include:

user selects a metric aggregation available in a viz that is not available in AD (e.g. pipeline aggregation)
user has more metrics than allowed features in AD (currently 5)
user configures non-date-related x-axis or no x-axis at all

These cases could be handled in a few different ways:

Make AD creation invalid or unavailable until all constraints are met, and provide messaging as to why it's invalid.
Allow user to create by changing the detector config to meet constraints, but warn that the results may be based on different metrics than what's shown on the viz.

Note that x-axis issues won't necessarily affect the AD configuration, but would affect how any anomaly results would be displayed on the chart. Providing some warning indicating this may suffice.
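
The checks described above could look roughly like the following sketch. The `viz` dict shape is hypothetical, and `SUPPORTED_AGGS` is a placeholder assumption, not the authoritative list of AD-supported aggregations. Aggregation and feature-count problems are reported as errors, while x-axis problems are only warnings, matching the note that they affect display rather than the detector configuration.

```python
from typing import Any, Dict, List

MAX_FEATURES = 5  # current AD feature limit mentioned above

# Assumption: placeholder set for illustration; check the AD docs for the real list.
SUPPORTED_AGGS = {"avg", "count", "sum", "min", "max"}


def check_viz_for_ad(viz: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return blocking errors and non-blocking warnings for a hypothetical viz dict."""
    errors: List[str] = []
    warnings: List[str] = []

    metrics = viz.get("metrics", [])
    unsupported = sorted({m["agg"] for m in metrics if m["agg"] not in SUPPORTED_AGGS})
    if unsupported:
        errors.append("unsupported aggregations: " + ", ".join(unsupported))
    if len(metrics) > MAX_FEATURES:
        errors.append(f"{len(metrics)} metrics exceed the limit of {MAX_FEATURES} features")

    # A missing or non-date x-axis does not invalidate the detector itself,
    # but results could not be placed on the chart.
    x_axis = viz.get("x_axis") or {}
    if x_axis.get("type") != "date_histogram":
        warnings.append("x-axis is not a date histogram; anomaly results may not display on the chart")

    return {"errors": errors, "warnings": warnings}
```

Under the stricter option, creation would be blocked whenever `errors` is non-empty; under the more permissive option, the same messages would be shown while the detector config is adjusted to meet the constraints.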

Also note that AD supports custom aggregations when generating features. Further investigation is needed to see if all available viz aggregations could be converted appropriately.

UX

Creation workflow

Creating a viz is very simple and only involves selecting a viz type and a source. The rest of the details (metrics, x-axis, all other settings) are set after it's created. We can follow this strategy for AD by having a button in the viz details to create a detector, which could open some modal/flyout/etc. Depending on how much of the viz is populated/valid, we will autofill as many fields as possible following the logic listed in the table above. When everything is valid, the user can create the detector and start any real-time or historical detection jobs. The create option would then be disabled and could instead show a link to the detector details in the AD plugin.

Viewing results

Detector results could be displayed in a few different places. On the viz details page, results could be overlaid or annotated on the chart indicating when anomalies occurred. On a Dashboard containing an AD-enabled viz, we could link the detector or detector results within the viz panel, provide options to show AD results within the panel, or show them as a separate component altogether. Additionally, for eligible vizzes that don't have AD enabled, we could offer an option or a button to configure one, which could link to the AD creation section on the specified viz's details page, or possibly create a detector directly from the existing page.

Request for comments

We would like comments and feedback on the proposal of integrating AD and Visualizations here. Some specific questions we're seeking feedback on include:

How do you use Visualizations today?
Would you be more inclined to use AD if it was available to create within a Visualization?
Are there any additional visualization aggregations not provided by AD, that are important for your use case? The current AD-supported aggregations can be found here.
Do you envision viewing the anomalies in a different way than overlaid on Visualization charts?
Would historical anomalies be useful, or are you only interested in real-time / future anomalies?
Would you prefer auto-suggested fields to be hidden by default, or easily available to edit when creating?
Besides integration with Visualizations, would you want a non-UI integration offering directly through APIs?
Do you have any other suggestions for how to handle invalid inputs? Would you prefer a more or less restrictive approach?

Another point of concern not mentioned above: the x-axis must be configured in a very specific way so that the bucket ranges are static. In other words, it must be configured as a Date Histogram aggregation with a non-auto minimum interval, since auto adjusts the interval automatically as the time range changes. Even then, the viz may upscale the interval if the range is too big. For example, a 30m interval is upscaled to 1h in a 7d range, due to "too many buckets to show in the selected time range".

Because of this, we will need to add some mechanism to keep a static interval, such that zooming in or out will not change the interval and produce more or less granular results.
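
As a rough illustration of the upscaling problem, the sketch below picks a displayed interval for a given time range. It is not the actual Dashboards logic: `MAX_BUCKETS` and `INTERVAL_LADDER_MINUTES` are hypothetical values chosen only so that the example reproduces the 30m to 1h case described above.

```python
from typing import Sequence

# Hypothetical values; the real bucket limit and interval steps may differ.
MAX_BUCKETS = 200
INTERVAL_LADDER_MINUTES = (1, 5, 10, 30, 60, 180, 720, 1440)


def displayed_interval_minutes(
    requested: int,
    range_minutes: int,
    max_buckets: int = MAX_BUCKETS,
    ladder: Sequence[int] = INTERVAL_LADDER_MINUTES,
) -> int:
    """Return the requested interval if it fits, else the next larger ladder step that does."""
    if range_minutes / requested <= max_buckets:
        return requested
    for candidate in ladder:
        if candidate > requested and range_minutes / candidate <= max_buckets:
            return candidate
    # Nothing fits: fall back to the coarsest known step.
    return max(requested, ladder[-1])


def is_upscaled(requested: int, range_minutes: int) -> bool:
    """True when the chart would show coarser buckets than the detector interval."""
    return displayed_interval_minutes(requested, range_minutes) != requested
```

A check like `is_upscaled` is one way a viz could detect that its buckets no longer match the detector interval and either warn the user or pin the interval.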

This is all under the assumption that we want a 1:1 mapping of a metric result to a feature result - in other words, the input data to a detector perfectly aligns with the exact data shown in the viz. Perhaps this isn't necessary, and the detector input can be some static interval (e.g., 10 mins), but the chart can show the results in a dynamically aggregated way like it currently does (scaling up or down to show x number of buckets on the chart at once).

Pros/cons of 1:1 mapping

more visibility into the metrics derived for finding the anomaly
requires stricter viewing options from viz perspective - would need to warn if user changes date range, zooms in, etc., or remove those functionalities entirely
requires new logic for creating a static bucket range and viewing results on the viz

Pros/cons of dynamic mapping (having separate interval compared to viz bucket range)

maintain existing viz result viewing experience (dynamically changing histogram bucket size, etc.)
more flexible and available to create AD from different viz's
likely maintains a more desirable interval - for example, data coming in at 1m intervals defaults to a bucket range of 1h on viz (because default time range is 7d). But, 5m/10m is sufficient for an interval, and helps detector finish initialization and start running much sooner than a 1h interval
less visibility into the metrics derived for finding the anomaly - for example, bucket range on viz may be at a low level of granularity such that it looks like no spike happened, but is showing an anomaly (could still see the feature results by visiting dedicated AD results page)
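
For the dynamic mapping option, detector results at a fixed interval would need to be rolled up into whatever bucket size the chart currently shows. A minimal sketch, assuming results arrive as `(epoch_seconds, anomaly_grade)` pairs where a grade of 0 means no anomaly (the input shape and the choice of keeping the maximum grade per bucket are assumptions made for illustration):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def roll_up_anomalies(
    results: Iterable[Tuple[int, float]], bucket_seconds: int
) -> Dict[int, float]:
    """Map detector results to chart buckets, keeping the max grade per bucket.

    Keys are bucket start times in epoch seconds; buckets without anomalies are omitted.
    """
    buckets: Dict[int, float] = defaultdict(float)
    for timestamp, grade in results:
        if grade <= 0:
            continue  # not an anomaly
        start = timestamp - timestamp % bucket_seconds
        buckets[start] = max(buckets[start], grade)
    return dict(buckets)
```

Because only the roll-up changes when the user zooms, the detector interval can stay at something like 10 minutes regardless of the bucket size on the chart.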

I think I prefer the latter option due to its flexibility and simplicity. Pursuing the former requires much more fundamental changes to the existing viz charts, and may not be desirable from a user perspective anyway. Do you have any thoughts on these options?

opensearch-project / anomaly-detection

[RFC] AD-Visualizations Integration #476