opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0

[Proposal] Forecasting Dev Ops Metrics and Financial/Inventory Metrics #408

Open brijos opened 1 year ago

brijos commented 1 year ago

What are you proposing?

What problems are you trying to solve?

There are two personas that we propose to solve for: the first priority is the developer operations persona, who wants to view dashboards and understand where metrics are headed; the second priority is the business analyst persona. A key differentiator between the personas is data size. The developer operations persona will likely only have 90-120 days' worth of data available, given that saving application logs is costly. This persona wants to make sure that their applications, service tiers, or services are up and fully functional. They live more in the moment and would like to be proactively notified if their area of responsibility is going to have problems.

The second persona is the business analyst, who wants to look at data over a longer period of time (120+ days). The data the business analyst is looking at has seasonal trends in it. The business analyst wants to understand how to project out financial metrics such as revenue or inventory levels.

User Stories

Developer Operations Persona (90 - 120 days of data)

Business Analyst Persona (120+ days of data)

Reporting

Administration

Outstanding Questions

nandi-github commented 1 year ago

Both personas would require data for a longer period, ideally a year. With less than a year of data, you are likely to miss seasonal variations, which might not depend only on the last 90 or 120 days of trend. Ideally there should be a base model to start from, and each persona would supplement it with the data they have to meet their needs.

xeniatup commented 1 year ago

As we're working toward introducing Forecasting, I'd like to share some of the exploratory work for the DevOps metrics use case from the UX perspective.

A couple of scenarios for a DevOps engineer or SRE might look like this:

"As someone who is responsible for critical services, I want to look at my dashboard visualizations and forecast where metrics are moving, so I can understand if I can take proactive actions and change the outcome."

"I'm aware of a data point that is not included in the model (seasonal trend longer than 90-120 days, planned external event) to forecast the trend. I would like to see how it will affect the trend."

[Image: presentation 2]

Proposed flows:

From an existing visualization

[Image: presentation 4]

Visualize your data and forecast

[Image: presentation 5]

Create an alerting monitor and augment it with a predictive alert trigger

[Image: presentation 6]

Stay tuned for more updates, and please share your feedback on any aspect of this proposal. I also have some open questions listed below:

Please share any additional feedback!

ahopp commented 1 year ago

Both personas would require data for a longer period, ideally a year.

I'm not sure I agree with this statement or with the postulation in the issue itself (e.g., that each persona will need a specific data window). While most people in either of these personas will likely say more data is better (not true in many cases IMO), the time window needed to forecast meaningfully will vary fairly significantly based on the application, use case, data variability, etc. In particular, the ability to create a reasonable range of values in which future values of interest are likely to fall depends on many factors, and the length of historical data is only one of them.

For example, if I am charged with monitoring calls to an API, it might be very feasible to alert on a forecasted utilization rate if the API calls have had low variability and high usage in the last week and a very clear upward trend in utilization in the past day. This is one of hundreds of examples where I would absolutely not need a year of data to meaningfully forecast values.
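To make that concrete, here is a minimal sketch in Python (not ml-commons code; the data, threshold, and function name are all hypothetical) of the short-window case: fit a simple trend line to one day of hourly utilization and raise a predictive alert if the projection crosses a threshold.

```python
# Minimal sketch, not part of ml-commons: forecast API utilization from a short,
# recent window and raise a predictive alert. All names and numbers are hypothetical.
import numpy as np

def forecast_utilization(hourly_utilization, horizon_hours=6):
    """Project utilization forward with a simple least-squares trend line."""
    y = np.asarray(hourly_utilization, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, deg=1)             # fit a linear trend
    future_x = np.arange(len(y), len(y) + horizon_hours)
    return slope * future_x + intercept                     # projected values

# Hypothetical input: 24 hours of utilization (%) with low noise and a clear upward trend.
recent = 60 + 1.0 * np.arange(24) + np.random.normal(0, 0.5, 24)
projected = forecast_utilization(recent)

if projected.max() > 85.0:                                   # hypothetical alert threshold
    print(f"Predictive alert: utilization projected to reach {projected.max():.1f}%")
```

The point is not the model (a real implementation would do far more), but that a day or a week of stable, trending data can already support a useful forecasted alert.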

@brijos Is there a reason we need to specify the required data retention or the length of time each persona "needs" data, given that it is so use-case specific? I'm not sure how this informs a technical requirement in a world where we would want to allow users to select a time range that is suitable to their use case (and each user will want to make their own tradeoff between cost, speed, accuracy, etc.).

ahopp commented 1 year ago

forecast where metrics are moving

Minor nit here. I think it's more accurate to say "I want to see where metrics are forecasted to be" as the entire feature is to remove the need for the user to employ naive, intuitive, or heuristic forecasting.

seasonal trend longer than 90-120 days

I'm not sure I understand this from a forecasting or analytics use case as written. Traditionally, seasonality is a characteristic of data over a time period in which there are regular and predictable changes that recur. Are we trying to say that the user can simply change the time range of the input data to allow for different windows of time for scenario forecasting?

For example, if I were a user performing a forecast, I would want the model either to look for a cycle of seasonality based on the data (e.g., a 12-month cycle of seasonality for monthly data, four-quarter cycles for quarterly data, weekly seasonality for daily data, hourly seasonality for intraday data, etc.) or to use cycles of seasonality I choose in my scenario planning/exploration. Is this what we want to support long-term?
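As a sketch of those two behaviors (again outside of ml-commons, using statsmodels purely for illustration on hypothetical data): the seasonal cycle can either be supplied explicitly by the user or inferred from the frequency of the time index.

```python
# Minimal sketch, not ml-commons: seasonal decomposition with an explicit,
# user-chosen cycle versus a cycle inferred from the data's frequency.
# The series and numbers are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Hypothetical monthly revenue with an annual (12-month) seasonal cycle.
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
revenue = pd.Series(
    100 + 10 * np.sin(2 * np.pi * np.arange(48) / 12) + np.random.normal(0, 2, 48),
    index=idx,
)

# User-chosen seasonality: explicitly request a 12-period (annual) cycle.
explicit = seasonal_decompose(revenue, model="additive", period=12)

# Inferred seasonality: with a monthly DatetimeIndex, statsmodels derives the
# period from the index frequency instead.
inferred = seasonal_decompose(revenue, model="additive")

print(explicit.seasonal.head(12))
```

Supporting both modes would cover the "let the model find the cycle" case and the "I know my planning cycle, use it" case raised above.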

What kind of configurations would you like to have for forecasting your metrics?

So, I can personally think of a few that I'd want to see (depending on the underlying configurability and capabilities). These would include:

ahopp commented 1 year ago

UX perspective

One other callout that may be worth thinking about: the same UX treatments that benefit forecasts would also be useful in other analyses, such as trend lines or lines of best fit. There are a ton of predictive modeling functions outside of forecasting that could be or will be added in the future, and they might be worth considering now.