opensearch-project / opensearch-build

🧰 OpenSearch / OpenSearch-Dashboards Build Systems
Apache License 2.0
135 stars 270 forks source link

[Proposal] Changing Anomaly Detection (OS Plugin) to a different name (Time Series) #2880

Open amitgalitz opened 1 year ago

amitgalitz commented 1 year ago

Is your feature request related to a problem? Please describe

We want to bring a new forecasting solution to OpenSearch. Lots of the experience is coupled with AD code, it will use similar resource management, fault tolerance and interaction with the RCF library that AD is using. However from a user experience perspective this new feature shouldn’t be coupled with the Anomaly Detection API and user flow.

Describe the solution you'd like

We are proposing to change the name of our current Anomaly Detection plugin to a different name such as time-series-analytics. We would then include two folders with the potential of more in the future that correlate to our different time-series-analytics solutions, with the first two being Anomaly Detection and Forecasting.

What the new name will be and the user experience effects of changing the plugin name are all up to debate and will be important in the ultimate decision if we proceed with this approach. However, for this issue I wanted to both describe two options in terms of what the API will look like and also bring up to discussion what changing a plugin name means in terms of bwc, best practice for OS, and all infrastructure work like maven uploads and releases.

Describe alternatives you've considered

No response

Additional context

BWC:

If we chose Option B or C then we can keep plugins/_anomaly_detection as deprecated for 2.x and completely remove in 3.0. If option A is chosen that will be the API we move forward for the upcoming future.

Potential infra changes in OpenSearch-build

Testing out a B/G upgrade with changing a plugin’s name → Success

Steps:

  1. Run Opensearch cluster called with 1 node that is using OS 2.1 with the original Anomaly Detection plugin and create a detector.,
  2. Add another node to the cluster with OS 2.2 with the plugin name changed to time-series-analytics.
  3. Make sure detectors is still running correctly.
  4. Create an additional detector and make sure it runs correctly
  5. Safely remove the first node with Anomaly Detection plugin and ensure all detectors are running with no issue after transitioning to only one node with time-series-analytics plugins
rishabh6788 commented 1 year ago

tagging @peterzhuamazon @gaiksaya @prudhvigodithi @zelinh @bbarani for inputs.

elfisher commented 1 year ago

Hi @amitgalitz I understand the desire to change the repo name, but I'm not sure why the end user APIs are impacted by the repo name.

It feels like there are two separate problems we are folding into one problem.

  1. Do we want separate APIs for forecasting an anomaly detection?
  2. How do we want to manage the code of these two features?

Is that right?

amitgalitz commented 1 year ago

@elfisher, while I do agree that in practice these two decisions can be decided in isolation, I think there is correlation and I guess a part of what I am asking is if it’s okay to break this correlation with the change.

In this proposal I am making the assumption that we don’t want to give an API of _plugins/_anomaly_detection/_forecasting/… so I am treating “1. Do we want separate APIs for forecasting an anomaly detection?” as answer with a yes (this can be discussed more obviously).



In our conventions documentation (https://github.com/opensearch-project/opensearch-plugins/blob/main/CONVENTIONS.md#apis) we make a reference that every plugin API should start with /_plugins/ and although we don’t specifically mention that the next thing has to be the plugin name it is the example and I also believe it’s currently how most other plugin API are (I could have missed some plugins). In all cases we remove opensearch prefix for API. 

 


In general from my experience whatever plugin I see in cat/_plugins I can expect to have at least a somewhat similar API to use in the form of /_plugins/<plugin_name>.

 


 


While I might be overly cautious here and we might in the future break this pattern anyway. I assumed its fair to bring up if it would be weird for our plugin name to be real-time-analytics, have cat/_plugins show opensearch-real-time-analytics, but the APIs are /_plugins/_anomaly_detection and /_plugins/_forecasting/. It could provide a weird development experience when they go and try and find the GitHub repo to open issues, when they look at documentation and etc. 

Also there is the other part of changing a plugin name, that involves changes to what the artifact people find on maven is and other infra changes I am not aware of. 


 


update: Also some cases where this best practice isn't as clear is in regards to the observability plugin and SQL/PPL. For example the _plugins/_ppl/_explain API where there is technically no explicit PPL plugin shown in the distribution.

dblock commented 1 year ago

I would 1) work backwards from the "ideal" case where we would be designing this for the first time, 2) deprecate routes that aren't ideal in 2.x, 3) remove deprecated routes in 3.0. If that's a path that we can follow, let's debate the merits of 1) only? What are the options?

wbeckler commented 1 year ago

Your question is about versioning, naming, and compatibility. The answer may be that the plugin architecture is not amenable to improvement, but that's not the end of the story.

The extensions framework aims to address these challenges with versioned APIs, a centralized registry, and sandboxed processes. Maybe the right way to do this would be to build the common features of AD/forecasting as a permanent extension point or as a time series extension, and then AD and forecasting would be extensions built on that extension point or extension API.

dblock commented 1 year ago

@wbeckler Are you saying "enable AD to add an extension point to the SDK"? I think we agree.

amitgalitz commented 1 year ago

@dblock After some more consideration on this proposal as a whole as we considered options such as not renaming the plugin, it looks like we be potentially moving forward with a rename. In that regards, I agree with the idea of “working backwards from the ideal case”.

In the “ideal” case there are two options that we see moving forward.

Option 1 [Preferred]: If we decide to change the plugin name to something like time-series-analytics I think it makes sense to also keep a similar pattern of _plugins/<plugin_name>/... for the APIs. This means we would be going forward with new API paths of _plugins/time_series_analytics/_anomaly_detection/... and _plugins/time_series_analytics/_forecasting/.... The next step would then be to deprecate the older _plugins/_anomaly_detection in 3.0. like you mentioned.

Option 2: We can obscure the new plugin name of time_series_analytics and view it as more of a container for similar products such as anomaly detection and forecasting.

This would mean that we would go forward with _plugins/_anomaly_detection and _plugins/_forecasting. I think it can be also argued that this is the ideal solution as these are the two products within the plugin and it would make sense also for users to just view this functionality as completely separate whether or not they are packaged together, but it would break the pattern of _plugins/<plugin_name>/...

For now I am leaning towards moving to plugins/time_series_analytics/_anomaly_detection/... and _plugins/time_series_analytics/_forecasting/...

This however also even more so brings the discussion if we need to make a change on the dashboards plugin for AD. And for example if we can still show AD and forecasting on the plugin list in OSD or if its a non-negotiable must to now have time-series-analytics on there as the main plugin.

dblock commented 1 year ago

I like option 1 because it's prettier if we were restarting, however, I also don't like breaking users unnecessarily. Thus I don't see a problem with option 2. In that one _plugins/_anomaly_detection remains as is, and _plugins/_forecasting is added, which requires no deprecation and just adds something new that doesn't look bad at all. Do we really care to solve for the plugin name/api convention when the plugin contains 2 APIs? What if you re-split it in the future because someone wants to build a new forecasting plugin?

I won't lose sleep over either! You should decide what's best.