opensearch-project / anomaly-detection

Identify atypical data and receive automatic notifications
https://opensearch.org/docs/latest/monitoring-plugins/ad/index/
Apache License 2.0

[META] Support ML dynamic node role #571

Status: Open · brijos opened this issue 2 years ago

brijos commented 2 years ago

Is your feature request related to a problem? We released the ml-commons plugin in OpenSearch 1.3. It supports model training and prediction. ML models generally consume more resources, especially during training. The community wants to support bigger ML models, which might require more resources and special hardware such as GPUs.

Because OpenSearch doesn't support an ML node, we dispatch ML tasks only to data nodes. This means that a user who wants to train a large model must scale up all data nodes, which can be costly. ML tasks also use resources shared with the data nodes' other work, which may impact core search and indexing functions.

What solution would you like? Support a dedicated ML node so that users don't need to scale up their data nodes at all. Instead, they can configure a new ML node (with different settings and a more powerful instance type) and add it to the cluster through the YAML configuration file (this requires a cluster restart). Running ML tasks on a dedicated node lets users separate resource usage and reduces the impact on other critical tasks such as search and ingestion.

What alternatives have you considered? See the original proposal.

Do you have any additional context? See the original proposal.

ylwu-amzn commented 2 years ago

We don't have enough bandwidth to support the ML node in the AD 2.1 release. We plan to support the ML node in the ml-commons plugin 2.1 release: https://github.com/opensearch-project/ml-commons/issues/79

ohltyler commented 2 years ago

@ylwu-amzn how would you describe the status of this issue?

ylwu-amzn commented 2 years ago

ml-commons already released ML node support in OpenSearch 2.1. I think supporting a dedicated ML node would benefit AD, especially HCAD (high-cardinality anomaly detection). For example, we could increase the memory limit for HCAD so that it can support more entities. If there is no dedicated ML node, we can fall back to data nodes.

I think we can keep this open for a while. If we decide that a dedicated ML node will not help AD much, we can close this.
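The "prefer dedicated ML nodes, otherwise fall back to data nodes" idea above could look roughly like the sketch below. It is a hypothetical illustration, not AD or ml-commons code: `Node` and `eligible_ad_nodes` are made-up names, and the role strings `"ml"` and `"data"` are assumed to mirror the OpenSearch node role names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified view of a cluster node: a name plus its roles.
    name: str
    roles: frozenset[str] = field(default_factory=frozenset)


def eligible_ad_nodes(nodes: list[Node]) -> list[Node]:
    """Pick the nodes that should run AD model work (hypothetical helper).

    Dedicated ML nodes are preferred; if the cluster has none,
    fall back to the data nodes, which is today's behavior.
    """
    ml_nodes = [n for n in nodes if "ml" in n.roles]
    if ml_nodes:
        return ml_nodes
    return [n for n in nodes if "data" in n.roles]


if __name__ == "__main__":
    with_ml = [Node("data-1", frozenset({"data", "ingest"})), Node("ml-1", frozenset({"ml"}))]
    without_ml = [Node("data-1", frozenset({"data"})), Node("cm-1", frozenset({"cluster_manager"}))]
    print([n.name for n in eligible_ad_nodes(with_ml)])     # ['ml-1']
    print([n.name for n in eligible_ad_nodes(without_ml)])  # ['data-1']
```

Keeping the fallback inside one selection function means clusters without an ML node would keep working unchanged, while clusters that add one would shift AD work (and a potentially larger HCAD memory budget) onto it.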

bbarani commented 1 year ago

@brijos @ylwu-amzn Can you please re-tag this issue with the correct version label?

hrishikesh91 commented 1 year ago

Hi, any idea if or when dedicated ML node support in Anomaly Detection will be scoped into any of the upcoming release plans?

kaituo commented 1 year ago

Not so far.