Sample data creation fails for the new feature

Arpit-Bandejiya commented 1 year ago

Describe the bug

Due to the changes made in OpenSearch here: https://github.com/opensearch-project/OpenSearch/pull/3462/files#diff-013717f93370bf1d9635d1b84aee81e7e003e3fd6c6bb7c74b9890a1327a04b6

We are seeing sample data creation fail because of a low replica count.

To Reproduce

Steps to reproduce the behavior:

Create a cluster with the feature mentioned above enabled (set cluster.routing.allocation.awareness.balance in the opensearch.yml file to enable the feature).
Click on sample data creation in the dashboard.
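
For anyone reproducing this without editing the config file, the same setup can be sketched as a request body for the cluster settings API. This is a minimal sketch, assuming the setting can also be applied dynamically and that the replica balance check counts the forced awareness values; the helper name, the `zone` attribute, and the zone names are hypothetical.

```python
import json
from typing import Dict, List


def awareness_balance_settings(attribute: str, values: List[str]) -> Dict[str, dict]:
    """Build a cluster settings body that turns on awareness-based replica balance.

    Assumption: the nodes already carry the matching node attribute
    (for example node.attr.zone) in their own configuration.
    """
    prefix = "cluster.routing.allocation.awareness"
    return {
        "persistent": {
            f"{prefix}.attributes": attribute,
            # Forced awareness values; hypothetical zone names are passed in below.
            f"{prefix}.force.{attribute}.values": ",".join(values),
            # The flag from this issue; it is disabled by default.
            f"{prefix}.balance": True,
        }
    }


body = awareness_balance_settings("zone", ["zone-a", "zone-b", "zone-c"])
print(json.dumps(body, indent=2))
```

The printed JSON would be sent with a PUT to `_cluster/settings`. With three values configured, sample data creation should then hit the validation error described in this issue.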

Expected behaviour

We should be able to create the sample data from the dashboard.

OpenSearch Version

latest version

Dashboards Version

Any dashboard version supported

Plugins

Please list all plugins currently enabled.

Host/Environment (please complete the following information):

OS: Mac OS
Browser: Chrome

ananzh commented 1 year ago

@Arpit-Bandejiya do we know when this will be released in OpenSearch? Is it v2.4? Thanks.

AMoo-Miki commented 1 year ago

@Arpit-Bandejiya Does it produce any logs or errors during the failure? Also, when you say latest version, do you mean main branch of OpenSearch or the 2.3 release?

kavilla commented 1 year ago

@Arpit-Bandejiya can you provide insight on what the fix is here? And if our sample data is failing, could others experience this with other ingest software? If so, is this not technically a breaking change that should be a 3.x change?

Arpit-Bandejiya commented 1 year ago

> @Arpit-Bandejiya can you provide insight on what the fix is here? And if our sample data is failing, could others experience this with other ingest software? If so, is this not technically a breaking change that should be a 3.x change?

Replica count enforcement is done only when cluster.routing.allocation.awareness.balance is enabled. This feature is disabled by default. Hence it is not a breaking change.

> @Arpit-Bandejiya Does it produce any logs or errors during the failure? Also, when you say latest version, do you mean main branch of OpenSearch or the 2.3 release?

Error response:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "invalid_index_template_exception",
        "reason" : "index_template [template_1] invalid, cause [Validation Failed: 1: expected total copies needs to be a multiple of total awareness attributes [3];]"
      }
    ],
    "type" : "invalid_index_template_exception",
    "reason" : "index_template [template_1] invalid, cause [Validation Failed: 1: expected total copies needs to be a multiple of total awareness attributes [3];]"
  },
  "status" : 400
}

This feature is present in main as well as in the 2.3 release.
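
To make the validation message concrete, here is a minimal sketch of the rule as it reads from the error above. It assumes "total copies" means one primary plus its replicas; the function name is hypothetical and this is not the OpenSearch implementation.

```python
def copies_are_balanced(number_of_replicas: int, awareness_attribute_values: int) -> bool:
    """Return True when primary + replicas is a multiple of the awareness value count."""
    if number_of_replicas < 0 or awareness_attribute_values < 1:
        raise ValueError("replicas must be >= 0 and awareness values must be >= 1")
    total_copies = number_of_replicas + 1  # one primary plus its replicas (assumption)
    return total_copies % awareness_attribute_values == 0


# With 3 awareness attribute values, as in the error response above:
for replicas in (0, 1, 2, 5):
    print(replicas, copies_are_balanced(replicas, 3))
```

Under this reading, a replica count of 0 or 1 is rejected when there are three awareness values, while 2 or 5 is accepted.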

joshuarrrr commented 1 year ago

Triage - does the sample data need to support any combination of user settings/options (what's the purpose and use-case for sample data)?

UX: should sample data support non-default cluster configurations?

ohltyler commented 1 year ago

@Arpit-Bandejiya can you describe in more detail what the fix should be? Do the sample data index settings need a dynamic way to specify certain fields, such as auto_expand_replicas? From the error messages, it seems they will need to be dynamic based on the cluster's total awareness attributes.

ohltyler commented 1 year ago

Update - I've learned the key cluster setting to be aware of is the max awareness attribute value, of which there could be multiple (AZs, rack IDs, etc.). The upper limit of auto_expand_replicas must be a multiple of that. Note that auto_expand_replicas by default does not take awareness attributes into account. From the documentation:

Note that the auto-expanded number of replicas only takes allocation filtering rules into account, but ignores any other allocation rules such as shard allocation awareness and total shards per node

Because of this, if cluster.routing.allocation.awareness.balance is set to true and a user ingests sample data, there is currently no easy way (I believe) to read the total awareness attribute value and update the index setting before index creation. The ingestion may therefore fail if the replica count isn't a multiple of the max AZ count.
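
If the max awareness attribute value were available to the client, picking a compliant replica count would be simple arithmetic. The sketch below is hypothetical and assumes the check applies to total copies (primary plus replicas), as the error message suggests; the function name is made up for illustration.

```python
def smallest_balanced_replica_count(min_replicas: int, max_awareness_values: int) -> int:
    """Return the smallest replica count >= min_replicas that passes the balance check."""
    if min_replicas < 0 or max_awareness_values < 1:
        raise ValueError("min_replicas must be >= 0 and max_awareness_values must be >= 1")
    total_copies = min_replicas + 1  # primary plus replicas (assumption)
    remainder = total_copies % max_awareness_values
    if remainder:
        # Round total copies up to the next multiple of the awareness value count.
        total_copies += max_awareness_values - remainder
    return total_copies - 1


# A sample index that asks for 1 replica on a cluster with 3 zones would need 2.
print(smallest_balanced_replica_count(1, 3))
```

The hard part described above is not this calculation but obtaining the awareness value count before the index is created.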

Maybe just adding documentation around this setting is sufficient. @gbbafna can you point me to the current documentation for this setting? I can't seem to find it in the OpenSearch docs.

I will defer the decision to the feature owner and Dashboards team for deciding on the path forward. From a plugin owner perspective, it is more logical and maintainable to maintain the same sample data index configuration as that of core Dashboards, and so I will work on a fix in the AD plugin to consume such settings.

gbbafna commented 1 year ago

Hi @ohltyler: Please find the documentation at https://opensearch.org/docs/latest/tuning-your-cluster/cluster/ (search for "Replica count enforcement" on that page).

gbbafna commented 1 year ago

We have also added default_replica_count as a cluster-level setting: https://github.com/opensearch-project/OpenSearch/pull/5610/ . For sample data, it should be fine to use that instead of using auto expand replicas at all. With that, AD won't need to worry about the other cluster settings in use either.
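
On the client side, relying on the cluster default amounts to not sending any replica settings when creating the sample index. A minimal sketch, assuming the sample data settings are held in a flat dictionary; the helper name and the example settings are hypothetical and not taken from the Dashboards or AD code.

```python
REPLICA_KEYS = ("number_of_replicas", "auto_expand_replicas")


def without_replica_settings(index_settings: dict) -> dict:
    """Return a copy of the index settings with replica-related keys removed."""
    cleaned = {}
    for key, value in index_settings.items():
        # Accept both "number_of_replicas" and "index.number_of_replicas" spellings.
        bare_key = key[len("index."):] if key.startswith("index.") else key
        if bare_key not in REPLICA_KEYS:
            cleaned[key] = value
    return cleaned


sample_settings = {
    "number_of_shards": 1,
    "auto_expand_replicas": "0-1",  # hypothetical sample data default
}
print(without_replica_settings(sample_settings))
```

With the replica keys removed, the index would be created with whatever replica count the cluster is configured to use by default.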

ohltyler commented 1 year ago

Yes, totally agree. Thanks for providing this option!

We can eliminate this setting and consume cluster defaults. I will work on making that change on the AD plugin side.

ohltyler commented 1 year ago

Update: AD-related changes have been merged & backported - see https://github.com/opensearch-project/anomaly-detection-dashboards-plugin/pull/423

opensearch-project / OpenSearch-Dashboards

Sample data creation fails for the new feature #2633