[BUG] auto_expand_replicas creating more replicas than shard allocation awareness allows

barryhatfield commented 2 years ago

Describe the bug

We use our K8s node names as a node attribute on our OpenSearch processes/pods to prevent the primary and replicas of an index from being assigned to the same K8s node. The "auto_expand_replicas" setting sets the replica count from the total number of OpenSearch data processes/pods (in the case of 0-all), which is far greater than the number of K8s nodes the OpenSearch pods run on. The result is a permanently yellow cluster with unassigned shards.

Additionally, I cannot manually change the "auto_expand_replicas" setting on the system indices. I get a 403 error whose message lists no missing permission (the permission list is empty).

To Reproduce

Steps to reproduce the behavior:

Set up shard allocation awareness using an OpenSearch node attribute: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#shard-allocation-awareness
Try something like 3 unique node attribute values in a 9-node OpenSearch cluster. We used "k8s_node_name" as our node.attr, with the K8s node name as the value.

Do a full restart of the OpenSearch cluster.

Check replica allocation status: GET /_cluster/allocation/explain

  "deciders" : [
    {
      "decider" : "same_shard",
      "decision" : "NO",
      "explanation" : "a copy of this shard is already allocated to this node [[.opendistro_security][0], node[0SeJnN8BS4WAcpyp-WRcxQ], [R], s[STARTED], a[id=raIjWQwnRGSLHZewDYro0g]]"
    },
    {
      "decider" : "awareness",
      "decision" : "NO",
      "explanation" : "there are too many copies of the shard allocated to nodes with attribute [k8_node_name], there are [40] total configured shard copies for this shard id and [7] total attribute values, expected the allocated shard count per attribute [7] to be less than or equal to the upper bound of the required number of shards per attribute [6]"
    }
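
The upper bound of [6] in the awareness explanation is consistent with dividing the total shard copies by the number of attribute values and rounding up. The sketch below reproduces that arithmetic with the numbers from the message. It assumes this ceiling division is how the bound is derived, and the function name is illustrative, not an OpenSearch API.

```python
def awareness_upper_bound(total_copies: int, attribute_values: int) -> int:
    """Ceiling of total shard copies divided by distinct attribute values.

    Assumption: this mirrors how the awareness decider arrives at the
    "upper bound of the required number of shards per attribute".
    """
    if attribute_values <= 0:
        raise ValueError("attribute_values must be positive")
    # Integer ceiling division, avoids floating point.
    return (total_copies + attribute_values - 1) // attribute_values


# Numbers taken from the explanation above: 40 copies, 7 attribute values.
bound = awareness_upper_bound(40, 7)
copies_on_one_attribute = 7  # "allocated shard count per attribute [7]"
allowed = copies_on_one_attribute <= bound
```

With these inputs `bound` is 6 and `allowed` is `False`, which matches the NO decision in the output.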

Manually try to fix the replica allocation. First, check the auto_expand_replicas setting:

GET /.opendistro-anomaly-detector-jobs/_settings

{
  ".opendistro-anomaly-detector-jobs" : {
    "settings" : {
      "index" : {
        "hidden" : "true",
        "number_of_shards" : "1",
        "auto_expand_replicas" : "1-20",
        "provided_name" : ".opendistro-anomaly-detector-jobs",
        "creation_date" : "1647463076484",
        "number_of_replicas" : "20",
        "uuid" : "yj6d84wzQZmN6z-UsVln7A",
        "version" : {
          "created" : "135238227"
        }
      }
    }
  }
}

Attempt to change the auto_expand_replicas setting:

PUT /.opendistro-anomaly-detector-jobs/_settings
{
  "settings" : {
    "index" : {
      "auto_expand_replicas" : "0-7"
    }
  }
}

returns

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"
  },
  "status" : 403
}

Expected behavior

The "auto_expand_replicas" setting should expand the replica count no further than the number of unique attribute values configured for shard allocation awareness.

Additionally, I should be able to update the auto_expand_replicas setting for all indices as the admin user.
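
To make the first expectation concrete, here is a rough sketch of the requested cap. It is a simplified model, not OpenSearch code: `auto_expand_replicas` assumes the usual "replicas = data nodes minus one, clamped to the min-max range" behaviour, `capped_replicas` is a hypothetical helper, and the figure of 40 data nodes is an assumption used only for illustration.

```python
def auto_expand_replicas(setting: str, data_nodes: int) -> int:
    """Simplified model of auto-expansion for a "min-max" or "min-all" setting."""
    low, high = setting.split("-")
    min_replicas = int(low)
    # "all" means one copy per data node, i.e. data_nodes - 1 replicas.
    max_replicas = data_nodes - 1 if high == "all" else int(high)
    return max(min_replicas, min(data_nodes - 1, max_replicas))


def capped_replicas(setting: str, data_nodes: int, attribute_values: int) -> int:
    """Hypothetical cap: never expand past the number of awareness attribute values."""
    return min(auto_expand_replicas(setting, data_nodes), attribute_values)


# Assumed cluster size of 40 data nodes spread over 7 attribute values.
uncapped = auto_expand_replicas("1-20", 40)   # 20, as in the settings output
capped = capped_replicas("1-20", 40, 7)       # 7 under the requested behaviour
```

Under the same assumption, "0-all" expands to 39 replicas, or 40 total copies.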

Plugins

opensearch-alerting
opensearch-cross-cluster-replication
opensearch-knn
opensearch-performance-analyzer
opensearch-sql
opensearch-anomaly-detection
opensearch-index-management
opensearch-ml
opensearch-reports-scheduler
prometheus-exporter
opensearch-asynchronous-search
opensearch-job-scheduler
opensearch-observability
opensearch-security

Screenshots n/a

Host/Environment (please complete the following information):

OS: Ubuntu 20.04
Helm chart: https://github.com/opensearch-project/helm-charts
OpenSearch version 1.3

Additional context

The forced yellow status is causing problems with our readiness checks and cluster state alerting.

Joelp commented 1 year ago

Same here with version 2.3.0. Using "awareness.attributes": "rack_id"

I solved this issue for the .opendistro_security index with:

securityadmin.sh -cd ../../../config/opensearch-security/ -dra -icl -nhnv -cacert ...

and

sh securityadmin.sh -cd ../../../config/opensearch-security/ -us 9 -icl -nhnv -cacert ...

But this doesn't work with index .opendistro-anomaly-detector-jobs.

Joelp commented 1 year ago

On every node restart, I have to apply this workaround:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.awareness.attributes": null
  }
}

And once the cluster is green:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
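
For anyone who wants to script the two requests above, here is a minimal sketch. It assumes a caller-supplied `send(method, path, body)` function that performs the HTTP call and returns the parsed JSON response. `send`, `settings_body` and `run_workaround` are illustrative names, and authentication and TLS are left out. It waits for green through the cluster health API's `wait_for_status` parameter.

```python
from typing import Callable, Optional

AWARENESS_KEY = "cluster.routing.allocation.awareness.attributes"

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Send = Callable[[str, str, Optional[dict]], dict]


def settings_body(value: Optional[str]) -> dict:
    """Body for PUT _cluster/settings; None serializes to JSON null (unset)."""
    return {"persistent": {AWARENESS_KEY: value}}


def run_workaround(send: Send, attribute: str, timeout: str = "120s") -> bool:
    """Unset awareness, wait for green, then restore the awareness attribute.

    Returns False, leaving awareness unset, if the cluster did not turn
    green within the timeout, so the operator can decide what to do next.
    """
    send("PUT", "/_cluster/settings", settings_body(None))
    health = send(
        "GET", f"/_cluster/health?wait_for_status=green&timeout={timeout}", None
    )
    if health.get("timed_out") or health.get("status") != "green":
        return False
    send("PUT", "/_cluster/settings", settings_body(attribute))
    return True
```

Calling `run_workaround(send, "rack_id")` issues the same three steps as the manual procedure. The transport is passed in so the sequence can be checked without a live cluster.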

opensearch-project / OpenSearch

[BUG] auto_expand_replicas creating more replicas than shard allocation awareness allows #2984