opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] auto_expand_replicas creating more replicas than shard allocation awareness allows #2984

Open: barryhatfield opened this issue 2 years ago

barryhatfield commented 2 years ago

Describe the bug
We use our K8s node names as a node attribute on our OpenSearch processes/pods so that a primary shard and its replicas are never assigned to the same K8s node. With "auto_expand_replicas" set to 0-all, OpenSearch expands the replica count to one less than the total number of OpenSearch data processes/pods, so every data pod is expected to hold a copy of each shard. That is far more copies than there are K8s nodes hosting the OpenSearch pods. The result is a permanently yellow cluster with unassigned shards.

Additionally, I cannot manually change the "auto_expand_replicas" setting on the system indices. The request fails with a 403 error that reports a missing permission with an empty name ("no permissions for []").

To Reproduce
Steps to reproduce the behavior:

  1. Set up shard allocation awareness using an OpenSearch node attribute: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#shard-allocation-awareness For example, use 3 unique node attribute values in a 9-node OpenSearch cluster. We used "k8s_node_name" as our node.attr, with the K8s node name as the value.
  2. Perform a full restart of the OpenSearch cluster.
  3. Check the replica allocation status with GET /_cluster/allocation/explain. Excerpt of the deciders in the response:
      "deciders" : [
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "a copy of this shard is already allocated to this node [[.opendistro_security][0], node[0SeJnN8BS4WAcpyp-WRcxQ], [R], s[STARTED], a[id=raIjWQwnRGSLHZewDYro0g]]"
        },
        {
          "decider" : "awareness",
          "decision" : "NO",
          "explanation" : "there are too many copies of the shard allocated to nodes with attribute [k8_node_name], there are [40] total configured shard copies for this shard id and [7] total attribute values, expected the allocated shard count per attribute [7] to be less than or equal to the upper bound of the required number of shards per attribute [6]"
        }
  4. Try to fix the replica allocation manually. First, check the auto_expand_replicas setting:
    GET /.opendistro-anomaly-detector-jobs/_settings
    {
      ".opendistro-anomaly-detector-jobs" : {
        "settings" : {
          "index" : {
            "hidden" : "true",
            "number_of_shards" : "1",
            "auto_expand_replicas" : "1-20",
            "provided_name" : ".opendistro-anomaly-detector-jobs",
            "creation_date" : "1647463076484",
            "number_of_replicas" : "20",
            "uuid" : "yj6d84wzQZmN6z-UsVln7A",
            "version" : {
              "created" : "135238227"
            }
          }
        }
      }
    }

    Attempt to change the auto_expand_replicas setting:

    PUT /.opendistro-anomaly-detector-jobs/_settings
    {
      "settings" : {
        "index" : {
          "auto_expand_replicas" : "0-7"
        }
      }
    }

    returns

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "security_exception",
            "reason" : "no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"
          }
        ],
        "type" : "security_exception",
        "reason" : "no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"
      },
      "status" : 403
    }
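The awareness decider's NO in step 3 comes down to simple arithmetic. The Python sketch below reproduces the numbers from that explain output under a simplified model, assuming the per-attribute upper bound is the total number of shard copies divided by the number of attribute values, rounded up (this matches the reported 40 copies and 7 values giving a bound of 6). The function names are hypothetical; this is an illustration, not OpenSearch's actual decider code.

```python
import math


def awareness_upper_bound(total_copies: int, attribute_values: int) -> int:
    """Maximum copies of one shard allowed per awareness attribute value.

    Simplified model: copies are spread as evenly as possible across the
    attribute values, so no value may hold more than ceil(copies / values).
    """
    return math.ceil(total_copies / attribute_values)


def awareness_allows(copies_on_value_after: int, total_copies: int,
                     attribute_values: int) -> bool:
    """True if the attribute value stays within the bound after placing a copy."""
    return copies_on_value_after <= awareness_upper_bound(total_copies, attribute_values)


# Numbers from the allocation explain output in step 3:
# 40 configured copies, 7 attribute values, 7 copies on the candidate value.
print(awareness_upper_bound(40, 7))   # 6
print(awareness_allows(7, 40, 7))     # False, so the decider answers NO
```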

Expected behavior
The "auto_expand_replicas" setting should expand the replica count only up to the number of unique attribute values configured for shard allocation awareness.

Additionally, I should be able to update the auto_expand_replicas setting for all indices as the admin user.
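To make the mismatch between auto-expansion and awareness concrete, here is a minimal Python sketch under simplified assumptions: auto_expand_replicas is modeled as "data nodes - 1, clamped to [min, max]", the same_shard rule as at most one copy per node, and awareness as the ceil(copies / values) bound from the explain output. The helper names and the uneven 5 + 2 + 2 spread of pods across three K8s nodes are hypothetical, and other allocation deciders (disk watermarks, filters) are ignored.

```python
from __future__ import annotations

import math
from typing import Optional


def auto_expand(data_nodes: int, min_replicas: int, max_replicas: Optional[int]) -> int:
    """Simplified auto_expand_replicas model: data nodes - 1, clamped to [min, max].

    max_replicas=None stands for "all".
    """
    replicas = data_nodes - 1
    if max_replicas is not None:
        replicas = min(replicas, max_replicas)
    return max(replicas, min_replicas)


def copies_placeable(nodes_per_value: list[int], copies: int) -> int:
    """How many of `copies` fit when each node holds at most one copy
    (same_shard) and each attribute value holds at most
    ceil(copies / number_of_values) copies (awareness)."""
    bound = math.ceil(copies / len(nodes_per_value))
    return min(copies, sum(min(n, bound) for n in nodes_per_value))


def max_assignable_replicas(nodes_per_value: list[int]) -> int:
    """Largest replica count whose copies can all be assigned in this model."""
    best = 0
    for copies in range(1, sum(nodes_per_value) + 1):
        if copies_placeable(nodes_per_value, copies) == copies:
            best = copies - 1
    return best


# Hypothetical uneven spread: 9 OpenSearch pods on 3 K8s nodes as 5 + 2 + 2.
spread = [5, 2, 2]
replicas = auto_expand(sum(spread), 0, None)     # "0-all"
print(replicas)                                  # 8
print(copies_placeable(spread, replicas + 1))    # 7 (of 9 copies)
print(max_assignable_replicas(spread))           # 6
```

In this model, "0-all" asks for 9 copies but only 7 can be placed, leaving 2 unassigned; capping auto-expansion at 6 replicas would keep every copy assignable for this particular spread.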

Plugins
opensearch-alerting
opensearch-cross-cluster-replication
opensearch-knn
opensearch-performance-analyzer
opensearch-sql
opensearch-anomaly-detection
opensearch-index-management
opensearch-ml
opensearch-reports-scheduler
prometheus-exporter
opensearch-asynchronous-search
opensearch-job-scheduler
opensearch-observability
opensearch-security

Screenshots
n/a


Additional context
The forced yellow status is causing problems with our readiness checks and cluster state alerting.

Joelp commented 1 year ago

Same here with version 2.3.0, using "awareness.attributes": "rack_id".

I fixed this for the .opendistro_security index by first disabling replica auto-expansion (-dra) and then setting a fixed number of replicas (-us 9):

securityadmin.sh -cd ../../../config/opensearch-security/ -dra -icl -nhnv -cacert ...
sh securityadmin.sh -cd ../../../config/opensearch-security/ -us 9 -icl -nhnv -cacert ...

But this does not work for the .opendistro-anomaly-detector-jobs index.

Joelp commented 1 year ago

On every node restart, I have to apply this workaround:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.awareness.attributes": null
  }
}

Then, once the cluster is green:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.awareness.attributes": "rack_id"
  }
}
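Because this has to be repeated after every restart, it can be scripted. Below is a minimal Python sketch, assuming a hypothetical send(method, path, body) helper wrapping whatever HTTP client is in use (no specific client library is implied). Instead of watching the cluster by hand, it waits with the cluster health API's wait_for_status=green parameter. The 10m timeout and the rack_id default are illustrative.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: send(method, path, body) returns the parsed JSON
# response. A real implementation would JSON-encode `body`, which turns
# Python's None into the JSON null that clears the setting.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]

AWARENESS_KEY = "cluster.routing.allocation.awareness.attributes"


def toggle_awareness_workaround(send: Send, attributes: str = "rack_id",
                                timeout: str = "10m") -> None:
    """Clear awareness, wait for green, then restore the awareness attributes."""
    # Step 1: temporarily remove the awareness attributes.
    send("PUT", "/_cluster/settings", {"persistent": {AWARENESS_KEY: None}})
    try:
        # Step 2: block until the cluster is green or the timeout expires.
        health = send("GET", f"/_cluster/health?wait_for_status=green&timeout={timeout}", None)
        if health.get("timed_out") or health.get("status") != "green":
            raise RuntimeError(f"cluster did not turn green: {health.get('status')}")
    finally:
        # Step 3: always restore awareness, even if waiting failed.
        send("PUT", "/_cluster/settings", {"persistent": {AWARENESS_KEY: attributes}})
```

Awareness is restored in a finally block, so a cluster that never turns green is not left running without allocation awareness. The RuntimeError still reports the failure to the caller.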