opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Setting `cluster.routing.allocation.exclude` only works if you specify a single value #13534

Open drewmiranda-gl opened 2 weeks ago

drewmiranda-gl commented 2 weeks ago

Describe the bug

When using cluster.routing.allocation.exclude, for example cluster.routing.allocation.exclude._name to exclude an OpenSearch node from allocating shards, it only functions if a single item is set. If more than one item is set, the setting has no effect and all shards are rebalanced as if the setting were not set at all.

I can verify the setting is applied successfully by viewing _cluster/settings, but OpenSearch ignores it if it contains more than one value.
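For reference, the stored value can be checked with a plain GET against the same endpoint (the exact command isn't part of the original report, but something like this works):

# verify the exclusion is actually stored; flat_settings makes the key easy to spot
curl "http://localhost:9200/_cluster/settings?flat_settings=true"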

Related component

Other

To Reproduce

Testing with 5 OpenSearch nodes:

ip            heap.percent ram.percent cpu load_1m load_5m load_15m node.role node.roles           cluster_manager name
192.168.0.176           15          59   6    0.21    0.16     0.12 dm        cluster_manager,data -               opsrch1
192.168.0.180           64          80   1    0.00    0.00     0.00 d         data                 -               opsrch4
192.168.0.178           43          66   1    0.00    0.01     0.00 d         data                 -               opsrch5
192.168.0.179           42          79   1    0.27    0.18     0.18 dm        cluster_manager,data -               opsrch2
192.168.0.177           45          70   3    0.13    0.15     0.11 dm        cluster_manager,data *               opsrch3
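For context, the node listing above is _cat nodes output, gathered with something like:

# list nodes and their roles (command reconstructed; any equivalent _cat/nodes call works)
curl "http://localhost:9200/_cat/nodes?v"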

There are many existing indices, all of which are balanced evenly across all nodes:

shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
    22      340.2mb    11.6gb     86.2gb     97.9gb           11 192.168.0.176 192.168.0.176 opsrch1
    23      357.8mb    10.6gb     87.2gb     97.9gb           10 192.168.0.179 192.168.0.179 opsrch2
    22      234.1mb    10.7gb     87.2gb     97.9gb           10 192.168.0.177 192.168.0.177 opsrch3
    22      349.2mb    10.6gb     87.3gb     97.9gb           10 192.168.0.180 192.168.0.180 opsrch4
    21      131.1mb    10.5gb     87.3gb     97.9gb           10 192.168.0.178 192.168.0.178 opsrch5
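The per-node shard and disk tables throughout this report are _cat allocation output, gathered with something like:

# per-node shard counts and disk usage (command reconstructed; any equivalent _cat/allocation call works)
curl "http://localhost:9200/_cat/allocation?v"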

When I exclude opsrch2 via

curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch2",
    "cluster.routing.allocation.enable": "all"
  }
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
    23      390.1mb    11.6gb     86.3gb     97.9gb           11 192.168.0.176 192.168.0.176 opsrch1
     1       99.8kb    10.2gb     87.6gb     97.9gb           10 192.168.0.179 192.168.0.179 opsrch2
    22        281mb    10.6gb     87.2gb     97.9gb           10 192.168.0.177 192.168.0.177 opsrch3
    45      695.7mb    10.8gb       87gb     97.9gb           11 192.168.0.180 192.168.0.180 opsrch4
    22      132.5mb    10.5gb     87.3gb     97.9gb           10 192.168.0.178 192.168.0.178 opsrch5

I do see all shards deallocate from this node. If I change "opsrch2" to a list, e.g. ["opsrch2"], even if it has a single entry, the setting is completely ignored and shards are rebalanced across all nodes.

curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": ["opsrch2"],
    "cluster.routing.allocation.enable": "all"
  }
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
    23      388.9mb    11.6gb     86.3gb     97.9gb           11 192.168.0.176 192.168.0.176 opsrch1
    23      307.5mb    10.6gb     87.3gb     97.9gb           10 192.168.0.179 192.168.0.179 opsrch2
    22      281.1mb    10.6gb     87.2gb     97.9gb           10 192.168.0.177 192.168.0.177 opsrch3
    24      351.5mb    10.5gb     87.4gb     97.9gb           10 192.168.0.180 192.168.0.180 opsrch4
    22      133.9mb    10.5gb     87.3gb     97.9gb           10 192.168.0.178 192.168.0.178 opsrch5

Expected behavior

cluster.routing.allocation.exclude.<attribute> should allow specifying more than a single node.

Additional Details

Plugins: vanilla, out-of-the-box default


Host/Environment:

I do have zone allocation awareness set and zone allocation forced.

zoneA (odd numbered nodes): opsrch1, opsrch3, opsrch5
zoneB (even numbered nodes): opsrch2, opsrch4
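The awareness configuration itself isn't shown in this report; a minimal sketch of what it likely looks like, assuming the node attribute is named zone and each node has node.attr.zone set to zoneA or zoneB in opensearch.yml:

# sketch only: the attribute name "zone" and the persistent scope are assumptions
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone",
    "cluster.routing.allocation.awareness.force.zone.values": "zoneA,zoneB"
  }
}'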

drewmiranda-gl commented 2 weeks ago

I'm also realizing that the behavior appears very similar to https://github.com/opensearch-project/OpenSearch/issues/1716 in terms of overriding all existing exclusion attributes. I'm not sure if it's a regression or not, though.

andrross commented 1 week ago

[Triage - attendees 1 2 3 4] @drewmiranda-gl Thanks for filing. I believe the correct usage is to provide a comma-separated string, e.g. "node1, node2, node3", not a list. The documentation does say "comma separated", but I believe it could be much clearer because this is indeed confusing. Would you be interested in contributing an update to the documentation website?
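That is, something along these lines (node names here are placeholders):

# the whole exclusion list goes into a single comma-separated string value
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node1,node2,node3"
  }
}'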

drewmiranda-gl commented 1 week ago

I did try both (a list and a comma-separated string). The setting was accepted successfully using the comma-separated form; however, if more than one value is set, it seems to disable the setting (as if it were never set). Also, setting the value replaces all existing values, which only allows you to exclude a single node at a time.

I will retest to be extra sure

drewmiranda-gl commented 1 week ago

Here is what I observe:

Excluding a single node does work:

curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch2",
    "cluster.routing.allocation.enable": "all"
  }
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
    22      299.3mb    11.5gb     86.3gb     97.9gb           11 192.168.0.176 192.168.0.176 opsrch1
     1      146.1kb    10.3gb     87.6gb     97.9gb           10 192.168.0.179 192.168.0.179 opsrch2
    22      288.8mb    10.7gb     87.2gb     97.9gb           10 192.168.0.177 192.168.0.177 opsrch3
    45      718.5mb    10.9gb       87gb     97.9gb           11 192.168.0.180 192.168.0.180 opsrch4
    23      143.2mb    10.6gb     87.3gb     97.9gb           10 192.168.0.178 192.168.0.178 opsrch5

But setting cluster.routing.allocation.exclude._name to another value removes the existing value (this may be expected; I'm not sure):

curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch4",
    "cluster.routing.allocation.enable": "all"
  }
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
    22      299.3mb    11.5gb     86.3gb     97.9gb           11 192.168.0.176 192.168.0.176 opsrch1
    45      642.6mb    10.9gb     86.9gb     97.9gb           11 192.168.0.179 192.168.0.179 opsrch2
    22      288.8mb    10.7gb     87.2gb     97.9gb           10 192.168.0.177 192.168.0.177 opsrch3
     0           0b    10.2gb     87.7gb     97.9gb           10 192.168.0.180 192.168.0.180 opsrch4
    23      143.2mb    10.6gb     87.3gb     97.9gb           10 192.168.0.178 192.168.0.178 opsrch5
     1                                                                                       UNASSIGNED

Attempting to set both does not work:

curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch2,opsrch4",
    "cluster.routing.allocation.enable": "all"
  }
}'

(same as the above allocation output)

shards disk.indices disk.used disk.avail disk.total disk.percent host          ip            node
    22      299.3mb    11.5gb     86.3gb     97.9gb           11 192.168.0.176 192.168.0.176 opsrch1
    45      718.5mb      11gb     86.9gb     97.9gb           11 192.168.0.179 192.168.0.179 opsrch2
    22      288.8mb    10.7gb     87.2gb     97.9gb           10 192.168.0.177 192.168.0.177 opsrch3
     0           0b    10.2gb     87.7gb     97.9gb           10 192.168.0.180 192.168.0.180 opsrch4
    23      143.2mb    10.6gb     87.3gb     97.9gb           10 192.168.0.178 192.168.0.178 opsrch5

So it does appear OpenSearch ignores the setting if more than one value is present and behaves as if it were not set, even though it is indeed set.

Let me know if you have any questions.