Open drewmiranda-gl opened 2 weeks ago
I'm also realizing that the behavior appears very similar to https://github.com/opensearch-project/OpenSearch/issues/1716 in terms of overriding all existing exclusion attributes. Not sure if it's a regression or not, though.
[Triage]
@drewmiranda-gl Thanks for filing. I believe the correct usage is to provide a comma-separated string, e.g. "node1,node2,node3", not a list. The documentation does say "comma separated", but I believe it could be much clearer, because this is indeed confusing. Would you be interested in contributing an update to the documentation website?
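For instance, if the node names live in a shell variable, a quick way to build that comma-separated string (node names here are placeholders, not from a real cluster):

```shell
# Build the comma separated string the exclude setting expects
# from a whitespace separated list of (placeholder) node names.
nodes="node1 node2 node3"
exclude=$(printf '%s,' $nodes)  # joins with trailing comma
exclude=${exclude%,}            # drop the trailing comma
echo "$exclude"                 # -> node1,node2,node3
```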
I did try both (a list and a comma-separated string). The comma-separated value was accepted successfully; however, if more than one value is set, it seems to disable the setting (as if it were never set). Also, applying the setting replaces all existing values, which only allows you to exclude a single node at a time.
I will retest to be extra sure
Here is what I observe:
excluding a single node does work:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch2",
    "cluster.routing.allocation.enable": "all"
  }
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
1 146.1kb 10.3gb 87.6gb 97.9gb 10 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
45 718.5mb 10.9gb 87gb 97.9gb 11 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
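As a sanity check, the drain can be confirmed mechanically by parsing the _cat/allocation output into a node-to-shard-count map; a rough sketch (the sample text mirrors the table above):

```python
# Parse _cat/allocation output into node name -> shard count, to
# confirm an excluded node has drained. Rows mirror the output above.
sample = """\
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
1 146.1kb 10.3gb 87.6gb 97.9gb 10 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
45 718.5mb 10.9gb 87gb 97.9gb 11 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
"""

def shard_counts(cat_allocation_text):
    counts = {}
    for line in cat_allocation_text.splitlines():
        fields = line.split()
        if len(fields) >= 9:          # skip UNASSIGNED / partial rows
            counts[fields[-1]] = int(fields[0])
    return counts

print(shard_counts(sample))  # opsrch2 down to 1 shard, i.e. draining
```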
BUT setting cluster.routing.allocation.exclude._name to another value removes the existing value (this may be expected, I'm not sure):
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch4",
    "cluster.routing.allocation.enable": "all"
  }
}'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
45 642.6mb 10.9gb 86.9gb 97.9gb 11 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
0 0b 10.2gb 87.7gb 97.9gb 10 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
1 UNASSIGNED
Attempting to set both does not work:
curl --request PUT --header "Content-Type: application/json" http://localhost:9200/_cluster/settings --data '{
  "transient": {
    "cluster.routing.allocation.exclude._name": "opsrch2,opsrch4",
    "cluster.routing.allocation.enable": "all"
  }
}'
(same as the above allocation output)
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
22 299.3mb 11.5gb 86.3gb 97.9gb 11 192.168.0.176 192.168.0.176 opsrch1
45 718.5mb 11gb 86.9gb 97.9gb 11 192.168.0.179 192.168.0.179 opsrch2
22 288.8mb 10.7gb 87.2gb 97.9gb 10 192.168.0.177 192.168.0.177 opsrch3
0 0b 10.2gb 87.7gb 97.9gb 10 192.168.0.180 192.168.0.180 opsrch4
23 143.2mb 10.6gb 87.3gb 97.9gb 10 192.168.0.178 192.168.0.178 opsrch5
So it does appear OpenSearch ignores the setting if more than one value is present and behaves as if it is not set, even though it is indeed set.
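For completeness, this is how the persisted value can be confirmed from a GET to _cluster/settings; the response body below is illustrative (matching the values I set), not a verbatim capture:

```python
import json

# Illustrative _cluster/settings response body (default nested shape).
response = json.loads("""{
  "persistent": {},
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "enable": "all",
          "exclude": {"_name": "opsrch2,opsrch4"}
        }
      }
    }
  }
}""")

# The setting is present and holds both node names...
excluded = response["transient"]["cluster"]["routing"]["allocation"]["exclude"]["_name"]
nodes = excluded.split(",")
print(nodes)  # -> ['opsrch2', 'opsrch4']
# ...yet allocation behaves as if it were unset.
```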
Let me know if you have any questions.
Describe the bug
When using cluster.routing.allocation.exclude, for example cluster.routing.allocation.exclude._name, to exclude an OpenSearch node from allocating shards, it only functions if a single item is set. If more than one item is set, the setting has no effect and all shards are rebalanced as if the setting were not set at all. I can verify the setting is set successfully by viewing _cluster/settings, but OpenSearch ignores it if it contains more than one value.
Related component
Other
To Reproduce
Testing with 5 OpenSearch nodes:
There are many existing indices, all of which are balanced evenly across all nodes.
When I exclude opsrch2 (via the settings update shown earlier), I do see all shards deallocate from this node. If I change "opsrch2" to a list, e.g. ["opsrch2"], even if it has a single entry, the setting is completely ignored and shards are rebalanced across all nodes.
Expected behavior
cluster.routing.allocation.exclude allows specifying more than a single node.
Additional Details
Plugins: Vanilla, out of the box, default
Host/Environment:
I do have zone allocation awareness set and zone allocation forced.
zoneA (odd-numbered nodes): opsrch1, opsrch3, opsrch5
zoneB (even-numbered nodes): opsrch2, opsrch4
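For reference, the awareness setup is along these lines (a sketch of the relevant settings, not a verbatim copy of my config):

```yaml
# On each node (zone value per the layout above):
node.attr.zone: zoneA   # zoneB on opsrch2 and opsrch4

# Cluster-level awareness settings:
cluster.routing.allocation.awareness.attributes: zone
cluster.routing.allocation.awareness.force.zone.values: zoneA,zoneB
```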