[BUG] Cluster status YELLOW after configuring Security Plugin in single-node clusters

williamtrelawny commented 1 year ago

What is the bug? Upon activating the Security plugin in a single node cluster, the cluster status will always be YELLOW because of unassigned shards in 3 different indices:

.plugins-ml-config
.opensearch-sap-pre-packaged-rules-config
.opensearch-sap-log-types-config

How can one reproduce the bug? Steps to reproduce the behavior:

1. Have a single-node cluster.
2. Configure the Security plugin per the instructions in the docs.
3. Observe the YELLOW cluster status after restarting.
4. Query the /_cat/shards API to see the unassigned shards.

What is the expected behavior? Default sharding for above indices should be 1 primary / 0 replicas to account for single-node clusters. Or perhaps some degree of intelligent sharding based on cluster size.

What is your host/environment?

OS: Debian 11.7
Version: Opensearch 2.9
Plugins: Security

Do you have any screenshots?

$ curl https://example.org:9200/_cat/shards?v -u admin
Enter host password for user 'admin':
index                                     shard prirep state
.plugins-ml-config                        0     p      STARTED
.plugins-ml-config                        0     r      UNASSIGNED
.opensearch-observability                 0     p      STARTED
.opensearch-sap-pre-packaged-rules-config 0     p      STARTED
.opensearch-sap-pre-packaged-rules-config 0     r      UNASSIGNED
.opensearch-sap-log-types-config          0     p      STARTED
.opensearch-sap-log-types-config          0     r      UNASSIGNED
.opendistro_security                      0     p      STARTED
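
To pull just the affected indices out of a listing like the one above, a small filter can help. This is a minimal sketch, not part of the original report: the function name `unassigned_replicas` is hypothetical, and it assumes the column order shown above (index, shard, prirep, state).

```shell
# Hypothetical helper: reads `_cat/shards` output on stdin and prints each
# index that has at least one UNASSIGNED replica shard.
# Assumes columns in the order: index, shard, prirep, state.
unassigned_replicas() {
  # The header line is skipped naturally: its third column is "prirep", not "r".
  awk '$3 == "r" && $4 == "UNASSIGNED" { print $1 }' | sort -u
}
```

It could be used as `curl -s "https://example.org:9200/_cat/shards?v" -u admin | unassigned_replicas`, which for the listing above would print the three indices named at the top of this report.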

Do you have any additional context? Related to the sentiment behind https://github.com/opensearch-project/anomaly-detection/issues/847, that plugins enabled on a single node Opensearch cluster should Just Work and maintain GREEN cluster status.

stephen-crawford commented 1 year ago

[Triage] Thank you for filing this issue @williamtrelawny. Looking at this issue, it seems like you have configured index configurations in such a way you have an unassigned shard. The Security plugin does not have any impact on sharding strategies.

Closing this issue.

stuartwakefield commented 1 year ago

I'm also having this issue with the default setup for running a single node in Docker. The instructions at https://hub.docker.com/r/opensearchproject/opensearch indicate that the following starts up a single node cluster:

$ docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" --name opensearch-node -d opensearchproject/opensearch:latest

However, the cluster remains in "yellow" status:

$ curl -sX GET "https://localhost:9200/_cluster/health" -ku admin:admin | jq -r .status
yellow

Many of the internal indices are being replicated but those replicas are not assigned to nodes, being a single node cluster.

$ curl https://localhost:9200/_cat/shards -ku admin:admin                       
.opensearch-observability                 0 p STARTED     0   208b 172.17.0.3 f9447c29352c
.plugins-ml-config                        0 p STARTED     1  3.8kb 172.17.0.3 f9447c29352c
.plugins-ml-config                        0 r UNASSIGNED                      
.opensearch-sap-pre-packaged-rules-config 0 p STARTED              172.17.0.3 f9447c29352c
.opensearch-sap-pre-packaged-rules-config 0 r UNASSIGNED                      
.opensearch-sap-log-types-config          0 p STARTED              172.17.0.3 f9447c29352c
.opensearch-sap-log-types-config          0 r UNASSIGNED                      
security-auditlog-2023.08.15              0 p STARTED     5 63.7kb 172.17.0.3 f9447c29352c
security-auditlog-2023.08.15              0 r UNASSIGNED                      
.opendistro_security                      0 p STARTED    10 74.8kb 172.17.0.3 f9447c29352c

My understanding of OpenSearch is insufficient for me to be able to configure these indices so that they are not replicated. I understand it may be possible to alter these indices after the fact to use no replicas:

$ curl -XPUT https://localhost:9200/security-auditlog-2023.08.15/_settings -H 'Content-Type: application/json' -d '{"index":{"number_of_replicas":0}}' -ku admin:admin
{"acknowledged":true}

However, I have no idea what this will do to the cluster. This strategy also falls down when we try to modify the .plugins-ml-config index, for which I receive the following permissions-related error:

{"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [] and User [name=admin, backend_roles=[admin], requestedTenant=null]"},"status":403}

Ideally, we are looking for a very minimal OpenSearch image we can run integration tests against in automated tests. It may be that I'm unaware of many of the configuration settings that can help, but I'm also struggling to find comprehensive documentation of these things.

Right now we are using an Elasticsearch image instead as a workaround.
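
The per-index replica change shown earlier in this comment can be repeated for several indices. The following is a minimal dry-run sketch that only prints the commands instead of running them. The function name `print_zero_replica_commands` and the `BASE_URL` variable are hypothetical; the endpoint and credentials are simply the ones used in the commands above. As the 403 response above shows, some indices may still reject the request.

```shell
# Hypothetical dry-run helper: reads index names on stdin and prints one
# curl command per index that would set number_of_replicas to 0.
# BASE_URL is an assumption; it defaults to the local endpoint used above.
print_zero_replica_commands() {
  base_url="${BASE_URL:-https://localhost:9200}"
  while IFS= read -r index; do
    # Skip empty lines so stray blank input does not produce a bad URL.
    [ -n "$index" ] || continue
    printf "curl -XPUT '%s/%s/_settings' -H 'Content-Type: application/json' -d '%s' -ku admin:admin\n" \
      "$base_url" "$index" '{"index":{"number_of_replicas":0}}'
  done
}
```

For example, `printf '%s\n' security-auditlog-2023.08.15 | print_zero_replica_commands` prints a single command that can be reviewed before it is run.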

williamtrelawny commented 1 year ago

[Triage] Thank you for filing this issue @williamtrelawny. Looking at this issue, it seems like you have configured index configurations in such a way you have an unassigned shard. The Security plugin does not have any impact on sharding strategies.

Closing this issue.

Please do not close this issue as it is not resolved. I have not made any changes to index parameters, sharding, replication, etc. at all. I have simply installed OpenSearch and configured the Security plugin.

For whatever reason, 2 shards are created by default on all deployments of the Security Plugin, regardless of the number of nodes in the cluster.

If the Security Plugin does not affect sharding strategies, then why is it that other default indices not part of the Security Plugin do not have this issue?

Whether the root cause is within the Security Plugin code or not, the issue arises only after initializing the plugin, so from a procedural standpoint the issue does lie here.

And it definitely IS an issue if Opensearch w/ Security Plug-in does not work "out of the box" on a single node.

davidlago commented 1 year ago

If the Security Plugin does not affect sharding strategies, then why is it that other default indices not part of the Security Plugin do not have this issue?

I'm not sure I follow. The security plugin owns and creates the .opendistro_security index, and based on the cluster health printouts above, that one does not have unassigned shards. The ones suffering from this are others like .plugins-ml-config, .opensearch-sap-pre-packaged-rules-config, .opensearch-sap-log-types-config and security-auditlog-2023.08.15, none of which are owned by the security plugin.

If I'm understanding correctly, there is a state where the cluster is green and those indices (for example, .plugins-ml-config) show no unallocated shards, and then after a step in the configuration of the security plugin the problem starts and they start requiring additional shards to be allocated?

If so, it would help a lot to get confirmation of that (i.e. those indices are green to begin with) and then narrow it down to the step in the security setup where these settings unexpectedly change.

todvora commented 1 year ago

Hello, I am having the very same issue as well. In my case it's also a single-node cluster (with discovery.type=single-node).

The problem and these indices are not coming from the security plugin but rather originate in opensearch-security-analytics and opensearch-ml plugins. When I remove these plugins, everything is running fine and the cluster status is green, because these 3 indices won't be created.

I think these two plugins should adapt the number_of_replicas when OpenSearch is running in single-instance mode.

Maybe we should move/reopen the issue in other repo(s)?

Thanks!

dennisoelkers commented 1 year ago

@scrawfor99: I think there was a misunderstanding when this issue was closed. The named indices are created with a replica configuration by default which makes it impossible to get them to green for a single-node setup. For anomaly detection, this was already acknowledged as requiring a change.

peternied commented 1 year ago

@dennisoelkers Thanks for calling this out - OpenSearch-Project wide we should have a consistent philosophy. We should re-triage this issue with this context in mind

peternied commented 1 year ago

Per the cluster health documentation [link] it would suggest that the security plugin should allow for a configuration with no replicas.

OpenSearch expresses cluster health in three colors: green, yellow, and red. A green status means all primary shards and their replicas are allocated to nodes. A yellow status means all primary shards are allocated to nodes, but some replicas aren’t. A red status means at least one primary shard is not allocated to any node.
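
The three-color rule quoted above can be written as a small decision function. This is only an illustrative sketch of the wording in the quote, not OpenSearch's implementation; the name `health_color` and its two count arguments are hypothetical.

```shell
# Hypothetical helper mirroring the quoted definition.
#   $1 = number of unassigned primary shards
#   $2 = number of unassigned replica shards
health_color() {
  if [ "$1" -gt 0 ]; then
    echo red      # at least one primary is not allocated
  elif [ "$2" -gt 0 ]; then
    echo yellow   # all primaries allocated, some replicas are not
  else
    echo green    # every primary and replica is allocated
  fi
}
```

On a single node, any index that asks for at least one replica always has an unassigned replica, so `health_color 0 1` is the best such a cluster can reach.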

Personally, I have a bias towards multi-node clusters as I have experience with single machine sources of failure causing significant impact. As much as I think all clusters should be multi-node, that is a preference and pushing that preference via the cluster health check is not transparent to operators of OpenSearch.

stephen-crawford commented 1 year ago

[Triage] Given the feedback on this issue, we will make an action item here to allow for 1 node clusters to be green. We will need to change it so that 1 node clusters can be set to 1 so that it is green.

samuelcostae commented 11 months ago

I will start looking into this. Can you assign it to me @scrawfor99 ?

cthtrifork commented 11 months ago

.opensearch-sap-log-types-config also breaks upgrading older versions of OpenSearch (<2.10) to the latest OpenSearch on multi-node clusters. When one of the data nodes is updated (rolling upgrade), it will apply the new index to that node. However, the index cannot be replicated to the "old nodes" which have not been upgraded. This breaks the upgrade, as the status is now YELLOW forever.

We had to do this:

PUT /.opensearch-sap-log-types-config/_settings
{
  "index" : {
    "auto_expand_replicas" : "false",
    "number_of_replicas" : 0
  }
}

This let the status go to GREEN so that we could continue upgrading all the data nodes.

P.S. Setting auto_expand_replicas: "0-all" seems unnecessary/aggressive as a default setting? https://github.com/opensearch-project/security-analytics/blob/main/src/main/java/org/opensearch/securityanalytics/logtype/LogTypeService.java#L448

nitinjagjivan commented 7 months ago

Is it fixed? Looks good on v2.12.0

LHozzan commented 7 months ago

We are using 2.11.1 and I don't see the problem. I think the problem was fixed in v2.11.0.

godber commented 4 months ago

This problem appears to still exist in 2.12.0 and is not isolated to single-node clusters. I am pretty sure we just saw this on a 20-node cluster when we expanded it to 40 nodes. We saw the following error message:

"explanation": "there are too many copies of the shard allocated to nodes with attribute [host.rack], there are [40] total configured shard copies for this shard id and [7] total attribute values, expected the allocated shard count per attribute [7] to be less than or equal to the upper bound of the required number of shards per attribute [6]"

So the problem is related to rack affinity for shard allocation.
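
The numbers in that message can be reproduced with a little arithmetic. This is a sketch under one assumption: that the upper bound is the ceiling of the total shard copies divided by the number of attribute values, which matches the [40], [7] and [6] in the message above. The function name `max_copies_per_attribute` is hypothetical.

```shell
# Hypothetical helper. Assumption: upper bound = ceil(copies / attribute values).
#   $1 = total configured shard copies (primary + replicas)
#   $2 = number of distinct values of the awareness attribute (e.g. racks)
max_copies_per_attribute() {
  # Integer ceiling division without floating point.
  echo $(( ($1 + $2 - 1) / $2 ))
}

max_copies_per_attribute 40 7   # prints 6, the bound quoted in the message
```

With 40 copies and 7 racks the bound is 6, but 7 racks times 6 copies only covers 42 slots, so any rack that ends up with 7 copies violates it. Setting number_of_replicas to 20 gives 21 copies, for which the same formula yields a bound of 3 per rack.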

I fixed it by following the advice above, but rather than disabling plugins and setting replicas to 0 I set it to the original number of replicas.

curl -XPUT -H 'Content-Type: application/json' http://es-foo.bar.lan/.opensearch-sap-log-types-config/_settings -d '{ "index" : {  "auto_expand_replicas" : "false", "number_of_replicas" : 20 } }'

opensearch-project / security

[BUG] Cluster status YELLOW after configuring Security Plugin in single-node clusters #3130