kralicky opened 1 year ago
@kralicky what actions were taken to put the cluster in this state?
Caused by: org.opensearch.transport.RemoteTransportException: [opni-data-0][10.0.148.66:9300][internal:cluster/coordination/join/validate]
Caused by: org.opensearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid MrjDcoJPRQSAtKxTyeH8aQ than local cluster uuid 9C2sVJCdRueVQSnG81W-qQ, rejecting
at org.opensearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:219) ~[opensearch-2.4.0.jar:2.4.0]
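One way to confirm the mismatch is to query the root REST endpoint of each node (port 9200, not the 9300 transport port shown in the trace), which reports the cluster_uuid the node currently belongs to. A minimal sketch; the node addresses here are assumptions to substitute with the actual pod IPs or port-forwards:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// clusterInfo holds the fields of interest from OpenSearch's root endpoint.
type clusterInfo struct {
	ClusterName string `json:"cluster_name"`
	ClusterUUID string `json:"cluster_uuid"`
}

func main() {
	// Hypothetical addresses for opni-data-0 and the current master-eligible node.
	nodes := []string{"http://10.0.148.66:9200", "http://localhost:9200"}

	for _, addr := range nodes {
		resp, err := http.Get(addr + "/")
		if err != nil {
			fmt.Printf("%s: %v\n", addr, err)
			continue
		}
		var info clusterInfo
		if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
			fmt.Printf("%s: decode error: %v\n", addr, err)
		} else {
			// Nodes reporting different cluster_uuid values cannot join each
			// other; this matches the join-validation rejection above.
			fmt.Printf("%s: cluster=%s uuid=%s\n", addr, info.ClusterName, info.ClusterUUID)
		}
		resp.Body.Close()
	}
}
```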
This error indicates a split brain has occurred. This can happen if the master node's data changes. Are you using the local-path provisioner as the storage class? That could cause this problem if master nodes restart while they hold a minority of the voting quorum.
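To see which master-eligible nodes currently form the voting quorum, the cluster-state metadata can be inspected. A sketch against the standard cluster-state API; the address is an assumption, and with local-path storage it is worth comparing the output before and after a master restart:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// last_committed_config lists the node IDs whose votes form the quorum.
	url := "http://localhost:9200/_cluster/state/metadata" +
		"?filter_path=metadata.cluster_coordination.last_committed_config"

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// If a restarted master comes back with an empty data directory, it can
	// bootstrap a fresh cluster (with a new UUID) instead of rejoining this quorum.
	fmt.Println(string(body))
}
```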
It looks like uninstalling logging and then reinstalling it reproduces this.
This is currently expected behaviour if the PVCs aren't removed after uninstall. There are a couple of possible approaches: we could provide an option to remove PVCs on uninstall, or we could avoid bootstrapping a new cluster when existing PVCs are present, though that is potentially brittle.
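A sketch of the first approach using client-go, assuming a hypothetical label (app=opni-data) that selects the OpenSearch data claims and an assumed opni-system namespace; the actual uninstall hook and labels would depend on how the chart names its resources:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// deleteLoggingPVCs removes the PersistentVolumeClaims left behind by the
// logging backend so that a reinstall bootstraps a fresh cluster UUID.
func deleteLoggingPVCs(ctx context.Context, cs kubernetes.Interface, ns string) error {
	// "app=opni-data" is a hypothetical selector; match whatever labels the
	// chart actually puts on the OpenSearch data claims.
	return cs.CoreV1().PersistentVolumeClaims(ns).DeleteCollection(
		ctx,
		metav1.DeleteOptions{},
		metav1.ListOptions{LabelSelector: "app=opni-data"},
	)
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	if err := deleteLoggingPVCs(context.Background(), cs, "opni-system"); err != nil {
		panic(err)
	}
	fmt.Println("logging PVCs deleted")
}
```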
As a user, I would expect that if I uninstall but don't delete persistent data, I can then reinstall with the existing data. Conversely, if I click the option to delete persistent data, it should actually delete the data.
For capabilities, yes. For actually uninstalling a backend, I think there's a different set of expectations.
Would this change if log data were stored in S3 instead of locally?
Installed logging backend with the following settings:
The opni-data-0 pod is stuck in the unready status. This error is repeated in the pod logs:

Additionally, the dashboards pod is stuck restarting and has the following repeated logs:
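When reproducing this, polling the data node's health endpoint shows whether it ever reaches a usable state. A minimal sketch; the port-forwarded address is an assumption, and a node that cannot join the cluster may simply time out or return an error here, matching the unready pod status:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Hypothetical port-forwarded address of the opni-data-0 pod.
	const url = "http://localhost:9200/_cluster/health"

	client := &http.Client{Timeout: 5 * time.Second}
	for i := 0; i < 10; i++ {
		resp, err := client.Get(url)
		if err != nil {
			// Repeated failures here are consistent with the pod never
			// passing its readiness probe.
			fmt.Println("not responding:", err)
		} else {
			var health map[string]any
			json.NewDecoder(resp.Body).Decode(&health)
			resp.Body.Close()
			fmt.Printf("status=%v nodes=%v\n", health["status"], health["number_of_nodes"])
		}
		time.Sleep(10 * time.Second)
	}
}
```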