shwetathareja opened this issue 1 year ago
I think a tool that edits cluster state is a good idea, but maybe forcing the value (or removing a value) of a setting via .yml is simpler? Is this possible today?
You probably don't want to have an option to start a node and ignore a "bad" setting because that should have never happened and something is definitely wrong.
but maybe forcing the value (or removing a value) of a setting via .yml ... You probably don't want to have an option to start a node and ignore a "bad" setting because that should have never happened and something is definitely wrong.
@dblock I'm confused. Isn't "forcing a value (or removing it) via a .yml setting" the same thing as "starting a node and ignoring a bad setting"?
You can use the NodeToolCLI to remove an invalid setting:
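The exact invocation is shown further down in this thread; as a rough sketch, with <setting.key> standing in for whatever key is stuck in the persisted state:

bin/opensearch-node remove-settings <setting.key>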
But this only works if the cluster is shut down, so you can't perform this action without causing downtime. That said, when you have an invalid cluster setting, the cluster is pretty much down anyway.
I don't think you can fix this by directly modifying a yml file. The persisted cluster settings are in some binary file. You need a tool like the one I pasted above to read the settings file, remove a setting, and write it back to disk. I never tried this, but based on my experience debugging this issue, I think this procedure would have to be done on every leader-eligible node, as when this happens, every node is cooked. You can't make this change on the leader node and expect it to successfully publish to the other nodes to pick up the change, as they are all down as well.
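If it really has to be repeated per node, the sequence on each cluster-manager-eligible node would look roughly like this (a sketch only; it assumes a systemd-managed install under the default home directory):

sudo systemctl stop opensearch        # the tool refuses to run while the node is up
cd /usr/share/opensearch              # assumed install location
bin/opensearch-node remove-settings search_backpressure.mode   # drop the offending key from the on-disk cluster state
sudo systemctl start opensearch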
I think having a default value to fall back to is probably better than being completely down, but I suspect there may be cases where that might not be desirable. If we want to go this route, we would have to look at each default very carefully. You also don't want a cluster to fall back to a default and silently succeed. This can be very dangerous.
A slight variant of the above idea is to have safer defaults that suit each cluster. You can do that by saving the last fully validated cluster settings every time we update them. In other words, you keep Settings_(n-1) AND Settings_n. If Settings_n blows up, you load Settings_(n-1). This is kind of like auto-rollback.
@austintlee: Thanks for pointing out that NodeToolCLI; I missed that it can fix invalid settings.
I don't think you can fix this by directly modifying a yml file. The persisted cluster settings are in some binary file.
Right, for a dynamic setting, the value stored in the cluster state takes precedence over the yml file, and hence changing the yml file doesn't help. It requires cleaning the cluster state stored on disk. Like you said, it requires downtime for the cluster. But in cases like this, the cluster might already be down.
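To illustrate the precedence point: a dynamic setting applied through the cluster settings API is persisted in the cluster state and wins over whatever opensearch.yml says. A request along these lines (localhost:9200 without security is just an assumption for the example) is what ends up persisted on disk:

curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search_backpressure.mode": "monitor_only"}}'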
You also don't want a cluster to fall back to a default and silently succeed. This can be very dangerous.
Yes, I definitely agree that silently moving to a default is bad. We can have warn logs or prevent any new cluster settings change until that setting is fixed.
The proposal for auto-rollback is interesting, as today OpenSearch only stores the latest state.
Good discussion. I think we ought to prioritize getting users unbroken, first. So even if it's a painful experience, some path is better than no path. Shall we just document NodeToolCli? Have you been able to use it @austintlee?
In my local environment, I use Docker, and I could not find a way to bring down a node without killing the container running the node, so I could not try out NodeToolCli.
I can set up a single node cluster on my machine, but I think we would need to try out a more realistic scenario with multiple (leader eligible) nodes.
I would prefer a generic tool that could help not only with cases of auto-rollback but also with common problems like 1) the disk going full (bootstrap fails if lock files cannot be written) and 2) shard corruption where certain sectors on disk get corrupted, impacting a few shards.
In such cases the tool can help clear data safely, accept the data loss, and help the node rejoin the cluster.
I tested NodeToolCli on my machine using 3 nodes.
1/ Launch a cluster with 3 nodes.
2/ Set the Search Backpressure mode to an invalid value.
3/ Confirm the cluster is in a bad state.
4/ Kill all node processes.
5/ Remove the Search Backpressure mode for Node 1:
bin/opensearch-node remove-settings search_backpressure.mode
------------------------------------------------------------------------
WARNING: OpenSearch MUST be stopped before running this tool.
The following settings will be removed:
search_backpressure.mode: foo
------------------------------------------------------------------------
You should only run this tool if you have incompatible settings in the
cluster state that prevent the cluster from forming.
This tool can cause data loss and its use should be your last resort.
Do you want to proceed?
Confirm [y/N] y
Settings were successfully removed from the cluster state
6/ Bring up Node 1 and make sure it comes up fine.
7/ Bring up the rest of the nodes in the cluster.
Once you bring up the first node, it'll become the cluster manager, so as long as this node has a valid ClusterState, the other nodes can be brought back online without performing the above step. They will all pick up the latest ClusterState from the cluster manager node.
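As a quick sanity check after the nodes are back (assuming an unsecured local endpoint), the persisted settings can be listed to confirm the offending key is gone:

curl -s "http://localhost:9200/_cluster/settings?flat_settings=true&pretty"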
@Bukhtawar if those are node-level issues, can those capabilities be added to NodeToolCli?
It would be nice if we had a management plane that is independent of the cluster state. I had to manually kill each process to produce the above result, but it would have been nice if this task could be accomplished from a client. Otherwise, in a production cluster, one would have to ssh into each host and bring each node down and back up.
Also, if the Security plugin is enabled, what permission is checked on updating cluster settings? I want to see how this action can be locked down, but I am not seeing anything related to this on this page - https://opensearch.org/docs/latest/security/access-control/permissions/.
If you are referring to an auto-rollback to a previous cluster state, I must point out that it has consequences and might not always be an easy decision. For instance, users can update the mapping for fields, and once the update is accepted as part of the new cluster state, data will be indexed with that mapping. Now if you decide to roll back, the data that has been indexed so far might not stay consistent. This is just one example, but before building such a tool, we need to ensure it doesn't become a constraint on future development and, most importantly, doesn't leave the cluster in an inconsistent state. That's precisely what snapshots give you, by ensuring you have a granular but consistent restore checkpoint. There are plans on providing more granular restore points, maybe down to a minute, with the most recent remote store integration work. In short, I would not be inclined to support auto-rollback of cluster state as a tool or through a client-controlled mechanism.
I was referring to your comment:
" I would prefer a generic tool that could help not only cases of auto-rollback but something that can handle common problems like 1) disk going full(bootstrap fails if lock files cannot be written) 2) shard corruption where certain sectors on disk get corrupted impacting fewer shards. "
Can those capabilities (1 & 2) be added to NodeToolCli?
I agree with @Bukhtawar's reasoning for not supporting auto-rollback of ClusterState. Once it is published and acknowledged by all nodes, new data can be ingested per the new ClusterState, which can render data inconsistent. This can be due to changes in mappings, index/alias settings, and other objects inside ClusterState.
Instead, I would propose that we add more validations during the ClusterState build (the relevant Builder class) itself, because the criteria for rollback will be difficult to capture given the dependency on objects which can render the data inconsistent or corrupt, like indices, aliases, mappings, etc. Also, just going from the nth ClusterState back to the (n-1)th may not resolve the situation if your last good-to-go ClusterState is older than the (n-1)th state.
@austintlee I figured out how to get this command to work in a Docker environment with no running container.
First you have to stop your container. Then search the Docker host for opensearch-node with find / -iname opensearch-node. The path of this binary looks something like /var/lib/docker/overlay2/d6c1c5dfe1551aaa097d5a6383a41b75a070513a4d57befcba2ebde6fca54331/diff/usr/share/opensearch/bin/opensearch-node.
If you have multiple Docker containers with OpenSearch on this node, figure out which GraphDriver.Data matches your container with docker inspect <containername|containerid>.
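One way to narrow that down (assuming the overlay2 storage driver, as in the paths above) is to ask Docker for the container's upper layer directory, which is the .../diff path used below:

docker inspect -f '{{ .GraphDriver.Data.UpperDir }}' <containername|containerid>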
cd into the opensearch directory
cd /var/lib/docker/overlay2/d6c1c5dfe1551aaa097d5a6383a41b75a070513a4d57befcba2ebde6fca54331/diff/usr/share/opensearch
If you have changed the data path for OpenSearch, you have to link it to the path the tool expects; unfortunately, I could not figure out how to pass a custom data path to the opensearch-node tool.
In my case I had to link all files under /data/opensearch/* to /var/lib/docker/overlay2/d6c1c5dfe1551aaa097d5a6383a41b75a070513a4d57befcba2ebde6fca54331/diff/usr/share/opensearch/data:
ln -s /data/opensearch/* /var/lib/docker/overlay2/d6c1c5dfe1551aaa097d5a6383a41b75a070513a4d57befcba2ebde6fca54331/diff/usr/share/opensearch/data/
In my case the faulty setting was search_backpressure.cancellation_burst: "10"; the correct value is search_backpressure.cancellation_burst: "10.0". You can then execute the command from that path:
/var/lib/docker/overlay2/d6c1c5dfe1551aaa097d5a6383a41b75a070513a4d57befcba2ebde6fca54331/diff/usr/share/opensearch# bin/opensearch-node remove-settings search_backpressure.cancellation_burst
Now you can handle such an error.
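Once the cluster is reachable again, the corrected value can also be re-applied through the API instead of leaving it at the default (sketch; localhost and no authentication are assumptions here):

curl -X PUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"search_backpressure.cancellation_burst": "10.0"}}'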
For the future, I think it would be far better to handle wrong PUT requests correctly; OpenSearch needs proper error handling for this.
If an admin puts a wrong setting into the cluster, OpenSearch has to reject it with an error code and leave the setting at its previous value (or its default).
Not everyone knows how to fix such an error, and for most admins it is close to a worst-case scenario that the cluster breaks only because of a typo in the configuration.
Is your feature request related to a problem? Please describe.
In case an invalid setting gets persisted in the OpenSearch cluster state, it causes an exception and prevents the cluster from coming up. There is no way to salvage the cluster; the OpenSearch process continues to crash and never comes up. This happened recently when the Search Backpressure mode setting didn't have proper validation for supported values: https://github.com/opensearch-project/OpenSearch/issues/6832
Describe the solution you'd like
The proposal is to provide a node-scope setting or a system property via JVM options to ignore specific setting parsing exceptions, as a mitigation to avoid full cluster downtime. The setting would fall back to its default value. Also, this should allow updating the setting to a new valid value via the _cluster/settings API.
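Purely to illustrate the proposal (no such property exists in OpenSearch today; the name below is made up for the sketch), the escape hatch could be passed to the JVM at startup, for example:

# hypothetical property name, for illustration only -- not an existing OpenSearch option
export OPENSEARCH_JAVA_OPTS="-Dopensearch.settings.ignore_parse_errors=search_backpressure.mode"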
Alternative solution
Provide a tool which can fix the cluster state persisted on disk after first terminating the OpenSearch process. This could be riskier, as any setting could be updated without validation. But we can evaluate whether this would be a better solution for overriding any setting in the cluster state. This tool could help fix not just settings but any other part of the cluster state as well. I guess it would depend on whether OpenSearch users have faced situations where they needed a tool like this.