Hi @knanao! Deleting the client datadir means that the client no longer has its node secret and has to recreate it from scratch. But the node ID potentially comes from a source of data on the host (a hash of the host ID), which means the client gets the same node ID but a different node secret. That mismatch leads to permissions errors.
You could get away with this prior to 1.6.0 because we didn't enforce the node secret as strongly as we should have. (See https://github.com/hashicorp/nomad/pull/16799)
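To make the failure mode concrete, here is a toy model of the mismatch described above. It is only a sketch: `ToyServer`, its `register` method, and the error text are hypothetical names made up for illustration and do not reflect Nomad's actual code.

```python
class ToyServer:
    """Toy model of a server-side node table (hypothetical, not Nomad's code)."""

    def __init__(self):
        # node ID -> secret recorded when the node first registered
        self.secrets = {}

    def register(self, node_id, secret):
        known = self.secrets.get(node_id)
        # Same node ID with a different secret is rejected.
        if known is not None and known != secret:
            raise PermissionError(f"node {node_id}: secret does not match")
        self.secrets[node_id] = secret


server = ToyServer()
server.register("node-1", "secret-a")      # first registration succeeds
try:
    # The datadir was wiped: the host-derived ID is unchanged,
    # but the secret has been regenerated.
    server.register("node-1", "secret-b")
except PermissionError as err:
    print(err)
```

In this model the only ways out are to present the original secret again or to have the server drop its record of the node, which is what purging does.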
Note that if you find yourself in this spot, you can purge the node (see https://developer.hashicorp.com/nomad/api-docs/nodes#purge-node). That'll make the server forget about the client node, and then the client node can be restarted and safely rejoin with its new secret.
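As a rough sketch of what that purge call might look like from a script, the snippet below builds the HTTP request with Python's standard library. It assumes the endpoint is `POST /v1/node/:node_id/purge` as described on the linked page, that the agent listens on the default `http://127.0.0.1:4646`, and that an ACL token, if needed, goes in the `X-Nomad-Token` header. The helper name `build_purge_request` and the node ID are placeholders.

```python
import urllib.request


def build_purge_request(nomad_addr, node_id, token=None):
    """Build (but do not send) a purge-node request. Hypothetical helper."""
    url = f"{nomad_addr.rstrip('/')}/v1/node/{node_id}/purge"
    req = urllib.request.Request(url, data=b"", method="POST")
    if token:
        # Only needed when ACLs are enabled on the cluster.
        req.add_header("X-Nomad-Token", token)
    return req


req = build_purge_request("http://127.0.0.1:4646", "example-node-id")
print(req.get_method(), req.full_url)
# To actually send it against a live cluster:
# urllib.request.urlopen(req)
```

The request is built separately from sending so it can be inspected first. After a successful purge, restart the client so it registers again with its new secret.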
Nomad version
Nomad v1.6.8 or above
Issue
Before v1.5.15, restoring a snapshot keeps the current Nomad clients joined to the cluster. This means that all clients that existed when the snapshot was taken and all clients that currently exist in the cluster are temporarily in ready status. After failover_heartbeat_ttl (default 5m) elapses, the old clients go down and their jobs are reallocated to the new clients. However, when a snapshot is restored in v1.6.8, all new clients are removed and are not re-registered to the cluster automatically. As a result, all jobs stay pending once the old clients go down. This can be resolved by restarting the clients, but an unbalanced distribution of allocations across clients is then inevitable. I couldn't verify this with every version, but it happens in at least v1.6.8 and v1.7.5.
Reproduction steps
data_directory
file each nodes.
.nks.json file under data_directory/server/keystore
for only one server which is started at first.
Expected Result
The new Nomad clients automatically join the fresh cluster after the snapshot is restored, and all allocations are rescheduled onto them.
Actual Result
The new Nomad clients don't join the cluster, and the allocs are never rescheduled unless the Nomad clients are restarted.
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)