Closed tgross closed 1 day ago
The specific error we're getting here occurs when the server we're replicating the key from tries to get the key material from its keyring. That key material is no longer present, so the replication can't succeed. That's not an unexpected scenario by itself, because we have to handle it when we bootstrap the keyring from one server to all the other servers (some servers may get replication requests for keys they don't yet have).
But for what is effectively an "orphaned" key, we're in a messy spot. We can't guarantee that the key is safe to remove from the metadata, because the operator may have had a bad recovery process and may need to restore the on-disk keyring to the servers. As a workaround, the operator can remove the key via `nomad operator root keyring remove` if they know it's truly orphaned. But fixing https://github.com/hashicorp/nomad/issues/19368 seems like an important step toward resolving this issue.
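To make the failure mode concrete, here is a minimal sketch of the situation described above. It is a toy model, not Nomad's actual replication code: the `Server` class, `get_key`, `replicate`, and `KeyMaterialMissing` are all hypothetical names chosen for illustration. It assumes each server tracks key IDs in replicated metadata separately from the key material in its on-disk keyring.

```python
from dataclasses import dataclass, field


class KeyMaterialMissing(Exception):
    """Raised when a key ID has no key material in the local keyring (hypothetical)."""


@dataclass
class Server:
    # Hypothetical model: metadata holds the key IDs a server knows about,
    # keyring maps key ID -> key material actually present on disk.
    name: str
    metadata: set = field(default_factory=set)
    keyring: dict = field(default_factory=dict)

    def get_key(self, key_id):
        # Serving a replication request: fails if the material is absent,
        # whether the key was never received or has been orphaned.
        try:
            return self.keyring[key_id]
        except KeyError:
            raise KeyMaterialMissing(f"{self.name}: no material for {key_id}") from None


def replicate(follower, peers, key_id):
    """Try each peer in turn; return True once the key material is copied."""
    for peer in peers:
        try:
            follower.keyring[key_id] = peer.get_key(key_id)
            return True
        except KeyMaterialMissing:
            # Expected during bootstrap: this peer may not have the key yet.
            continue
    # No peer has the material. From here the follower cannot tell a
    # not-yet-bootstrapped key from an orphaned one.
    return False


leader = Server("leader", metadata={"old", "new"}, keyring={"new": b"material"})
follower = Server("follower", metadata={"old", "new"})

print(replicate(follower, [leader], "new"))
print(replicate(follower, [leader], "old"))
```

In this sketch the key `"old"` is still listed in metadata but no server holds its material, so every replication attempt fails the same way. That is why removing it has to be an explicit operator decision rather than something the servers infer on their own.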
I've done some testing and I believe this will be resolved by the work done in https://github.com/hashicorp/nomad/pull/23577. I'm going to close this issue out.
In https://github.com/hashicorp/nomad/issues/19340 @sbihel reported a behavior where the followers would try to replicate keys that had been previously rotated out, and this would fail:
19340 covered another critical bug and was automatically closed once the fix was merged. This issue is a follow-up.