Open laugmanuel opened 3 years ago
Hi @laugmanuel - were you testing your snapshot restores previously? In #12388, the changes were made to expose broken seals that are resulting in unusable snapshots. Prior to the changes, the snapshot creation would appear to be successful, but the snapshots could not be restored. If you could let us know, I'd appreciate it. :)
Hi @hsimon-hashicorp, yes, we did test the restores previously and they were successful. However, I do not remember whether this was tested with snapshots created manually or by automation shortly after unsealing the Vault, or with snapshots created by the scheduled backup. I can try to reproduce this with a Vault version prior to the mentioned change and report back.
Nevertheless, the other points regarding docs and serving a broken backup through API and UI are still valid 😉
When a snapshot is initiated via the API, a success is returned immediately upon the snapshot starting to stream. The snapshot is not buffered on the server, because the size of the snapshot is unknown. So, the snapshot API request returns a "success", starts to stream, and then if at some point the seal isn't available, the snapshot will be broken. This is why testing restores is a critical part of any backup process. Additionally, https://github.com/hashicorp/vault/pull/13078 may help with this, to make detecting seal issues easier and faster. Let me know if this answers your questions about the API. I'll ask @taoism4504 for assistance re: docs.
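Since the API reports success before the stream has finished, a truncated download can be caught on the client side. The following is a minimal sketch, assuming the snapshot is a gzip-compressed tar archive (as the `file` and gzip output in the issue description suggests). The function name `snapshot_archive_is_complete` is hypothetical, and the check only covers archive integrity; it is not a substitute for an actual restore test.

```python
import gzip
import tarfile
import zlib

CHUNK_SIZE = 1024 * 1024  # read the stream in 1 MiB pieces


def snapshot_archive_is_complete(path: str) -> bool:
    """Return True if `path` is a fully readable, non-empty gzip-compressed tar.

    Hypothetical helper: it detects truncated or corrupted archives only and
    says nothing about whether Vault can restore the contents.
    """
    try:
        # Pass 1: decompress the whole stream. A truncated file raises
        # EOFError; a checksum mismatch raises gzip.BadGzipFile (an OSError).
        with gzip.open(path, "rb") as stream:
            while stream.read(CHUNK_SIZE):
                pass
        # Pass 2: walk the tar headers to confirm the archive structure.
        with tarfile.open(path, mode="r:gz") as archive:
            names = archive.getnames()
        return len(names) > 0
    except (EOFError, OSError, zlib.error, tarfile.TarError):
        return False
```

A backup job could call this on the downloaded file and treat `False` as a failed backup, rather than relying on the HTTP status of the snapshot request.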
I've tested with Vault 1.8.5 and Vault 1.7.4 (which, according to the changelog, does not contain the above fix). In both cases, the snapshot was valid and restorable while the token was valid, and snapshots became broken after the token expired. So I guess the backups were broken with earlier versions after all.
For us, I fixed it temporarily by issuing a token with a relatively long lifetime (based on an approle which overrides the default ttl of 32d).
I will experiment with periodic tokens for transit because the transit seal provider seems to have a refresh feature (`disable_renewal = "false"`) for the token?! https://www.vaultproject.io/docs/configuration/seal/transit#disable_renewal
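Whichever token strategy is used, a backup job can refuse to start when the seal token is about to expire. Below is a minimal sketch, assuming the job has already fetched the token's lookup data and that this data carries a `ttl` field (remaining seconds) and an `expire_time` field that is null for non-expiring tokens. The function name, the dictionary shape, and the threshold are assumptions for illustration, not something confirmed in this thread.

```python
def seal_token_outlives(lookup_data: dict, required_seconds: int) -> bool:
    """Return True if the token is expected to stay valid for `required_seconds`.

    `lookup_data` is assumed to be the `data` object of a token lookup
    response, with `ttl` (remaining seconds) and `expire_time` (None when the
    token never expires). Both field names are assumptions in this sketch.
    """
    ttl = lookup_data.get("ttl", 0)
    expire_time = lookup_data.get("expire_time")
    if expire_time is None and ttl == 0:
        # Assumed convention: no expiry time and a zero TTL mean "never expires".
        return True
    return ttl >= required_seconds


# Example: a token with two minutes left should not be trusted for a
# snapshot that needs the seal to stay available for an hour.
short_lived = {"ttl": 120, "expire_time": "2021-11-10T12:00:00Z"}
print(seal_token_outlives(short_lived, required_seconds=3600))
```

The check only looks at the remaining TTL at the moment of the call. A periodic token that the seal renews on its own may report a short `ttl` and still stay valid, so the threshold has to be chosen with the renewal setup in mind.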
Hi @taoism4504 - we were discussing this today - this might be good to clarify and expand in the snapshot and restore documentation with regards to token longevity and not breaking snapshots. :)
Hi @hsimon-hashicorp , what's the status on this?
Using periodic tokens together with `disable_renewal = "false"` works fine for me; so does using a token with a very long TTL. Just wondering if the docs will be modified - otherwise we can close this.
We had this problem today: the token in the config for the auto-unseal had expired. We renewed the token, updated the config, and reloaded Vault (using kill -HUP), but the snapshot still failed with the same error until we actually restarted all our nodes. Is the transit token not reloaded on SIGHUP?
Pinging @schavis for docs update. Thanks @laugmanuel!
What's the status here?
Describe the bug
We use Raft as our storage backend. We also use transit sealing against a secondary Vault instance to provide auto-unsealing for our primary Vault installed in Kubernetes. The token we use for that gets created by an init-container and is only valid for a few minutes. Until recently, this setup worked fine for us. The pods got unsealed automatically and the backups were present and valid (could be successfully restored).
Probably due to https://github.com/hashicorp/vault/pull/12388, this behaviour changed! Creating a backup using `vault operator raft snapshot save <snapshot file>` results in an error regarding the `SHA256SUMS.sealed` file. Using the API endpoint, we can successfully download the snapshot without any error. In both cases the snapshot file gets created and looks to contain data: with `file <snapshot file>`, the backup is recognized as `gzip compressed data`.

However, the backup cannot be restored and Vault complains about `Load error` in the UI. Restoring using the CLI also fails. If I try to unpack the backup using gzip, I get `unexpected end of file` -> it looks like the backup file is corrupted.

If I extend the lifetime of the unseal token, the backup gets created and can be restored successfully! There is no word in the docs that the transit token used in the Vault config/env variables must still be valid for a backup to succeed!
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Either a valid backup file (a file that can be extracted using gzip+tar and restored) should be created, even though there is the warning about the `SHA256SUMS.sealed` file, OR the creation of the backup should hard fail without any file being created.

If someone uses the API to create the backup but does not regularly check the restore, there would be no way to see that the backup file is corrupted.

Also, the docs about Raft snapshotting should mention that the seal configuration (including the token) must be valid for the backup to fully work.
Environment:
Vault Server Version (retrieve with `vault status`): 1.8.3, 1.8.4
Vault CLI Version (retrieve with `vault version`): 1.8.3, 1.8.4
Vault server configuration file(s):
Additional context
There must be a notice in the docs about the token used for transit. The docs and the how-to guides only mention creating a new token and putting it in the config/env variable. This would also break after the default lifetime of 32d: