shujun10086 closed this issue 2 years ago.
Vault uses the presence of the keyring to test whether it has been initialized. So, mysterious loss of the keyring file from backing storage would seem to explain this behaviour.
I suspect that Vault got stuck while rotating the keyring file and was then killed by the liveness probe, and that the file was lost because it may be deleted during rotation. From the source code, the file is rewritten at a fixed interval. Can that interval be configured?
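The suspected failure mode (a process killed while rewriting a file, leaving the file missing) is what the write-to-temp-then-rename pattern is meant to prevent. The sketch below shows that general technique in Python. It is not a description of how Vault's file backend actually writes core/_keyring, and the function name atomic_write is hypothetical. If the process is killed at any point, the target path holds either the complete old contents or the complete new contents; at worst a stray .tmp- file is left behind.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see either the old or the new file.

    Hypothetical helper illustrating the general pattern, not Vault's code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the temp file with 0o600 permissions in the same directory,
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp_path, path)  # atomic swap on POSIX: old file before, new file after
    except BaseException:
        # On any failure the original file is untouched; clean up the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash (Linux/POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A plain truncate-and-rewrite of the target file, by contrast, can leave an empty or partial file if the process is killed mid-write.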
Another question: why does http://127.0.0.1:5817/v1/sys/init report that Vault is already initialized? Doesn't it use the keyring file to test whether Vault is initialized?
Can Vault make sure the keyring rotation completes successfully when it handles the kill -15 (SIGTERM) signal?
Is there a way to regenerate the file manually? We have hit this in our production environment.
It seems it cannot. The file is updated every 5 minutes by default, and even when I saved a copy of the file and replaced it manually, unsealing still did not succeed. Which version of Vault do you use?
Vault 1.7.1. Is there any other way to unseal Vault now?
Question was also asked at https://discuss.hashicorp.com/t/keyring-file-is-missing-under-core-directory/43588 . My answer from there:
There is no way - the keyring file contained the encryption keys with which all the user data in Vault is encrypted.
With it lost, unless you have your own backup elsewhere, all the data is permanently lost.
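Since the keyring cannot be regenerated, the only protection is a copy taken beforehand. A minimal sketch, assuming the file storage backend from this issue's server.hcl and that Vault is stopped while the copy runs; backup_file_storage is a hypothetical helper, not a Vault tool. The copy is still encrypted, so restoring it also requires the unseal key(s) that were valid when it was taken.

```python
import shutil
import time
from pathlib import Path


def backup_file_storage(storage_path: str, backup_root: str) -> Path:
    """Copy a Vault file-storage directory into a timestamped folder.

    Hypothetical helper. Assumes Vault is not writing while this runs,
    so the copy is consistent.
    """
    src = Path(storage_path)
    # Refuse to back up a store that has already lost its keyring.
    if not (src / "core" / "_keyring").is_file():
        raise FileNotFoundError(f"{src}/core/_keyring is missing")
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = Path(backup_root) / f"vault-storage-{stamp}"
    # copytree creates missing parent directories and copies file metadata (copy2).
    shutil.copytree(src, dest)
    return dest
```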
Describe the bug
Vault can no longer be unsealed after a restart. The Vault server was restarted because its liveness probe failed. The probe checks Vault health with curl -m 1 http://127.0.0.1:5817/v1/sys/health, runs every 5 s, and restarts the server after 3 failures.
After the restart, our own program unseals Vault using the saved unseal key and root token file.
The program first checks the init status via http://127.0.0.1:5817/v1/sys/init, and the response says Vault is initialized (true). But unsealing no longer works:
URL: PUT http://127.0.0.1:5817/v1/sys/unseal
Code: 400
Errors:
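The check-then-unseal sequence described here can be sketched with Python's standard library against the two documented endpoints: GET /v1/sys/init returns {"initialized": ...}, and PUT /v1/sys/unseal takes one key share in a "key" field. This is a hypothetical stand-in for the reporter's own program, not their code; the helper names are made up, and the address comes from this issue's listener block.

```python
import json
import urllib.error
import urllib.request

VAULT_ADDR = "http://127.0.0.1:5817"  # listener address from this issue's server.hcl


def init_request(addr: str = VAULT_ADDR) -> urllib.request.Request:
    """GET /v1/sys/init: the JSON response carries an "initialized" boolean."""
    return urllib.request.Request(f"{addr}/v1/sys/init", method="GET")


def unseal_request(key: str, addr: str = VAULT_ADDR) -> urllib.request.Request:
    """PUT /v1/sys/unseal with a single unseal key share in the "key" field."""
    return urllib.request.Request(
        f"{addr}/v1/sys/unseal",
        data=json.dumps({"key": key}).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def is_initialized(body: bytes) -> bool:
    return bool(json.loads(body)["initialized"])


def error_messages(body: bytes) -> list:
    """Vault reports failures as {"errors": [...]} alongside a 4xx/5xx status."""
    return json.loads(body).get("errors", [])


def try_unseal(key: str, addr: str = VAULT_ADDR) -> dict:
    """Hypothetical helper: check init status, then submit one key share."""
    with urllib.request.urlopen(init_request(addr), timeout=5) as resp:
        if not is_initialized(resp.read()):
            raise RuntimeError("Vault reports that it is not initialized")
    try:
        with urllib.request.urlopen(unseal_request(key, addr), timeout=5) as resp:
            # Seal status: includes "sealed", "t" (threshold), "n", "progress".
            return json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # A 400 such as the one reported in this issue ends up here.
        raise RuntimeError(f"unseal failed ({err.code}): {error_messages(err.read())}") from err
```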
Checking the Vault DB files, the _keyring file seems to be missing. Below are the Vault DB core files. Usually there is a _keyring file among them, but when the problem happened it was gone. I'm not sure whether its absence is just a result of the Vault restart.
bash-5.1$ ls -l /mnt/services/vault/DB/core/
total 6
-rw-------. 1 9999 9999 397 May 24 09:31 _audit
-rw-------. 1 9999 9999 537 May 24 09:31 _auth
-rw-------. 1 9999 9999 133 May 24 09:31 _local-audit
-rw-------. 1 9999 9999 133 May 24 09:31 _local-auth
-rw-------. 1 9999 9999 417 May 24 09:31 _local-mounts
-rw-------. 1 9999 9999 209 May 29 20:51 _master
-rw-------. 1 9999 9999 709 May 24 09:31 _mounts
-rw-------. 1 9999 9999 169 May 24 09:31 _seal-config
-rw-------. 1 9999 9999 101 May 24 09:31 _shamir-kek
drwx------. 3 9999 9999 2 May 24 09:31 cluster
drwx------. 2 9999 9999 1 May 24 09:31 hsm
drwx------. 2 9999 9999 1 May 24 09:31 wrapping
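Because Vault treats the keyring as the marker of an initialized barrier, a quick check for the file before unsealing can surface this problem early. The sketch assumes the file storage layout visible in the listing above, where each key under core/ is stored as a file whose name is prefixed with an underscore (core/keyring as core/_keyring). The expected-file list and the function name are illustrative choices, not anything Vault defines.

```python
from pathlib import Path
from typing import List, Sequence

# Illustrative choice: files that should exist under core/ for a usable
# file-storage Vault. _keyring is the one missing in this issue.
EXPECTED_CORE_FILES = ("_keyring", "_master", "_seal-config")


def missing_core_files(storage_path: str,
                       expected: Sequence[str] = EXPECTED_CORE_FILES) -> List[str]:
    """Return the expected core/ files that do not exist under storage_path."""
    core = Path(storage_path) / "core"
    return [name for name in expected if not (core / name).is_file()]


if __name__ == "__main__":
    # Storage path taken from this issue's server.hcl.
    missing = missing_core_files("/mnt/services/vault/DB")
    if missing:
        print("missing:", ", ".join(missing))
```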
To Reproduce
It is still not clear why the Vault server does not respond to the health request, so the problem is difficult to reproduce.
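One detail of the liveness probe described in this issue may narrow this down: curl without -f/--fail exits with status 0 for any HTTP response, including 503 (sealed) and 501 (not initialized), and only fails when it gets no HTTP response at all (for example exit code 28 when the -m 1 timeout expires). So the restarts suggest Vault did not answer within one second, rather than answering with an unhealthy status code. The sketch below encodes the default status codes documented for GET /v1/sys/health and models that curl behaviour; the function names are illustrative.

```python
from typing import Optional

# Default status codes documented for Vault's GET /v1/sys/health endpoint.
HEALTH_STATUS = {
    200: "initialized, unsealed, and active",
    429: "unsealed and standby",
    472: "disaster recovery replication secondary and active",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Translate a /v1/sys/health HTTP status code into Vault's documented state."""
    return HEALTH_STATUS.get(status_code, f"unexpected status {status_code}")


def probe_passes_like_plain_curl(status_code: Optional[int]) -> bool:
    """Mimic `curl -m 1 URL` without --fail: any HTTP response counts as success.

    None stands for "no HTTP response" (timeout or connection error),
    the only case in which such a probe fails.
    """
    return status_code is not None
```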
Expected behavior
After a restart, Vault can still be unsealed normally.
Environment:
Vault Server Version (retrieve with vault status):
bash-5.1$ vault status
Key                Value
Seal Type          shamir
Initialized        true
Sealed             true
Total Shares       1
Threshold          1
Unseal Progress    0/1
Unseal Nonce       n/a
Version            1.8.10
Storage Type       file
HA Enabled         false
Vault CLI Version (retrieve with vault version):
bash-5.1$ vault version
Vault v1.8.10 (cgo)

Server Operating System/Architecture: fedora
Vault server configuration file(s):
bash-5.1$ cat /etc/vaultserver/server.hcl
storage "file" {
  path = "/mnt/services/vault/DB"
}

listener "tcp" {
  address     = "127.0.0.1:5817"
  tls_disable = 1
}

cache_size    = 100
disable_mlock = true
Additional context
Please explain, from the point of view of the Vault source code, why Vault reports "Vault is not initialized" when it has already been initialized.