omerlh closed this issue 1 year ago
Hi @omerlh
The "context deadline exceeded" error makes me wonder if there's a connectivity issue with the server. What happens if you curl the seal-status endpoint? e.g.
curl $VAULT_ADDR/v1/sys/seal-status
Does that succeed or hang? If it succeeds, what is the output?
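To make the hang-vs-failure distinction concrete, the request can be bounded with a timeout (a sketch; the 5-second limit is an arbitrary choice, and `$VAULT_ADDR` must already point at the server):

```shell
# Bound the request so a hang fails quickly instead of blocking the shell.
# curl exit code 28 means the request timed out; 7 means the connection
# itself was refused. -sS silences progress output but still shows errors.
curl -sS --max-time 5 "$VAULT_ADDR/v1/sys/seal-status"
echo "curl exit code: $?"
```

A timeout (28) points at a network path that silently drops traffic, while a refusal (7) points at the listener address/port configuration.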
I'm running it inside the pod, even the health check endpoint is failing...
I'm not an expert at kubernetes, so forgive me if I get the terminology wrong. I'm just trying to see if the server is misconfigured somehow, if it's at all responsive to HTTP traffic of any form. I suspect this is a configuration issue and not a legit bug but I can't tell from your config file alone. Are you running this on IPv6?
Looks like some sort of timeout, because now I am seeing a lot of these errors:
2020-09-24T16:30:09.935Z [INFO] core: stored unseal keys supported, attempting fetch
2020-09-24T16:30:09.961Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2020-09-24T16:30:11.640Z [INFO] core.autoseal: seal configuration missing, but cannot check old path as core is sealed: seal_type=recovery
2020-09-24T16:30:14.624Z [INFO] core.autoseal: seal configuration missing, but cannot check old path as core is sealed: seal_type=recovery
2020-09-24T16:30:14.961Z [INFO] core: stored unseal keys supported, attempting fetch
2020-09-24T16:30:14.990Z [WARN] failed to unseal core: error="stored unseal keys are supported, but none were found"
2020-09-24T16:30:17.615Z [INFO] core.autoseal: seal configuration missing, but cannot check old path as core is sealed: seal_type=recovery
Which is a lot easier to debug :)
It might be an issue of tcp:8080
being blocked at the control plane, so the webhook isn't functional. I encountered exactly the same symptom as yours on a private GKE cluster. This thread gives a very good explanation and resolved my issue. See if it helps.
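For reference, the fix discussed for private GKE clusters amounts to a firewall rule letting the control plane reach the webhook port on the nodes. A hedged sketch, not taken from this thread: the rule name, network, `$MASTER_CIDR` (your private cluster's master range), and `$NODE_TAG` are all placeholders for your own cluster:

```shell
# Allow the GKE control plane to reach the injector webhook on the nodes.
# All names and variables here are placeholders, not values from this issue.
gcloud compute firewall-rules create allow-master-to-webhook \
  --network my-vpc \
  --source-ranges "$MASTER_CIDR" \
  --target-tags "$NODE_TAG" \
  --allow tcp:8080
```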
Thank you so much @pinglin, I was encountering the same issue and that fixed it.
Here are some additional indicators I saw:
vault status
would throw Error checking seal status: context deadline exceeded
vault operator init
would initialize the vault (you can see all the files on the backend get created), but you would never get the unseal keys/root token. I would get
Error initializing: context deadline exceeded
command terminated with exit code 2
I saw this problem too when running on EC2 in a private subnet with no outbound access - NAT disabled (not using Kubernetes, just AMIs on instances).
I was able to log in, but could not run vault status or other queries. Enabling NAT fixed the problem.
Is it possible to deploy Vault with no outbound access?
I have the same issue.
I deployed Vault via the Helm chart with AWS KMS auto-unseal.
After deployment, I logged in to one of the Vault pods and executed:
vault operator init
which gave me the message Error initializing: context deadline exceeded
The next execution of vault operator init
gave me the message Vault is already initialized
Vault then started working, but I never got the root/master token.
I also tried running vault operator init > init.txt
first, but the file stayed empty.
I fixed the issue with export VAULT_CLIENT_TIMEOUT=300s
Basically, in k8s Vault initialization is very slow and the default timeout of 60s is not enough.
After Vault is deployed by the Helm chart, execute the following:
export VAULT_CLIENT_TIMEOUT=300s
vault status # Initialized: false, Sealed: true, RecoveryType: awskms
vault operator init # will print tokens
vault status # Initialized: true, Sealed: false, RecoveryType: shamir
Thanks @adv4000, it worked in my environment once I edited my command with your fix:
kubectl exec vault-0 -- '/bin/sh' '-c' 'export VAULT_CLIENT_TIMEOUT=500s && vault operator init -key-shares=1 -key-threshold=1 -format=json' > cluster-keys.json
to store the keys in a local file on the host filesystem.
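Since `-format=json` makes the init output machine-readable, the saved file can then be queried for the pieces automation needs. A sketch assuming the field names emitted by `vault operator init -format=json` (`root_token`, `unseal_keys_b64`); the sample values below are placeholders, not real keys:

```shell
# Hypothetical sample matching the shape of `vault operator init -format=json`
# output; "<base64-key>" and "<token>" are placeholders, not real secrets.
cat > cluster-keys.json <<'EOF'
{"unseal_keys_b64": ["<base64-key>"], "root_token": "<token>"}
EOF

# Pull out the individual fields with jq.
jq -r '.root_token' cluster-keys.json
jq -r '.unseal_keys_b64[0]' cluster-keys.json
```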
These issues are almost always platform / infrastructure related. Hey @omerlh how did you progress and is this issue still applicable for you?
Maybe we'd want to get the related VAULT_CLIENT_TIMEOUT
fix documented and then close this? @Glastis @adv4000 any ideas? A PR seems to be in order, but I'm not sure where within the Kubernetes sections.
If no update or documentation suggestions then I vote that this be closed.
I think I gave up on automating the first Vault deployment, but it was a pretty long time ago so I might be wrong about that :)
It may be worth adding a note to the K8S docs about slower systems and longer initialisation times, where an increased timeout may be required.
export VAULT_CLIENT_TIMEOUT=500s
Closing as there are no further follow-ups.
Describe the bug
I installed Vault using the official chart, with the GCP KMS seal and GCS backend.
The pod started as expected, but that is the only thing written to the log. Any operation with the API results in an error.
Expected behavior
Vault starts.
Environment:
Vault Server Version (retrieve with vault status): 1.4.2
Vault CLI Version (retrieve with vault version):
Vault server configuration file(s):