Closed: @uozalp closed this issue 1 month ago.
Hi @uozalp,
Thanks for creating an issue. Could you please help me reproduce this by describing your setup a little more? Are you using the Pinniped Supervisor? What kind of identity provider are you using? What kind of Kubernetes cluster are you using? If you are using the Supervisor, are you using web-based authentication or username/password (CLI prompts, no browser) authentication?
Hi @cfryanr
Thank you for the prompt response. Here are the details about our setup:
If you need any more information or specific configurations, please let me know.
Hi @uozalp,
I haven't been able to reproduce this yet. Can you try the following?
```shell
# Terminal 1: clear the Pinniped caches, then run k9s with debug logging
rm -rf ~/.config/pinniped/
PINNIPED_DEBUG=true k9s --kubeconfig ./your-pinniped-kubeconfig.yaml 2> stderr.log

# Terminal 2: watch the CLI debug log
watch -n1 cat stderr.log

# Terminal 3: watch the session and credential caches
watch -n1 cat ~/.config/pinniped/sessions.yaml ~/.config/pinniped/credentials.yaml
```
Note the `expirationTimestamp` of your mTLS client credential in the `CredentialCache` shown in your third terminal, and also your session's `tokens.refresh.token` `expirationTimestamp` time. Around those times, what happens? Do you get any interesting log messages in the second terminal window?

By the way, here is my setup that I used to try to reproduce. There are obviously some major differences from your setup, but some important similarities as well.
```shell
# cd to the cloned pinniped git repo
cd pinniped

# build and run Pinniped in a kind cluster
PINNIPED_USE_CONTOUR=1 ./hack/prepare-for-integration-tests.sh -c

# configure a JWTAuthenticator without using the Supervisor, using Dex running in the cluster as an OIDC provider
./hack/prepare-jwtauthenticator-on-kind.sh

# create a role binding for the test user
kubectl create clusterrolebinding pinny-can-read --clusterrole view --user 'pinny@example.com'

# use k9s and login as the test user with the username/password that were previously printed to the screen
PINNIPED_DEBUG=true k9s --kubeconfig ./kubeconfig-jwtauthenticator.yaml 2> stderr.log
```
I wonder how frequently k9s queries the K8s API (this would shorten the time between when mTLS certs expire and when they are refreshed).
I also wonder how long it takes to perform a regular token refresh (perhaps without k9s). It shouldn't take more than just a few seconds to perform the refresh. This could be checked by performing a login through Pinniped, waiting 15 minutes (just to be sure all credentials except the refresh token have expired), and then performing some K8s action (capture the time it takes for this last step). Is that longer than performing a similar action with an admin kubeconfig?
Unfortunately HashiCorp Vault does not support refresh tokens yet.
Here is the log for three attempts where K9s went into Dial k8s Toast mode.
```
$ cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:18:57Z"

Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 10:18:54 CEST pinniped-login cmd/login_oidc.go:255 using cached cluster credential.
Thu, 13 Jun 2024 10:18:54 CEST pinniped-login cmd/login_oidc.go:255 using cached cluster credential.
Thu, 13 Jun 2024 10:19:07 CEST pinniped-login cmd/login_oidc.go:260 Performing OIDC login {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:19:07 CEST oidcclient/login.go:399 Pinniped: Found unexpired cached token. {"type": "id_token"}
Thu, 13 Jun 2024 10:19:07 CEST pinniped-login cmd/login_oidc.go:270 Exchanging token for cluster credential {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:19:08 CEST pinniped-login cmd/login_oidc.go:278 Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:19:08 CEST pinniped-login cmd/login_oidc.go:285 caching cluster credential for future use.
```
```
$ cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:24:11Z"

Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 10:24:08 CEST pinniped-login cmd/login_oidc.go:255 using cached cluster credential.
Thu, 13 Jun 2024 10:24:08 CEST pinniped-login cmd/login_oidc.go:255 using cached cluster credential.
Thu, 13 Jun 2024 10:24:21 CEST pinniped-login cmd/login_oidc.go:260 Performing OIDC login {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:24:21 CEST oidcclient/login.go:399 Pinniped: Found unexpired cached token. {"type": "id_token"}
Thu, 13 Jun 2024 10:24:21 CEST pinniped-login cmd/login_oidc.go:270 Exchanging token for cluster credential {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:24:21 CEST pinniped-login cmd/login_oidc.go:278 Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:24:21 CEST pinniped-login cmd/login_oidc.go:285 caching cluster credential for future use.
```
```
$ cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:29:25Z"

Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 10:29:22 CEST pinniped-login cmd/login_oidc.go:255 using cached cluster credential.
Thu, 13 Jun 2024 10:29:36 CEST pinniped-login cmd/login_oidc.go:260 Performing OIDC login {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:29:36 CEST oidcclient/login.go:399 Pinniped: Found unexpired cached token. {"type": "id_token"}
Thu, 13 Jun 2024 10:29:36 CEST pinniped-login cmd/login_oidc.go:270 Exchanging token for cluster credential {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:29:36 CEST pinniped-login cmd/login_oidc.go:278 Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:29:36 CEST pinniped-login cmd/login_oidc.go:285 caching cluster credential for future use.
```
This is from a regular renewal without issues.
```
$ cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:47:02Z"

Thu, 13 Jun 2024 10:47:03 CEST pinniped-login cmd/login_oidc.go:260 Performing OIDC login {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:47:03 CEST oidcclient/login.go:399 Pinniped: Found unexpired cached token. {"type": "id_token"}
Thu, 13 Jun 2024 10:47:03 CEST pinniped-login cmd/login_oidc.go:270 Exchanging token for cluster credential {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:47:04 CEST pinniped-login cmd/login_oidc.go:278 Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:47:04 CEST pinniped-login cmd/login_oidc.go:285 caching cluster credential for future use.
```
Another observation is that if you are in K9s and looking at the logs from a pod, the stream is interrupted when the mTLS certificate expires.
```
$ cat ~/.config/pinniped/credentials.yaml
apiVersion: config.supervisor.pinniped.dev/v1alpha1
credentials:
- creationTimestamp: "2024-06-13T07:46:07Z"
  credential:
    clientCertificateData: |
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----
    clientKeyData: |
      -----BEGIN PRIVATE KEY-----
      -----END PRIVATE KEY-----
    expirationTimestamp: "2024-06-13T07:51:07Z"
  key: e8571687627a8ad811771615a815627264bfb85515ca7208ef5f6eb2aba5b4ab
  lastUsedTimestamp: "2024-06-13T07:49:24Z"
kind: CredentialCache
```
```
Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 09:51:10 CEST pinniped-login cmd/login_oidc.go:260 Performing OIDC login {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 09:51:10 CEST oidcclient/login.go:399 Pinniped: Found unexpired cached token. {"type": "id_token"}
Thu, 13 Jun 2024 09:51:10 CEST pinniped-login cmd/login_oidc.go:270 Exchanging token for cluster credential {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 09:51:10 CEST pinniped-login cmd/login_oidc.go:278 Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 09:51:10 CEST pinniped-login cmd/login_oidc.go:285 caching cluster credential for future use.
```
Pinniped renews the mTLS certificate after it expires, which can lead to brief disconnections. A potential solution is to cache two certificates in `~/.config/pinniped/credentials.yaml` with a 1-2 minute overlap. This way, the new certificate would already be issued minutes before the existing one expires, ensuring a seamless transition and eliminating potential connection interruptions.
> Unfortunately HashiCorp Vault does not support refresh tokens yet.
That is unfortunate. However, your CLI debug logs above always show `Found unexpired cached token. {"type": "id_token"}`, which means that you are probably getting back an ID token with a long lifetime. The expiration of your ID token should be shown in your `sessions.yaml` file, so you can check its lifetime if you're curious. This should also work, because Pinniped will continue to use the ID token until it expires. It will send the ID token to the Concierge's TokenCredentialRequest endpoint, which will immediately return a new mTLS client cert. You can see this in your CLI debug logs where they talk about exchanging the token for a cluster credential. It all happens at the same timestamp, so it is taking less than a second (probably a lot less).
I'm not able to reproduce the slowness which leads to the "dial k8s toast" errors from k9s. Even if I disable refresh tokens in my OIDC provider, I still never see that until my ID token expires and I need to log in again in my browser (k9s is understandably upset until I finish the login, and then it is happy again). I'm not sure yet how you're getting those dial errors from k9s while your ID token is still valid. Could you try again and pay attention to your ID token expiration time? Are you seeing those errors while your ID token is not yet expired?
I can reproduce the log stream ending in k9s when the mTLS certificate expires. This must be a side effect of how k9s implements log streaming. It does not happen if I use `kubectl logs --follow`; that command will tail my logs forever regardless of mTLS certs expiring in the middle of the log tailing.
Thank you for the detailed feedback.
Due to limitations with our current setup, we have configured our ID and access token lifetimes to 8 hours to prevent the need for frequent logins. Ideally, we would like to use short-lived tokens with an 8-hour refresh token, but unfortunately, HashiCorp Vault does not support this yet (https://github.com/hashicorp/vault/issues/16134).
These "dial k8s toast" errors occur while the ID and access tokens are still valid.
I will investigate potential network latency issues with our network team tomorrow.
I agree that the log stream ending when the mTLS certificate expires seems to be related to how `k9s` handles log streaming, given that `kubectl logs --follow` works.
Hi @cfryanr
We resolved the problem. Our security team had disabled outbound NTP in the firewall, and the time on two of our worker nodes had drifted by 4-5 seconds.
Revisiting the previous logs, this now makes sense. There is a gap of a few seconds between the certificate expiry time and the first line of the debug log.
It might still be beneficial to make the duration of the mTLS certificate configurable?
Thanks for your help.
Ah very interesting! I'm glad that you were able to discover the root cause.
I'm going to close this issue for now because things are working as expected. The short lifetime of the credentials is part of the security assurances that Pinniped provides, so we would rather not make it configurable unless there is a compelling use case. Your situation was solved, so it doesn't feel like a compelling use case for making the lifetime configurable (yet).
**Is your feature request related to a problem? Please describe.**
We use K9s for all our administration tasks. Every 5 minutes, K9s loses connection to our cluster. We have determined that this issue is due to our client certificate credential expiring. After 10-30 seconds, a new 10-minute certificate is issued (valid from 5 minutes in the past), and the K9s connection is restored.
We noticed that the 5-minute expiration time is hardcoded in the source code: Link to Source Code.
**Describe the solution you'd like**
We would like this expiration time to be configurable, allowing users to set a custom expiration duration according to their needs.
**Describe alternatives you've considered**
We have not found any alternative solutions. Increasing the default value is not desirable due to security concerns.
**Additional context**
Just to clarify, we do not see the issue every 5 minutes. However, if you navigate around in the K9s interface around the time the client certificate expires, you will most likely see the issue.