vmware-tanzu / pinniped

Pinniped is the easy, secure way to log in to your Kubernetes clusters.
https://pinniped.dev
Apache License 2.0

Feature Request: Configurable Expiration Time for Client Certificate #1987

Closed. uozalp closed this issue 1 month ago.

uozalp commented 1 month ago

Is your feature request related to a problem? Please describe. We use K9s for all our administration tasks. Every 5 minutes, K9s loses connection to our cluster. We have determined that this issue is due to our client certificate credential expiring. After 10-30 seconds, a new 10-minute certificate is issued (valid from 5 minutes in the past), and the K9s connection is restored.

apiVersion: config.supervisor.pinniped.dev/v1alpha1
credentials:
- creationTimestamp: "2024-06-12T11:34:38Z"
  credential:
    clientCertificateData: |
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----
    clientKeyData: |
      -----BEGIN PRIVATE KEY-----
      -----END PRIVATE KEY-----
    expirationTimestamp: "2024-06-12T11:39:41Z"
  key: 0c599d95e49704e...
  lastUsedTimestamp: "2024-06-12T11:34:38Z"
kind: CredentialCache
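
As a quick way to see how much lifetime the cached credential has left, a small helper along these lines can parse the cache file (a rough sketch, not part of Pinniped; assumes GNU date and the default cache location, and the helper name is made up):

```shell
# seconds_until_expiry FILE
# Prints how many seconds remain until the first expirationTimestamp recorded
# in a Pinniped CredentialCache file. Assumes GNU date; negative output means
# the cached credential has already expired.
seconds_until_expiry() {
  local exp
  exp=$(grep -m1 'expirationTimestamp' "$1" | awk -F'"' '{print $2}')
  echo $(( $(date -u -d "$exp" +%s) - $(date -u +%s) ))
}

# Example:
#   seconds_until_expiry ~/.config/pinniped/credentials.yaml
```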

We noticed that the 5-minute expiration time is hardcoded in the source code: Link to Source Code.

Describe the solution you'd like We would like this expiration time to be configurable, allowing users to set a custom expiration duration according to their needs.

Describe alternatives you've considered We have not found any alternative solutions. Increasing the default value is not desirable due to security concerns.

(screenshot: k9s)

Additional context

Just to clarify, we do not see the issue every 5 minutes. However, if you navigate around in the K9s interface around the time the client certificate expires, you will most likely see the issue.

cfryanr commented 1 month ago

Hi @uozalp,

Thanks for creating an issue. Could you please help me reproduce this by describing your setup a little more? Are you using the Pinniped Supervisor? What kind of identity provider are you using? What kind of Kubernetes cluster are you using? If you are using the Supervisor, are you using web-based authentication or username/password (CLI prompts, no browser) authentication?

uozalp commented 1 month ago

Hi @cfryanr

Thank you for the prompt response. Here are the details about our setup:

If you need any more information or specific configurations, please let me know.

cfryanr commented 1 month ago

Hi @uozalp,

I haven't been able to reproduce this yet. Can you try the following?

  1. Perform a client-side logout by deleting the Pinniped session cache files: rm -rf ~/.config/pinniped/
  2. When you start k9s, enable the debug env var and send all stderr to a file: PINNIPED_DEBUG=true k9s --kubeconfig ./your-pinniped-kubeconfig.yaml 2> stderr.log
  3. In a second terminal window, watch the content of that file so you can see what is happening: watch -n1 cat stderr.log
  4. Finish your login so that k9s can start working.
  5. In a third terminal window, watch the content of the Pinniped session cache files: watch -n1 cat ~/.config/pinniped/sessions.yaml ~/.config/pinniped/credentials.yaml
  6. Look at the expirationTimestamp of your mTLS client credential in the CredentialCache shown in your third terminal.
  7. In your third terminal, do you have a refresh token showing from your sessions.yaml file? It would be nested under tokens.refresh.token.
  8. Keep navigating and doing things in k9s until your mTLS expirationTimestamp time. Around that time, what happens? Do you get any interesting log messages in the second terminal window?
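
Steps 5 through 7 above can also be collapsed into a single terminal with a small polling helper (a hypothetical sketch, not an official tool; assumes the default cache path):

```shell
# watch_credential FILE N: poll FILE once per second, N times, printing a line
# whenever its first expirationTimestamp changes. A one-terminal stand-in for
# watching the credential cache as described in steps 5-7.
watch_credential() {
  local prev="" cur i
  for i in $(seq "$2"); do
    cur=$(grep -m1 'expirationTimestamp' "$1" 2>/dev/null)
    if [ "$cur" != "$prev" ]; then
      echo "$(date -u +%FT%TZ) changed: $cur"
      prev="$cur"
    fi
    sleep 1
  done
}

# Example: watch_credential ~/.config/pinniped/credentials.yaml 600
```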

cfryanr commented 1 month ago

By the way, here is my setup that I used to try to reproduce. There are obviously some major differences from your setup, but some important similarities as well.

# cd to the cloned pinniped git repo
cd pinniped
# build and run Pinniped in a kind cluster
PINNIPED_USE_CONTOUR=1 ./hack/prepare-for-integration-tests.sh -c
# configure a JWTAuthenticator without using the Supervisor, using Dex running in the cluster as an OIDC provider
./hack/prepare-jwtauthenticator-on-kind.sh
# create a role binding for the test user
kubectl create clusterrolebinding pinny-can-read --clusterrole view --user 'pinny@example.com'
# use k9s and login as the test user with the username/password that were previously printed to the screen
PINNIPED_DEBUG=true k9s --kubeconfig ./kubeconfig-jwtauthenticator.yaml 2> stderr.log
joshuatcasey commented 1 month ago

I wonder how frequently k9s queries the K8s API (this would shorten the time between when mTLS certs expire and when they are refreshed).

I also wonder how long it takes to perform a regular token refresh (perhaps without k9s). It shouldn't take more than just a few seconds to perform the refresh. This could be checked by performing a login through Pinniped, waiting 15 minutes (just to be sure all credentials except the refresh token have expired), and then performing some K8s action (capture the time it takes for this last step). Is that longer than performing a similar action with an admin kubeconfig?
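
One way to capture that timing is a small wall-clock wrapper (a sketch; assumes GNU date with nanosecond support, and the kubeconfig paths in the example are placeholders):

```shell
# time_ms CMD...: run a command, discard its output, and print the wall-clock
# time it took in milliseconds (GNU date %N assumed).
time_ms() {
  local start end
  start=$(date +%s%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

# Compare a Pinniped-backed call against an admin kubeconfig, e.g.:
#   time_ms kubectl --kubeconfig ./pinniped-kubeconfig.yaml get --raw /healthz
#   time_ms kubectl --kubeconfig ./admin-kubeconfig.yaml get --raw /healthz
```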

uozalp commented 1 month ago

Unfortunately HashiCorp Vault does not support refresh tokens yet.

Here is the log for three attempts where K9s went into "Dial K8s Toast" mode.

cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:18:57Z"
Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 10:18:54 CEST  pinniped-login  cmd/login_oidc.go:255  using cached cluster credential.
Thu, 13 Jun 2024 10:18:54 CEST  pinniped-login  cmd/login_oidc.go:255  using cached cluster credential.
Thu, 13 Jun 2024 10:19:07 CEST  pinniped-login  cmd/login_oidc.go:260  Performing OIDC login  {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:19:07 CEST  oidcclient/login.go:399  Pinniped: Found unexpired cached token.  {"type": "id_token"}
Thu, 13 Jun 2024 10:19:07 CEST  pinniped-login  cmd/login_oidc.go:270  Exchanging token for cluster credential  {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:19:08 CEST  pinniped-login  cmd/login_oidc.go:278  Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:19:08 CEST  pinniped-login  cmd/login_oidc.go:285  caching cluster credential for future use.
cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:24:11Z"
Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 10:24:08 CEST  pinniped-login  cmd/login_oidc.go:255  using cached cluster credential.
Thu, 13 Jun 2024 10:24:08 CEST  pinniped-login  cmd/login_oidc.go:255  using cached cluster credential.
Thu, 13 Jun 2024 10:24:21 CEST  pinniped-login  cmd/login_oidc.go:260  Performing OIDC login  {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:24:21 CEST  oidcclient/login.go:399  Pinniped: Found unexpired cached token.  {"type": "id_token"}
Thu, 13 Jun 2024 10:24:21 CEST  pinniped-login  cmd/login_oidc.go:270  Exchanging token for cluster credential  {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:24:21 CEST  pinniped-login  cmd/login_oidc.go:278  Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:24:21 CEST  pinniped-login  cmd/login_oidc.go:285  caching cluster credential for future use.
cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:29:25Z"
Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 10:29:22 CEST  pinniped-login  cmd/login_oidc.go:255  using cached cluster credential.
Thu, 13 Jun 2024 10:29:36 CEST  pinniped-login  cmd/login_oidc.go:260  Performing OIDC login  {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:29:36 CEST  oidcclient/login.go:399  Pinniped: Found unexpired cached token.  {"type": "id_token"}
Thu, 13 Jun 2024 10:29:36 CEST  pinniped-login  cmd/login_oidc.go:270  Exchanging token for cluster credential  {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:29:36 CEST  pinniped-login  cmd/login_oidc.go:278  Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:29:36 CEST  pinniped-login  cmd/login_oidc.go:285  caching cluster credential for future use.

This is from a regular renewal without issues.

cat ~/.config/pinniped/credentials.yaml
expirationTimestamp: "2024-06-13T08:47:02Z"
Thu, 13 Jun 2024 10:47:03 CEST  pinniped-login  cmd/login_oidc.go:260  Performing OIDC login  {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 10:47:03 CEST  oidcclient/login.go:399  Pinniped: Found unexpired cached token.  {"type": "id_token"}
Thu, 13 Jun 2024 10:47:03 CEST  pinniped-login  cmd/login_oidc.go:270  Exchanging token for cluster credential  {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 10:47:04 CEST  pinniped-login  cmd/login_oidc.go:278  Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 10:47:04 CEST  pinniped-login  cmd/login_oidc.go:285  caching cluster credential for future use.

Another observation is that if you are in K9s and looking at the logs from a pod, the stream is interrupted when the mTLS certificate expires.

cat ~/.config/pinniped/credentials.yaml
apiVersion: config.supervisor.pinniped.dev/v1alpha1
credentials:
- creationTimestamp: "2024-06-13T07:46:07Z"
  credential:
    clientCertificateData: |
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----
    clientKeyData: |
      -----BEGIN PRIVATE KEY-----
      -----END PRIVATE KEY-----
    expirationTimestamp: "2024-06-13T07:51:07Z"
  key: e8571687627a8ad811771615a815627264bfb85515ca7208ef5f6eb2aba5b4ab
  lastUsedTimestamp: "2024-06-13T07:49:24Z"
kind: CredentialCache
Every 1.0s: cat stderr.log
Thu, 13 Jun 2024 09:51:10 CEST  pinniped-login  cmd/login_oidc.go:260  Performing OIDC login  {"issuer": "https://vault-nonprod-private-vault-00000000.00000000.z1.hashicorp.cloud:8200/v1/admin/yyyy/identity/oidc/provider/xxxx", "client id": "<CLIENT-ID>"}
Thu, 13 Jun 2024 09:51:10 CEST  oidcclient/login.go:399  Pinniped: Found unexpired cached token.  {"type": "id_token"}
Thu, 13 Jun 2024 09:51:10 CEST  pinniped-login  cmd/login_oidc.go:270  Exchanging token for cluster credential  {"endpoint": "https://auth.<cluster>.tld", "authenticator type": "jwt", "authenticator name": "jwtauthenticator"}
Thu, 13 Jun 2024 09:51:10 CEST  pinniped-login  cmd/login_oidc.go:278  Successfully exchanged token for cluster credential.
Thu, 13 Jun 2024 09:51:10 CEST  pinniped-login  cmd/login_oidc.go:285  caching cluster credential for future use.

(screenshot: stream-lukker, the log stream closing)

uozalp commented 1 month ago

Pinniped renews the mTLS certificate after it expires, which can lead to brief disconnections. A potential solution is to cache two certificates in ~/.config/pinniped/credentials.yaml with a 1-2 minute overlap. This way, the new certificate would already be issued minutes before the existing one expires, ensuring a seamless transition and eliminating potential connection interruptions.
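
The proposed overlap could look something like this decision rule (a hypothetical sketch of the idea, not Pinniped's actual logic):

```shell
# Hypothetical refresh policy: treat a credential as due for renewal once it is
# within OVERLAP_SECONDS of its expirationTimestamp, instead of only after it
# has actually expired.
OVERLAP_SECONDS=90

# needs_refresh EXPIRY_EPOCH_SECONDS: exit 0 if the credential is due for renewal.
needs_refresh() {
  [ $(( $1 - $(date -u +%s) )) -le "$OVERLAP_SECONDS" ]
}

# Example:
#   needs_refresh "$(date -u -d '2024-06-13T07:51:07Z' +%s)" && echo "renew now"
```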

cfryanr commented 1 month ago

> Unfortunately HashiCorp Vault does not support refresh tokens yet.

That is unfortunate. However, your CLI debug logs above always show Found unexpired cached token. {"type": "id_token"}, which means that you are probably getting back an ID token with a long lifetime. The expiration of your ID token should be shown in your sessions.yaml file, so you can check its lifetime if you're curious. This flow should still work, because Pinniped will continue to use the ID token until it expires: it sends the ID token to the Concierge's TokenCredentialRequest endpoint, which immediately returns a new mTLS client cert. You can see this in your CLI debug logs where they mention exchanging the token for a cluster credential. Both log lines carry the same timestamp, so the exchange is taking less than a second (probably a lot less).
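
For the curious, the ID token's expiry can also be read straight out of the JWT itself, without opening sessions.yaml in an editor (a sketch using only common coreutils; assumes GNU base64):

```shell
# jwt_exp TOKEN: print the exp claim (unix seconds) from a JWT's payload.
# Converts base64url to standard base64 and restores padding before decoding.
jwt_exp() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d | grep -o '"exp":[0-9]*' | cut -d: -f2
}

# Example, converting the result to a readable time:
#   date -u -d "@$(jwt_exp "$YOUR_ID_TOKEN")"
```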

I'm not able to reproduce the slowness which leads to the "dial k8s toast" errors from k9s. Even if I disable refresh tokens in my OIDC provider, I still never see that until my ID token expires and I need to log in again in my browser (k9s is understandably upset until I finish the login, and then it is happy again). I'm not sure yet how you're getting those dial errors from k9s while your ID token is still valid. Could you try again and pay attention to your ID token expiration time? Are you seeing those errors while your ID token is not yet expired?

I can reproduce the log stream ending in k9s when the mTLS certificate expires. This must be a side effect of how k9s implements log streaming. It does not happen if I use kubectl logs --follow: that command will tail my logs indefinitely, regardless of mTLS certs expiring in the middle of the log tailing.

uozalp commented 1 month ago

Thank you for the detailed feedback.

Due to limitations with our current setup, we have configured our ID and access tokens to 8 hours to prevent the need for frequent logins. Ideally, we would like to use short-lived tokens with an 8-hour refresh token, but unfortunately, HashiCorp Vault does not support this yet (https://github.com/hashicorp/vault/issues/16134).

These "dial k8s toast" errors occur while the ID and access tokens are still valid.

I will investigate potential network latency issues with our network team tomorrow.

I agree that the log stream ending when the mTLS certificate expires seems to be related to how k9s handles log streaming, since kubectl logs --follow keeps working.

uozalp commented 1 month ago

Hi @cfryanr

We resolved the problem. Our security team had disabled outbound NTP in the firewall, and the time on two of our worker nodes had drifted by 4-5 seconds.

Revisiting the previous logs, this now makes sense. There is a gap of a few seconds between the certificate expiry time and the first line of the debug log.
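
A quick way to spot this kind of drift is to compare a remote timestamp against the local clock (a sketch, assuming GNU date; the endpoint in the example is a placeholder):

```shell
# skew_seconds REMOTE_TIMESTAMP: absolute difference in seconds between a
# remote timestamp and the local UTC clock (GNU date assumed). With 5-minute
# client certificates, a skew of even a few seconds is enough to cause
# rejections around the expiry boundary.
skew_seconds() {
  local diff
  diff=$(( $(date -u -d "$1" +%s) - $(date -u +%s) ))
  echo "${diff#-}"
}

# Example: compare against the Date header of any trusted HTTPS endpoint:
#   skew_seconds "$(curl -sI https://example.com | tr -d '\r' | awk -F': ' 'tolower($1)=="date"{print $2}')"
```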

Might it still be beneficial to make the duration of the mTLS certificate configurable?

Thanks for your help.

cfryanr commented 1 month ago

Ah very interesting! I'm glad that you were able to discover the root cause.

I'm going to close this issue for now because things are working as expected. The short lifetime of the credentials is part of the security assurances that Pinniped provides, so we would rather not make it configurable unless there is a compelling use case. Since your situation has been resolved, this doesn't feel like one (yet).