kubernetes / kops

Kubernetes Operations (kOps) - Production Grade k8s Installation, Upgrades and Management
https://kops.sigs.k8s.io/
Apache License 2.0

Wrong kube-proxy bind mount propagation causes node certificates not to be updated and to expire #16400

Closed: hostops closed this issue 3 months ago

hostops commented 8 months ago

/kind bug

1. What kops version are you running? The command kops version will display this information.

Last applied server version: 1.25.3

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.

Server Version: v1.25.5

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

# get /var/lib/kube-proxy/kubeconfig from kube-proxy container
kubectl exec -n kube-system kube-proxy-i-03fa9558373c958ff -- cat /var/lib/kube-proxy/kubeconfig | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout
# get /var/lib/kube-proxy/kubeconfig from node
ssh ubuntu@$(kubectl get node i-03fa9558373c958ff -o json | jq '.status.addresses[4].address' -r) "sudo cat /var/lib/kube-proxy/kubeconfig" | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout

5. What happened after the commands executed? I get two different outputs for the same file.

# notAfter=Mar 26 16:39:07 2024 GMT
# notAfter=Aug  1 15:09:29 2024 GMT

6. What did you expect to happen? I expected to get the same output, since this should be the same file. You can see that when you check the volumes and volumeMounts of the kube-proxy pod:

volumeMounts:
- mountPath: /var/lib/kube-proxy/kubeconfig
  name: kubeconfig
  readOnly: true
volumes:
- hostPath:
    path: /var/lib/kube-proxy/kubeconfig
    type: ""
  name: kubeconfig
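
For reference, one way to pull those fields straight from the running pod (the pod name is the one from the reproduction commands above; adjust for your node):

# Print the volumeMounts of the kube-proxy container and the pod's volumes.
kubectl get pod -n kube-system kube-proxy-i-03fa9558373c958ff \
  -o jsonpath='{.spec.containers[0].volumeMounts}{"\n"}{.spec.volumes}{"\n"}'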

Also, if you check the container configuration using ctr (sudo ctr -n k8s.io container inspect <kube-proxy container id>), you can confirm this container uses the same file:

            {
                "destination": "/var/lib/kube-proxy/kubeconfig",
                "type": "bind",
                "source": "/var/lib/kube-proxy/kubeconfig",
                "options": [
                    "rbind",
                    "rprivate",
                    "ro"
                ]
            }
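
If it helps, the relevant entry can be pulled out of that inspect output directly. This is only a sketch: the .Spec.mounts path is an assumption based on the JSON snippet above and may differ between containerd versions.

# Extract the kubeconfig mount entry, including its propagation options.
# <kube-proxy container id> is a placeholder.
sudo ctr -n k8s.io container inspect <kube-proxy container id> \
  | jq '.Spec.mounts[] | select(.destination == "/var/lib/kube-proxy/kubeconfig")'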

I also believe those lines caused the issue, especially the rprivate option. From the Docker documentation:

Bind propagation refers to whether or not mounts created within a given bind-mount or named volume can be propagated to replicas of that mount.

rprivate: The default. The same as private, meaning that no mount points anywhere within the original or replica mount points propagate in either direction.

So the default rprivate option can cause an unsynchronized state when there are multiple replicas of this mount. For the issue to occur, another condition must therefore be met: there must be multiple containers with the same mount. This happens when a node is restarted and a new kube-proxy pod is created; I believe keeping the old container is the default behaviour so Kubernetes can show logs from the previous container. Even if only one of the two containers is running/used, this can happen. I tested this hypothesis by running sudo ctr -n k8s.io containers ls | grep proxy | wc -l on all of our nodes. On every node with multiple containers (2), the certificates are unsynchronized, and I can confirm those are the only kube-proxy pods with restarts > 0. On nodes with only a single kube-proxy container, the /var/lib/kube-proxy/kubeconfig files seen from the node and from the pod are identical.
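
A rough sketch of that per-node check, assuming SSH access as ubuntu and the same address index used in the reproduction commands above (both are environment-specific):

# For every node: count kube-proxy containers on the host, then compare the
# certificate expiry seen on the node with the one seen inside the pod.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  ip=$(kubectl get node "$node" -o json | jq -r '.status.addresses[4].address')
  count=$(ssh ubuntu@"$ip" "sudo ctr -n k8s.io containers ls | grep proxy | wc -l")
  node_cert=$(ssh ubuntu@"$ip" "sudo cat /var/lib/kube-proxy/kubeconfig" \
    | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout)
  pod_cert=$(kubectl exec -n kube-system "kube-proxy-$node" -- cat /var/lib/kube-proxy/kubeconfig \
    | yq '.users[0].user."client-certificate-data"' -r | base64 --decode | openssl x509 -enddate -noout)
  echo "$node containers=$count node:$node_cert pod:$pod_cert"
done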

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
spec:
  api:
    loadBalancer:
      class: Classic
      type: Public
  authorization:
    rbac: {}
  awsLoadBalancerController:
    enabled: true
  certManager:
    enabled: true
    managed: false
  channel: stable
  cloudProvider: aws
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeDNS:
    provider: CoreDNS
  kubelet:
    anonymousAuth: false
    authenticationTokenWebhook: true
    authorizationMode: Webhook
  kubernetesApiAccess:
  - 0.0.0.0/0
  kubernetesVersion: 1.25.5
  networking:
    calico: {}
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  subnets:
  topology:
    dns:
      type: Public

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20221206
  machineType: c6a.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-1a
  role: Master
  subnets:
  - eu-west-1a

---

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20221206
  machineType: m5a.xlarge
  maxSize: 6
  minSize: 5
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  packages:
  - nfs-common
  role: Node
  subnets:
  - eu-west-1a

9. Anything else do we need to know? So I believe this bug is caused by:

  1. The node restarted, so an extra container exists (kept so Kubernetes can show previous logs, e.g. kubectl logs -p).
  2. Due to the default mount option rprivate, /var/lib/kube-proxy/kubeconfig became unsynchronized between the container and the node.
  3. The node updated its Kubernetes certificates and kubeconfig file.
  4. The changes did not propagate to the kube-proxy container.
  5. When the old certificate still used by kube-proxy expired, the node went into NotReady state.
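
Given the correlation with restarts noted above, a quick (hypothetical) way to spot the suspect pods is to filter on the RESTARTS column of kubectl's default output:

# List kube-proxy pods with restarts > 0 (field 4 of the default output is RESTARTS);
# in our case these were exactly the nodes with stale certificates.
kubectl get pods -n kube-system | grep kube-proxy | awk '$4 > 0'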

Logs from kube-proxy

factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Unauthorized
factory.go:134: failed to list *v1.Service: Unauthorized

Logs from kube-api-server

E0309 22:14:53.387051      10 authentication.go:63] "Unable to authenticate the request" err="[x509: certificate has expired or is not yet valid: current time 2024-03-09T22:14:53Z is after 2024-03-09T18:39:20Z, verifying certificate SN=123932699956712321412974984599202854160, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2024-03-09T22:14:53Z is after 2024-03-09T18:39:20Z]"

10. Possible solutions?

Also, the more I read, the less sure I am that rprivate is what causes this unsynchronized-file behavior. Can you come up with any way we can confirm that?
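
One way to probe this outside Kubernetes is a minimal single-file bind mount experiment. This is only a sketch: the paths are scratch files, and whether kubelet rewrites the kubeconfig in place or replaces it via rename on an affected node is an assumption worth checking.

# Bind-mount a single file with the default (rprivate) propagation, then
# update the source two ways and see which change is visible through the mount.
cd "$(mktemp -d)"
echo v1 > src.txt
touch dst.txt
sudo mount --bind src.txt dst.txt                  # default propagation is rprivate
echo v2 > src.txt                                  # in-place write: same inode
cat dst.txt                                        # shows v2
echo v3 > src.txt.new && mv src.txt.new src.txt    # replace via rename: new inode
cat dst.txt                                        # still shows v2: the bind mount tracks the old inode
sudo umount dst.txt

If the in-place write is visible through the mount but the renamed file is not, the stale copy would be explained by the file being replaced rather than by the rprivate propagation mode itself.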

hostops commented 8 months ago

Also, I found out updating certificates is not expected behavior. https://github.com/kubernetes/kops/issues/15970#issuecomment-1740027576

No, kops expects you to update nodes at least every 455 days.

So this issue probably does not make sense? Can someone confirm?

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 3 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/kops/issues/16400#issuecomment-2281473827):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.