kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

MountVolume.SetUp failed for volume "kube-api-access-r24g5" : object "<namespace>"/"kube-root-ca.crt" not registered #2723

Closed: caniko closed this issue 2 years ago

caniko commented 2 years ago

What happened: kube-root-ca.crt cannot be mounted in a kind cluster with three worker nodes. I also have problems setting up pods (crash -> backoff).

After further investigation, I also seem to get this error when working with custom service accounts in terraform, even with a single-node cluster.

What you expected to happen: The job runs in a kind cluster with three workers, created with kind create cluster --config=./kind.yaml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker

How to reproduce it (as minimally and precisely as possible): First, create a kind cluster with three workers.

Then use the following terraform configuration:

locals {
  config_path = "~/.kube/config"
  config_context = "kind-kind"
}
provider "kubernetes" {
  config_path = local.config_path
  config_context = local.config_context
}
resource "kubernetes_job_v1" "test-job" {
  metadata {
    name = "test-job"
  }
  spec {
    template {
      metadata {}
      spec {
        container {
          name = "test-job-container"
          image = "alpine"
        }
        restart_policy = "Never"
      }
    }
  }
}
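
For anyone reproducing without terraform, a roughly equivalent plain-Kubernetes manifest (a sketch that mirrors the Terraform resource above) would be the following; applying it with kubectl apply -f should surface the same FailedMount event described below:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
  namespace: default
spec:
  template:
    spec:
      containers:
      # mirrors the container block from the Terraform above
      - name: test-job-container
        image: alpine
      restartPolicy: Never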

You can run kubectl describe pods to get the error message:

Events:
  Type     Reason       Age              From               Message
  ----     ------       ----             ----               -------
  Normal   Scheduled    11s              default-scheduler  Successfully assigned default/test-job-rgjg5 to kind-worker3
  Normal   Pulling      10s              kubelet            Pulling image "alpine"
  Normal   Pulled       7s               kubelet            Successfully pulled image "alpine" in 3.654072683s
  Normal   Created      6s               kubelet            Created container test-job-container
  Normal   Started      6s               kubelet            Started container test-job-container
  Warning  FailedMount  4s (x3 over 5s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-z26v4" : object "default"/"kube-root-ca.crt" not registered

Environment:

Server:
 Containers: 121
  Running: 10
  Paused: 0
  Stopped: 111
 Images: 382
 Server Version: 20.10.14
 Storage Driver: overlay2
  Backing Filesystem: btrfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: mgyovnegnci04mmp3wrqeegh5
  Is Manager: true
  ClusterID: 1lz716k144na0ttv13c01sfy4
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 141.42.20.226
  Manager Addresses:
   141.42.20.226:2377
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: de8046a5501db9e0e478e1c10cbcfb21af4c6b2d.m
 runc version:
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.17.3-zen1-1-zen
 Operating System: Garuda Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.36GiB
 Name: vianadasilva1
 ID: K2MW:7JRT:7GHL:4FGG:C5LU:EYDM:73U5:U3RU:G4TT:NRQZ:4A6X:AKPK
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http://proxy.charite.de:8080
 HTTPS Proxy: http://proxy.charite.de:8080
 Username: caniko
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

- OS (e.g. from `/etc/os-release`):

NAME="Garuda Linux" PRETTY_NAME="Garuda Linux" ID=garuda ID_LIKE=arch BUILD_ID=rolling ANSI_COLOR="38;2;23;147;209" HOME_URL="https://garudalinux.org/" DOCUMENTATION_URL="https://wiki.garudalinux.org/" SUPPORT_URL="https://forum.garudalinux.org/" BUG_REPORT_URL="https://gitlab.com/groups/garuda-linux/" LOGO=garudalinux

BenTheElder commented 2 years ago

Are you able to share the resource in a kubernetes yaml that doesn't require terraform (kubectl get po -oyaml $pod)?

Is it exactly kind create cluster with v0.12.0 or do you have some options enabled?

Warning FailedMount 9m28s (x3 over 9m29s) kubelet MountVolume.SetUp failed for volume "kube-api-access-827hx" : object "fastapi"/"kube-root-ca.crt" not registered

This error doesn't seem to match the terraform. Where is this namespace coming from?

caniko commented 2 years ago

This error doesn't seem to match the terraform. Where is this namespace coming from?

Yes, you are correct. It is the same error, so I didn't bother. Apologies; I have updated the post.

Is it exactly kind create cluster with v0.12.0 or do you have some options enabled?

~Yes.~ No, I am running kind with three worker nodes.

kubectl get po -oyaml test-job-rgjg5:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-04-20T07:01:16Z"
  generateName: test-job-
  labels:
    controller-uid: 57b6637b-cd05-4ae4-90e3-217c0da51217
    job-name: test-job
  name: test-job-rgjg5
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: test-job
    uid: 57b6637b-cd05-4ae4-90e3-217c0da51217
  resourceVersion: "1303"
  uid: 41fd2504-a7b1-4230-be1c-60e20fa6a020
spec:
  automountServiceAccountToken: true
  containers:
  - image: alpine
    imagePullPolicy: Always
    name: test-job-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-z26v4
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kind-worker3
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  shareProcessNamespace: false
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-z26v4
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:01:16Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:01:22Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:01:22Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:01:16Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://3ecbafa9ed5d7e7abe7847b910fa3669a9c92ffc161fea24134103f016f83600
    image: docker.io/library/alpine:latest
    imageID: docker.io/library/alpine@sha256:4edbd2beb5f78b1014028f4fbb99f3237d9561100b6881aabbf5acce2c4f9454
    lastState: {}
    name: test-job-container
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://3ecbafa9ed5d7e7abe7847b910fa3669a9c92ffc161fea24134103f016f83600
        exitCode: 0
        finishedAt: "2022-04-20T07:01:21Z"
        reason: Completed
        startedAt: "2022-04-20T07:01:21Z"
  hostIP: 172.18.0.4
  phase: Succeeded
  podIP: 10.244.3.2
  podIPs:
  - ip: 10.244.3.2
  qosClass: BestEffort
  startTime: "2022-04-20T07:01:16Z"

caniko commented 2 years ago

However, I tried to run the following in a kind cluster with a single node (kind create cluster):

locals {
  jwt_secret_path = "jwt"
  postgres_username = "ufastapi"
  k8_namespace = "fastapi"
  postgres_db_name = "users"

  jwt_secrets_service_account_name = "sa-jwt"

  config_path = "~/.kube/config"
  config_context = "kind-kind"
}

provider "kubernetes" {
  config_path = local.config_path
  config_context = local.config_context
}
resource "kubernetes_namespace_v1" "create_k8_namespaces" {
  metadata {
    annotations = {
      name = local.k8_namespace
    }
    name = local.k8_namespace
  }
}
resource "kubernetes_secret_v1" "jwt-secret" {
  depends_on = [kubernetes_namespace_v1.create_k8_namespaces]
  metadata {
    name = "jwt-secrets"
    namespace = local.k8_namespace
  }
  data = {
    "private" = ""
    "public" = ""
  }
}
resource "kubernetes_service_account_v1" "jwt-secrets-account" {
  depends_on = [kubernetes_secret_v1.jwt-secret]
  metadata {
    name = local.jwt_secrets_service_account_name
    namespace = local.k8_namespace
  }
  secret {
    name = kubernetes_secret_v1.jwt-secret.metadata.0.name
  }
}
resource "kubernetes_role_v1" "jwt-secrets-role" {
  depends_on = [kubernetes_namespace_v1.create_k8_namespaces]
  metadata {
    name = "jwt-secrets-role"
    namespace = local.k8_namespace
  }
  rule {
    api_groups = [""]
    resources = ["secrets", "serviceaccounts", "serviceaccounts/token"]
    verbs = ["get", "patch"]
  }
}
resource "kubernetes_role_binding_v1" "jwt-secrets-role-bind" {
  depends_on = [
    kubernetes_namespace_v1.create_k8_namespaces,
    kubernetes_service_account_v1.jwt-secrets-account,
    kubernetes_role_v1.jwt-secrets-role
  ]
  metadata {
    name = "jwt-secrets-rolebinding"
    namespace = local.k8_namespace
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind = "Role"
    name = kubernetes_role_v1.jwt-secrets-role.metadata.0.name
  }
  subject {
    kind = "ServiceAccount"
    namespace = local.k8_namespace
    name = local.jwt_secrets_service_account_name
  }
}
resource "kubernetes_job_v1" "jwt-secret-generation-job" {
  depends_on = [
    kubernetes_secret_v1.jwt-secret,
    kubernetes_role_binding_v1.jwt-secrets-role-bind
  ]
  metadata {
    name = "jwt-secret-generation-job"
    namespace = local.k8_namespace
  }
  spec {
    template {
      metadata {}
      spec {
        service_account_name = local.jwt_secrets_service_account_name
        automount_service_account_token = true
        container {
          name = "jwt-secret-generation-job"
          image = "caniko/syndb-jwt-secret:latest"
          env {
            name = "JWT_SECRET_NAME"
            value = kubernetes_secret_v1.jwt-secret.metadata.0.name
          }
        }
        restart_policy = "Never"
      }
    }
  }
}
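
A condensed plain-Kubernetes sketch of the same setup (the Secret, Role, and RoleBinding are omitted for brevity; names mirror the terraform locals above):

apiVersion: v1
kind: Namespace
metadata:
  name: fastapi
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-jwt
  namespace: fastapi
---
apiVersion: batch/v1
kind: Job
metadata:
  name: jwt-secret-generation-job
  namespace: fastapi
spec:
  template:
    spec:
      serviceAccountName: sa-jwt
      automountServiceAccountToken: true
      containers:
      - name: jwt-secret-generation-job
        image: caniko/syndb-jwt-secret:latest
        env:
        # mirrors the env block from the Terraform above
        - name: JWT_SECRET_NAME
          value: jwt-secrets
      restartPolicy: Never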

And I get the following from kubectl describe pods:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  7m1s                   default-scheduler  0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         6m52s                  default-scheduler  Successfully assigned fastapi/jwt-secret-generation-job-9xdwj to kind-control-plane
  Normal   Pulling           6m50s                  kubelet            Pulling image "caniko/syndb-jwt-secret:latest"
  Normal   Pulled            6m41s                  kubelet            Successfully pulled image "caniko/syndb-jwt-secret:latest" in 8.37944194s
  Normal   Created           6m41s                  kubelet            Created container jwt-secret-generation-job
  Normal   Started           6m41s                  kubelet            Started container jwt-secret-generation-job
  Warning  FailedMount       6m38s (x3 over 6m39s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-tdw9z" : object "fastapi"/"kube-root-ca.crt" not registered

kubectl -n fastapi get po -oyaml jwt-secret-generation-job-k5jj5:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-04-20T07:29:02Z"
  generateName: jwt-secret-generation-job-
  labels:
    controller-uid: e1755d9e-55f7-4c48-8f73-59c62312ca8b
    job-name: jwt-secret-generation-job
  name: jwt-secret-generation-job-k5jj5
  namespace: fastapi
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: jwt-secret-generation-job
    uid: e1755d9e-55f7-4c48-8f73-59c62312ca8b
  resourceVersion: "583"
  uid: eb3a2731-d007-4ed9-848c-c60e0def9a81
spec:
  automountServiceAccountToken: true
  containers:
  - env:
    - name: JWT_SECRET_NAME
      value: jwt-secrets
    image: caniko/syndb-jwt-secret:latest
    imagePullPolicy: Always
    name: jwt-secret-generation-job
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-xj2cs
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kind-control-plane
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: sa-jwt
  serviceAccountName: sa-jwt
  shareProcessNamespace: false
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-xj2cs
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:29:04Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:29:14Z"
    message: 'containers with unready status: [jwt-secret-generation-job]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:29:14Z"
    message: 'containers with unready status: [jwt-secret-generation-job]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2022-04-20T07:29:04Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://5adb6ba3b74228658fb91a39add044beb110ed1638d7260d246ec53497419401
    image: docker.io/caniko/syndb-jwt-secret:latest
    imageID: docker.io/caniko/syndb-jwt-secret@sha256:5eb58f25fecc1edb7c762788bd65208073267cc5bce32c7b7f447622e2e4ca79
    lastState: {}
    name: jwt-secret-generation-job
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://5adb6ba3b74228658fb91a39add044beb110ed1638d7260d246ec53497419401
        exitCode: 1
        finishedAt: "2022-04-20T07:29:13Z"
        reason: Error
        startedAt: "2022-04-20T07:29:13Z"
  hostIP: 172.18.0.2
  phase: Failed
  podIP: 10.244.0.3
  podIPs:
  - ip: 10.244.0.3
  qosClass: BestEffort
  startTime: "2022-04-20T07:29:04Z"

caniko commented 2 years ago

I get the same error when trying to run it in a cluster created with the minikube kvm2 driver.

BenTheElder commented 2 years ago

This is https://github.com/kubernetes/kubernetes/issues/105204#issuecomment-1104427967

caniko commented 2 years ago

@BenTheElder, can you please build v1.22.9 and v1.23.6 node images for the latest KIND release so we can test whether the new builds fix the problems on my end?

I also think you closed this issue prematurely. KIND is a testing suite; as such, users should be made aware of this bug, right?

BenTheElder commented 2 years ago

We track known usage issues in https://kind.sigs.k8s.io/docs/user/known-issues/; the issue tracker tracks what we're fixing. The bug is in Kubernetes, and it is fixed there.

At the moment, images are typically released alongside kind. There are some open tracking issues discussing how we might automate this, but we need help.

I'll put a TODO to push some later in this case; in the meantime, you can build your own with the guide.
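
For reference, once a node image is built per that guide (for example with kind build node-image), it can be selected in the cluster config via the per-node image field; the tag below is hypothetical:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  # hypothetical tag: use whatever tag your locally built node image was given
  image: kindest/node:v1.23.6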

BenTheElder commented 2 years ago

Kubernetes v1.24 is delayed; I'd been hoping we'd be shipping it around now. It has some other desired fixes (#2722).