k3d-io / k3d

Little helper to run CNCF's k3s in Docker
MIT License

[BUG] New Cluster stuck on ContainerCreating #1449

Open shanilhirani opened 3 months ago

shanilhirani commented 3 months ago

What did you do

What did you expect to happen

I expected an nginx deployment to start and become ready for use. On investigation, however, it appears that mycluster does not start correctly: the pods are stuck in the ContainerCreating state and seem to be failing to pull their container images.

NOTE: This issue DOES NOT OCCUR with k3d 5.6.0. I rolled back to that version and the cluster bootstraps fine.

Screenshots or terminal output

 k3d cluster create mycluster
INFO[0000] Prep: Network                                
INFO[0000] Created network 'k3d-mycluster'              
INFO[0000] Created image volume k3d-mycluster-images    
INFO[0000] Starting new tools node...                   
INFO[0000] Starting node 'k3d-mycluster-tools'          
INFO[0001] Creating node 'k3d-mycluster-server-0'       
INFO[0001] Creating LoadBalancer 'k3d-mycluster-serverlb' 
INFO[0001] Using the k3d-tools node to gather environment information 
INFO[0001] Starting new tools node...                   
INFO[0001] Starting node 'k3d-mycluster-tools'          
INFO[0002] Starting cluster 'mycluster'                 
INFO[0002] Starting servers...                          
INFO[0002] Starting node 'k3d-mycluster-server-0'       
INFO[0006] All agents already running.                  
INFO[0006] Starting helpers...                          
INFO[0006] Starting node 'k3d-mycluster-serverlb'       
INFO[0012] Injecting records for hostAliases (incl. host.k3d.internal) and for 3 network members into CoreDNS configmap... 
INFO[0015] Cluster 'mycluster' created successfully!    
INFO[0015] You can now use it like this:                
kubectl cluster-info
kubectl cluster-info
Kubernetes control plane is running at
CoreDNS is running at
Metrics-server is running at
k get nodes
NAME                     STATUS   ROLES                  AGE     VERSION
k3d-mycluster-server-0   Ready    control-plane,master   2m10s   v1.28.8+k3s1
k get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS              RESTARTS   AGE
kube-system   helm-install-traefik-crd-svjd2            0/1     ContainerCreating   0          8m46s
kube-system   helm-install-traefik-tbc2t                0/1     ContainerCreating   0          8m46s
kube-system   coredns-6799fbcd5-8mqf4                   0/1     ContainerCreating   0          8m46s
kube-system   metrics-server-54fd9b65b-4fqhg            0/1     ContainerCreating   0          8m46s
kube-system   local-path-provisioner-6c86858495-25nvr   0/1     ContainerCreating   0          8m46s
k events --all-namespaces
NAMESPACE     LAST SEEN              TYPE      REASON                           OBJECT                                         MESSAGE
default       14m                    Normal    Starting                         Node/k3d-mycluster-server-0                    Starting kubelet.
default       14m                    Warning   InvalidDiskCapacity              Node/k3d-mycluster-server-0                    invalid capacity 0 on image filesystem
default       14m (x2 over 14m)      Normal    NodeHasSufficientMemory          Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeHasSufficientMemory
default       14m (x2 over 14m)      Normal    NodeHasNoDiskPressure            Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeHasNoDiskPressure
default       14m (x2 over 14m)      Normal    NodeHasSufficientPID             Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeHasSufficientPID
default       14m                    Normal    NodeAllocatableEnforced          Node/k3d-mycluster-server-0                    Updated Node Allocatable limit across pods
default       14m                    Normal    NodeReady                        Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 status is now: NodeReady
kube-system   14m                    Normal    ApplyingManifest                 Addon/auth-delegator                           Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-delegator.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/ccm                                      Applying manifest at "/var/lib/rancher/k3s/server/manifests/ccm.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/ccm                                      Applied manifest at "/var/lib/rancher/k3s/server/manifests/ccm.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/local-storage                            Applying manifest at "/var/lib/rancher/k3s/server/manifests/local-storage.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/local-storage                            Applied manifest at "/var/lib/rancher/k3s/server/manifests/local-storage.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/aggregated-metrics-reader                Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/aggregated-metrics-reader.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/aggregated-metrics-reader                Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/aggregated-metrics-reader.yaml"
default       14m                    Normal    NodePasswordValidationComplete   Node/k3d-mycluster-server-0                    Deferred node password secret validation complete
kube-system   14m                    Normal    AppliedManifest                  Addon/auth-delegator                           Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-delegator.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/auth-reader                              Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-reader.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/auth-reader                              Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/auth-reader.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/metrics-apiservice                       Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-apiservice.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/metrics-apiservice                       Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-apiservice.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/metrics-server-deployment                Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/metrics-server-deployment                Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-deployment.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/metrics-server-service                   Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-service.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/metrics-server-service                   Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/metrics-server-service.yaml"
default       14m                    Normal    Starting                         Node/k3d-mycluster-server-0                    
kube-system   14m                    Normal    ApplyingManifest                 Addon/resource-reader                          Applying manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/resource-reader.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/resource-reader                          Applied manifest at "/var/lib/rancher/k3s/server/manifests/metrics-server/resource-reader.yaml"
default       14m                    Normal    Synced                           Node/k3d-mycluster-server-0                    Node synced successfully
kube-system   14m                    Normal    ApplyingManifest                 Addon/rolebindings                             Applying manifest at "/var/lib/rancher/k3s/server/manifests/rolebindings.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/rolebindings                             Applied manifest at "/var/lib/rancher/k3s/server/manifests/rolebindings.yaml"
kube-system   14m                    Normal    ApplyingManifest                 Addon/runtimes                                 Applying manifest at "/var/lib/rancher/k3s/server/manifests/runtimes.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/runtimes                                 Applied manifest at "/var/lib/rancher/k3s/server/manifests/runtimes.yaml"
kube-system   14m (x3 over 14m)      Normal    ApplyJob                         HelmChart/traefik-crd                          Applying HelmChart using Job kube-system/helm-install-traefik-crd
kube-system   14m (x4 over 14m)      Normal    ApplyJob                         HelmChart/traefik                              Applying HelmChart using Job kube-system/helm-install-traefik
kube-system   14m                    Normal    ApplyingManifest                 Addon/traefik                                  Applying manifest at "/var/lib/rancher/k3s/server/manifests/traefik.yaml"
kube-system   14m                    Normal    AppliedManifest                  Addon/traefik                                  Applied manifest at "/var/lib/rancher/k3s/server/manifests/traefik.yaml"
default       14m                    Normal    RegisteredNode                   Node/k3d-mycluster-server-0                    Node k3d-mycluster-server-0 event: Registered Node k3d-mycluster-server-0 in Controller
kube-system   14m                    Normal    ScalingReplicaSet                Deployment/coredns                             Scaled up replica set coredns-6799fbcd5 to 1
kube-system   14m                    Normal    SuccessfulCreate                 ReplicaSet/coredns-6799fbcd5                   Created pod: coredns-6799fbcd5-8mqf4
kube-system   14m                    Normal    SuccessfulCreate                 ReplicaSet/local-path-provisioner-6c86858495   Created pod: local-path-provisioner-6c86858495-25nvr
kube-system   14m                    Normal    SuccessfulCreate                 ReplicaSet/metrics-server-54fd9b65b            Created pod: metrics-server-54fd9b65b-4fqhg
kube-system   14m                    Normal    ScalingReplicaSet                Deployment/metrics-server                      Scaled up replica set metrics-server-54fd9b65b to 1
kube-system   14m                    Normal    SuccessfulCreate                 Job/helm-install-traefik-crd                   Created pod: helm-install-traefik-crd-svjd2
kube-system   14m                    Normal    ScalingReplicaSet                Deployment/local-path-provisioner              Scaled up replica set local-path-provisioner-6c86858495 to 1
kube-system   14m                    Normal    SuccessfulCreate                 Job/helm-install-traefik                       Created pod: helm-install-traefik-tbc2t
kube-system   14m                    Warning   FailedScheduling                 Pod/coredns-6799fbcd5-8mqf4                    0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
kube-system   14m                    Warning   FailedScheduling                 Pod/local-path-provisioner-6c86858495-25nvr    0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
kube-system   14m                    Warning   FailedScheduling                 Pod/metrics-server-54fd9b65b-4fqhg             0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
kube-system   14m                    Normal    Scheduled                        Pod/helm-install-traefik-crd-svjd2             Successfully assigned kube-system/helm-install-traefik-crd-svjd2 to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/helm-install-traefik-tbc2t                 Successfully assigned kube-system/helm-install-traefik-tbc2t to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/coredns-6799fbcd5-8mqf4                    Successfully assigned kube-system/coredns-6799fbcd5-8mqf4 to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/metrics-server-54fd9b65b-4fqhg             Successfully assigned kube-system/metrics-server-54fd9b65b-4fqhg to k3d-mycluster-server-0
kube-system   14m                    Normal    Scheduled                        Pod/local-path-provisioner-6c86858495-25nvr    Successfully assigned kube-system/local-path-provisioner-6c86858495-25nvr to k3d-mycluster-server-0
kube-system   14m (x2 over 14m)      Normal    ApplyingManifest                 Addon/coredns                                  Applying manifest at "/var/lib/rancher/k3s/server/manifests/coredns.yaml"
kube-system   14m (x2 over 14m)      Normal    AppliedManifest                  Addon/coredns                                  Applied manifest at "/var/lib/rancher/k3s/server/manifests/coredns.yaml"
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/helm-install-traefik-crd-svjd2             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/metrics-server-54fd9b65b-4fqhg             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/local-path-provisioner-6c86858495-25nvr    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/helm-install-traefik-tbc2t                 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
kube-system   3m59s (x38 over 14m)   Warning   FailedCreatePodSandBox           Pod/coredns-6799fbcd5-8mqf4                    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again
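The FailedCreatePodSandBox warnings above all end in the same DNS error ("lookup registry-1.docker.io: Try again"), which suggests name resolution inside the node is failing rather than the registry itself. As a rough sketch (not part of the original report), the helper below pulls the affected hostnames out of event text. The function name `failed_lookup_hosts` is made up for illustration, and the pattern assumes the message wording shown in the events above.

```shell
#!/bin/sh
# Hypothetical helper: read event text on stdin and print each hostname
# whose DNS lookup failed, once. It matches the
# "dial tcp: lookup <host>: Try again" fragment seen in the kubelet messages.
failed_lookup_hosts() {
    sed -n 's/.*dial tcp: lookup \([^ :]*\): Try again.*/\1/p' | sort -u
}

# Example using the tail of one message copied from the events above.
printf '%s\n' 'failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again' | failed_lookup_hosts
# prints: registry-1.docker.io
```

On a live cluster the same function could be fed with `kubectl get events --all-namespaces | failed_lookup_hosts`. Each reported name could then be checked from inside the node container, for example with `docker exec k3d-mycluster-server-0 nslookup registry-1.docker.io`. That check assumes the node image ships an `nslookup` binary, which this report does not confirm.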
k describe pods --all-namespaces
Name:             helm-install-traefik-crd-svjd2
Namespace:        kube-system
Priority:         0
Service Account:  helm-traefik-crd
Node:             k3d-mycluster-server-0/
Start Time:       Thu, 06 Jun 2024 13:25:04 +0100
Labels:           batch.kubernetes.io/controller-uid=784304da-3a35-4ea0-a851-0b8b4ef1faad
Annotations:      helmcharts.helm.cattle.io/configHash: SHA256=E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855
Status:           Pending
SeccompProfile:   RuntimeDefault
IPs:              <none>
Controlled By:    Job/helm-install-traefik-crd
    Container ID:  
    Image:         rancher/klipper-helm:v0.8.3-build20240228
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
      NAME:                   traefik-crd
      HELM_DRIVER:            secret
      CHART_NAMESPACE:        kube-system
      CHART:                  https://%{KUBERNETES_API}%/static/charts/traefik-crd-25.0.2+up25.0.0.tgz
      TARGET_NAMESPACE:       kube-system
      NO_PROXY:               .svc,.cluster.local,,
      FAILURE_POLICY:         reinstall
      /chart from content (rw)
      /config from values (rw)
      /home/klipper-helm/.cache from klipper-cache (rw)
      /home/klipper-helm/.config from klipper-config (rw)
      /home/klipper-helm/.helm from klipper-helm (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rsvf7 (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:        Secret (a volume populated by a Secret)
    SecretName:  chart-values-traefik-crd
    Optional:    false
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik-crd
    Optional:  false
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/helm-install-traefik-crd-svjd2 to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again

Name:             helm-install-traefik-tbc2t
Namespace:        kube-system
Priority:         0
Service Account:  helm-traefik
Node:             k3d-mycluster-server-0/
Start Time:       Thu, 06 Jun 2024 13:25:04 +0100
Labels:           batch.kubernetes.io/controller-uid=da083364-1afc-4baf-8e36-08abc3161832
Annotations:      helmcharts.helm.cattle.io/configHash: SHA256=2C8876269AFB411F60BCDA289A1957C0126147D80F1B0AC6BD2C43C10FE296E9
Status:           Pending
SeccompProfile:   RuntimeDefault
IPs:              <none>
Controlled By:    Job/helm-install-traefik
    Container ID:  
    Image:         rancher/klipper-helm:v0.8.3-build20240228
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
      NAME:                   traefik
      HELM_DRIVER:            secret
      CHART_NAMESPACE:        kube-system
      CHART:                  https://%{KUBERNETES_API}%/static/charts/traefik-25.0.2+up25.0.0.tgz
      TARGET_NAMESPACE:       kube-system
      NO_PROXY:               .svc,.cluster.local,,
      FAILURE_POLICY:         reinstall
      /chart from content (rw)
      /config from values (rw)
      /home/klipper-helm/.cache from klipper-cache (rw)
      /home/klipper-helm/.config from klipper-config (rw)
      /home/klipper-helm/.helm from klipper-helm (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zl645 (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
    Type:        Secret (a volume populated by a Secret)
    SecretName:  chart-values-traefik
    Optional:    false
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      chart-content-traefik
    Optional:  false
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/helm-install-traefik-tbc2t to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again

Name:                 coredns-6799fbcd5-8mqf4
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 k3d-mycluster-server-0/
Start Time:           Thu, 06 Jun 2024 13:25:05 +0100
Labels:               k8s-app=kube-dns
Annotations:          <none>
Status:               Pending
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-6799fbcd5
    Container ID:  
    Image:         rancher/mirrored-coredns-coredns:1.10.1
    Image ID:      
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
      memory:  170Mi
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-c4xng (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  CriticalAddonsOnly op=Exists
                              node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                              node-role.kubernetes.io/master:NoSchedule op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector k8s-app=kube-dns
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        20m                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/coredns-6799fbcd5-8mqf4 to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again

Name:                 metrics-server-54fd9b65b-4fqhg
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      metrics-server
Node:                 k3d-mycluster-server-0/
Start Time:           Thu, 06 Jun 2024 13:25:05 +0100
Labels:               k8s-app=metrics-server
Annotations:          <none>
Status:               Pending
IPs:                  <none>
Controlled By:        ReplicaSet/metrics-server-54fd9b65b
    Container ID:  
    Image:         rancher/mirrored-metrics-server:v0.7.0
    Image ID:      
    Port:          10250/TCP
    Host Port:     0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get https://:https/livez delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:https/readyz delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vqc8w (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    SizeLimit:  <unset>
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        20m                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/metrics-server-54fd9b65b-4fqhg to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again

Name:                 local-path-provisioner-6c86858495-25nvr
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      local-path-provisioner-service-account
Node:                 k3d-mycluster-server-0/
Start Time:           Thu, 06 Jun 2024 13:25:05 +0100
Labels:               app=local-path-provisioner
Annotations:          <none>
Status:               Pending
IPs:                  <none>
Controlled By:        ReplicaSet/local-path-provisioner-6c86858495
    Container ID:  
    Image:         rancher/local-path-provisioner:v0.0.26
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
      POD_NAMESPACE:  kube-system (v1:metadata.namespace)
      /etc/config/ from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hg64p (ro)
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      local-path-config
    Optional:  false
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  Type     Reason                  Age                From               Message
  ----     ------                  ----               ----               -------
  Warning  FailedScheduling        20m                default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Normal   Scheduled               20m                default-scheduler  Successfully assigned kube-system/local-path-provisioner-6c86858495-25nvr to k3d-mycluster-server-0
  Warning  FailedCreatePodSandBox  3s (x74 over 20m)  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/mirrored-pause:3.6": failed to pull image "rancher/mirrored-pause:3.6": failed to pull and unpack image "docker.io/rancher/mirrored-pause:3.6": failed to resolve reference "docker.io/rancher/mirrored-pause:3.6": failed to do request: Head "https://registry-1.docker.io/v2/rancher/mirrored-pause/manifests/3.6": dial tcp: lookup registry-1.docker.io: Try again

Which OS & Architecture

Which version of k3d

Which version of docker

Server: Docker Engine - Community
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:47 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Client: Docker Engine - Community
 Version:    26.1.3
 Context:    colima
 Debug Mode: false
 Plugins:
  compose: Docker Compose (Docker Inc.)
    Version:  2.27.1
    Path:     /Users/$USER/.docker/cli-plugins/docker-compose

Server:
 Containers: 3
  Running: 3
  Paused: 0
  Stopped: 0
 Images: 8
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-31-generic
 Operating System: Ubuntu 24.04 LTS
 OSType: linux
 Architecture: aarch64
 CPUs: 2
 Total Memory: 1.91GiB
 Name: colima
 ID: a09eda6a-75aa-4810-960d-0718469dc07d
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: $USER
 Experimental: false
 Insecure Registries:
 Live Restore Enabled: false

crobby commented 3 months ago

fwiw, I'm seeing this same issue starting today. It was working correctly 2 days ago.

crobby commented 3 months ago

> fwiw, I'm seeing this same issue starting today. It was working correctly 2 days ago.

In my case, this was solved by disconnecting from my VPN. The docker container logs pointed me toward a networking issue, which it seems to be for me.

shanilhirani commented 3 months ago

Yeah, it's not a network issue for me, as I've experienced this on two devices. Simply rolling back works without any other changes, so it's difficult to pin down the cause.

I've tried changing the rancher k3s image to see if that helps, but there was no change in behaviour.

K3d 5.6.3 seems to be doing something strange with how it maps DNS in the container.

adriaanm commented 3 months ago

Same here. 5.6.0 works but 5.6.2 does not (nor does 5.6.3). It seems to be using the wrong nameserver in /etc/resolv.conf inside the k3d container:

❯ k3d --version
k3d version v5.6.2
k3s version v1.28.8-k3s1 (default)

~/g/sandbox dev*
❯ colima ssh
me@colima:/Users/me/g/sandbox$ docker exec -it k3d-local-server-0 sh
/ # cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

search fritz.box
options ndots:0

# Based on host file: '/run/systemd/resolve/resolv.conf' (internal resolver)
# ExtServers: []
# Overrides: []
# Option ndots from: internal
/ # nslookup google.com
;; connection timed out; no servers could be reached
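
The check above can be scripted. The following is a minimal sketch, assuming the container's /etc/resolv.conf has first been copied to a local file; `list_nameservers` and `has_nameserver` are hypothetical helper names, not part of k3d or Docker.

```shell
# Print the address of every "nameserver" entry in a resolv.conf-style file.
# Comment lines (starting with "#") never match because their first field is "#...".
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "$1"
}

# Succeed (exit 0) only if the file declares at least one nameserver.
has_nameserver() {
  [ -n "$(list_nameservers "$1")" ]
}
```

For example, with the container name used in this thread: `docker exec k3d-local-server-0 cat /etc/resolv.conf > /tmp/k3d-resolv.conf`, then `has_nameserver /tmp/k3d-resolv.conf || echo "no nameserver configured"`. Comparing the output of `list_nameservers` between a working and a failing k3d version may show which address is being written.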

Rolling back to 5.6.0 (note how the nameserver is rewritten to

❯ k3d --version
k3d version v5.6.0
k3s version v1.27.4-k3s1 (default)

❯ colima ssh
me@colima:/Users/me/g/sandbox$ docker exec -it k3d-local-server-0 sh
/ # cat /etc/resolv.conf
# Generated by Docker Engine.
# This file can be edited; Docker Engine will not make further changes once it
# has been modified.

search fritz.box
options ndots:0

# Based on host file: '/run/systemd/resolve/resolv.conf' (internal resolver)
# ExtServers: []
# Overrides: []
# Option ndots from: internal
/ # nslookup google.com

Non-authoritative answer:

Non-authoritative answer:
Name:   google.com

adriaanm commented 3 months ago

Fixed for me by disabling the dns fix when creating the cluster: K3D_FIX_DNS=0 k3d cluster create local
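
A small wrapper can make the workaround harder to forget. This is only a sketch: `create_cluster_without_dns_fix` is a hypothetical function name, and it assumes `k3d` is on the PATH and honours `K3D_FIX_DNS` as described in this thread.

```shell
# Create a k3d cluster with the DNS fix disabled for this one invocation.
create_cluster_without_dns_fix() {
  if [ -z "${1:-}" ]; then
    echo "usage: create_cluster_without_dns_fix <cluster-name>" >&2
    return 2
  fi
  # The variable is passed only to this k3d call, not exported to the session.
  K3D_FIX_DNS=0 k3d cluster create "$1"
}
```

Usage: `create_cluster_without_dns_fix local`, followed by `kubectl get pods --all-namespaces` to confirm the kube-system pods leave ContainerCreating.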

shanilhirani commented 3 months ago

> Fixed for me by disabling the dns fix when creating the cluster: K3D_FIX_DNS=0 k3d cluster create local

@adriaanm - The suggested workaround seems to have worked.

kubectl get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   local-path-provisioner-6c86858495-kzdnd   1/1     Running     0          60s
kube-system   coredns-6799fbcd5-mcv7r                   1/1     Running     0          60s
kube-system   helm-install-traefik-crd-jgxlg            0/1     Completed   0          60s
kube-system   svclb-traefik-dc19675c-hm7d6              2/2     Running     0          35s
kube-system   helm-install-traefik-247hw                0/1     Completed   1          60s
kube-system   metrics-server-54fd9b65b-whff8            1/1     Running     0          60s
kube-system   traefik-f4564c4f4-k9b8v                   1/1     Running     0          35s

It would be good if this was documented somewhere.

2fxprogeeme commented 3 months ago


Have a look at #1445; it might explain why K3D_FIX_DNS=0 works around this problem.

nelyodev commented 1 week ago

I used the K3D_FIX_DNS=0 workaround, but after stopping and restarting the cluster once, it no longer seemed to work. I didn't want to lose my experimental cluster, so I dug into /etc/resolv.conf and found that it contained a wrong IP (that of my colima VM). I replaced it with my real nameserver. macOS / colima here, btw.
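
The manual edit described above could look roughly like the sketch below. It assumes a shell with `awk` and `mktemp` available and that a single nameserver entry is wanted; `set_nameserver` is a hypothetical helper, and the comment does not say exactly how the file was edited.

```shell
# Replace all nameserver entries in a resolv.conf-style file with one address.
set_nameserver() {
  local file="$1" ip="$2" tmp rc
  tmp="$(mktemp)" || return 1
  # Keep every line except existing nameserver entries.
  awk '$1 != "nameserver"' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf 'nameserver %s\n' "$ip" >> "$tmp"
  # Overwrite in place instead of renaming: inside a container the file
  # may be a bind mount, in which case a rename would likely fail.
  cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage would be along the lines of `set_nameserver /etc/resolv.conf <your-nameserver-ip>`, run wherever the wrong address appears. Trying it on a copy of the file first is a reasonable precaution, and the change may not survive the container being recreated.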