kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Can't create second kind cluster #3712

Open vie-serendipity opened 3 months ago

vie-serendipity commented 3 months ago

I tried to create two clusters using kind, but something unexpected happened. I've looked through the existing issues, but none of them seem to match mine.

kind create cluster
kind create cluster --name="test"

The first cluster works fine, but the second one runs into problems.

Error Log

The following is the log from creating the second cluster. Everything actually looks fine, but the apiserver never becomes ready.

 ✗ Starting control-plane 🕹️ 
Deleted nodes: ["test-control-plane"]
ERROR: failed to create cluster: failed to init node with kubeadm: command "podman exec --privileged test-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 1
Command Output: I0816 02:45:01.311408     201 initconfiguration.go:255] loading configuration from "/kind/kubeadm.conf"
W0816 02:45:01.313867     201 initconfiguration.go:332] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
[init] Using Kubernetes version: v1.27.3
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0816 02:45:01.320428     201 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0816 02:45:01.681284     201 certs.go:519] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local localhost test-control-plane] and IPs [10.96.0.1 10.89.0.31 127.0.0.1]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0816 02:45:01.983470     201 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0816 02:45:02.207187     201 certs.go:519] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0816 02:45:02.302801     201 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0816 02:45:02.397571     201 certs.go:519] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost test-control-plane] and IPs [10.89.0.31 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost test-control-plane] and IPs [10.89.0.31 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0816 02:45:03.260376     201 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0816 02:45:03.441895     201 kubeconfig.go:103] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0816 02:45:03.526145     201 kubeconfig.go:103] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0816 02:45:03.695509     201 kubeconfig.go:103] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0816 02:45:04.009275     201 kubeconfig.go:103] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
I0816 02:45:04.121069     201 kubelet.go:67] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0816 02:45:04.220357     201 manifests.go:99] [control-plane] getting StaticPodSpecs
I0816 02:45:04.220698     201 certs.go:519] validating certificate period for CA certificate
I0816 02:45:04.220803     201 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0816 02:45:04.220814     201 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0816 02:45:04.220823     201 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0816 02:45:04.220831     201 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0816 02:45:04.220839     201 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0816 02:45:04.226167     201 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0816 02:45:04.226196     201 manifests.go:99] [control-plane] getting StaticPodSpecs
I0816 02:45:04.226550     201 manifests.go:125] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0816 02:45:04.226563     201 manifests.go:125] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0816 02:45:04.226570     201 manifests.go:125] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0816 02:45:04.226577     201 manifests.go:125] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0816 02:45:04.226584     201 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0816 02:45:04.226591     201 manifests.go:125] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0816 02:45:04.226599     201 manifests.go:125] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0816 02:45:04.227539     201 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0816 02:45:04.227562     201 manifests.go:99] [control-plane] getting StaticPodSpecs
I0816 02:45:04.227828     201 manifests.go:125] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0816 02:45:04.228381     201 manifests.go:154] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0816 02:45:04.229173     201 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0816 02:45:04.229191     201 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
I0816 02:45:04.229886     201 loader.go:373] Config loaded from file:  /etc/kubernetes/admin.conf
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
I0816 02:45:04.233791     201 round_trippers.go:553] GET https://test-control-plane:6443/healthz?timeout=10s  in 2 milliseconds
I0816 02:45:04.734764     201 round_trippers.go:553] GET https://test-control-plane:6443/healthz?timeout=10s  in 0 milliseconds

Local environment

Memory: 16Gi
kind version: 0.22.0
uname -r: 5.10.134-16.3.al8.x86_64
docker version:

Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Client:       Podman Engine
Version:      4.9.4-rhel
API Version:  4.9.4-rhel
Go Version:   go1.21.9 (Red Hat 1.21.9-1.0.1.al8)
Built:        Mon Jun 24 10:57:37 2024
OS/Arch:      linux/amd64
BenTheElder commented 3 months ago

We need the kind export logs output but please check

https://kind.sigs.k8s.io/docs/user/known-issues/

this is almost always a resource exhaustion issue with the host such as https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files

You can tell because only the additional clusters fail to come up; clusters don't otherwise interact with each other.
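For reference, the fix described on that known-issues page amounts to raising the host's inotify limits; the values below are the ones suggested there (adjust for your machine):

sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

To persist the change across reboots, add the same keys to /etc/sysctl.conf or a file under /etc/sysctl.d/.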

vie-serendipity commented 3 months ago

I adjusted the inotify resource settings, but it doesn't work. The following are files of kind export logs. These files are about the first kind cluster, I didn't find info about the second one. kubernetes-version.txt serial.log kind-version.txt podman-info.txt alternatives.log containerd.log images.log inspect.json journal.log kubelet.log kube-apiserver-kind-control-plane_kube-system_kube-apiserver.log

aojea commented 3 months ago

The following are files of kind export logs. These files are about the first kind cluster, I didn't find info about the second one.

The logs of the working cluster are not going to help much; you need to create the second cluster with the --retain flag so you can get the logs from it.
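For example (the cluster name here is just the one used in this issue, and the output directory is arbitrary):

kind create cluster --name test --retain
kind export logs ./test-logs --name test
kind delete cluster --name test

--retain keeps the node containers around even when cluster creation fails, so kind export logs can still collect their logs afterwards.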

stmcginnis commented 3 months ago

The 4.x kernel raises some concern. Is this an older release of Amazon Linux?

The output from podman info could also have a lot of useful details.

BenTheElder commented 3 months ago

The following are files of kind export logs. These files are about the first kind cluster, I didn't find info about the second one.

That command is per cluster; it takes a --name flag (most kind commands do).
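For the second cluster in this issue that would be something along the lines of:

kind export logs --name test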

vie-serendipity commented 3 months ago

@stmcginnis It's an instance from another cloud provider.

The following is the output of podman info:

host:
  arch: amd64
  buildahVersion: 1.33.8
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - ioasids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.10-1.al8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 4e7fcd323c640c05f88ca82c36a94c971aee0c4c'
  cpuUtilization:
    idlePercent: 94.71
    systemPercent: 1.9
    userPercent: 3.39
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: alinux
    version: "3"
  eventLogger: file
  freeLocks: 2044
  hostname: iZbp1f4a36z9etbvhabv0uZ
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.134-16.3.al8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 4750659584
  memTotal: 16222420992
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns:
      package: podman-plugins-4.9.4-3.0.1.al8.x86_64
      path: /usr/libexec/cni/dnsname
      version: |-
        CNI dnsname plugin
        version: 1.4.0-dev
        commit: unknown
        CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
    package: containernetworking-plugins-1.4.0-2.0.1.al8.x86_64
    path: /usr/libexec/cni
  ociRuntime:
    name: runc
    package: runc-1.1.12-1.0.1.al8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.12
      spec: 1.0.2-dev
      go: go1.20.12
      libseccomp: 2.5.2
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.al8.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 68h 26m 51.00s (Approximately 2.83 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 41881894912
  graphRootUsed: 11787603968
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 26
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.9.4-rhel
  Built: 1719197857
  BuiltTime: Mon Jun 24 10:57:37 2024
  GitCommit: ""
  GoVersion: go1.21.9 (Red Hat 1.21.9-1.0.1.al8)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-rhel
vie-serendipity commented 3 months ago

@BenTheElder Thanks.

vie-serendipity commented 3 months ago

Because kubelet.log is too big, I'm only attaching the start and end. First 20 lines:

Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.126758 3025491 server.go:417] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.127228 3025491 server.go:837] "Client rotation is on, will bootstrap in background"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.129878 3025491 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.130921 3025491 container_manager_linux.go:822] "CPUAccounting not enabled for process" pid=3025491
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.130934 3025491 container_manager_linux.go:825] "MemoryAccounting not enabled for process" pid=3025491
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.130938 3025491 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134317 3025491 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[kubelet]
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134358 3025491 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName:/system.slice/containerd.service SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/kubelet CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134382 3025491 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134393 3025491 container_manager_linux.go:302] "Creating device plugin manager"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134422 3025491 state_mem.go:36] "Initialized new in-memory state store"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137015 3025491 kubelet.go:405] "Attempting to sync node with API server"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137036 3025491 kubelet.go:298] "Adding static pod path" path="/etc/kubernetes/manifests"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137062 3025491 kubelet.go:309] "Adding apiserver pod source"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137082 3025491 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137829 3025491 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.1" apiVersion="v1"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: W0817 23:07:21.138195 3025491 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://test-control-plane:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 17 23:07:21 test-control-plane kubelet[3025491]: E0817 23:07:21.138253 3025491 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://test-control-plane:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.138281 3025491 server.go:1168] "Started kubelet"

Last 20 lines:

Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.769639 3319819 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv6
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.769671 3319819 status_manager.go:207] "Starting to sync pod status with apiserver"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.769701 3319819 kubelet.go:2257] "Starting kubelet main sync loop"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.769776 3319819 kubelet.go:2281] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: W0818 02:44:21.772576 3319819 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://test-control-plane:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.772673 3319819 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://test-control-plane:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.842358 3319819 kubelet_node_status.go:70] "Attempting to register node" node="test-control-plane"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.843107 3319819 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://test-control-plane:6443/api/v1/nodes\": dial tcp 10.89.0.35:6443: connect: connection refused" node="test-control-plane"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853126 3319819 cpu_manager.go:214] "Starting CPU manager" policy="none"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853150 3319819 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853171 3319819 state_mem.go:36] "Initialized new in-memory state store"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853360 3319819 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853377 3319819 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853386 3319819 policy_none.go:49] "None policy: Start"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853990 3319819 memory_manager.go:169] "Starting memorymanager" policy="None"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.854016 3319819 state_mem.go:35] "Initializing new in-memory state store"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.854196 3319819 state_mem.go:75] "Updated machine memory state"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.861255 3319819 kubelet.go:1480] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubelet kubepods] doesn't exist"
Aug 18 02:44:21 test-control-plane systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Aug 18 02:44:21 test-control-plane systemd[1]: kubelet.service: Failed with result 'exit-code'.
aojea commented 2 months ago

It looks like this one: https://github.com/kubernetes/kubernetes/issues/122955

vie-serendipity commented 2 months ago

@aojea Thanks, I'll go take a look.