Cannot create fairly large clusters

StickBrush commented 1 year ago

What happened:

Kind crashed when I tried to create a "large" cluster for evaluating a research artifact, although the host should have enough capacity. The cluster has 18 nodes in total (1 control-plane node and 17 workers), each limited to 3 GB of RAM and 1 CPU. I am running Kind on an AWS c3.8xlarge instance (32 CPUs, 60 GB RAM) with Ubuntu. Kind starts creating the cluster but fails at the "Joining worker nodes" step.

What you expected to happen:

I expected the cluster to be created normally.

How to reproduce it (as minimally and precisely as possible):

Create error-conf.yaml with the following content:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31ç
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
- role: worker
extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner
kubeadmConfigPatches: # /!\ ADAPT THIS CONFIGURATION TO YOUR DEVICE /!\ This controls the capacity of the nodes. However, it is very unintuitive, as system-reserved resources are not the node's resources, they are SUBSTRACTED from your device to get the node's resources.
  - |
    kind: JoinConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        system-reserved: memory=57Gi,cpu=31
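
The comment in the config says that system-reserved is subtracted from the host's resources to get each node's resources. A minimal Python sketch of that arithmetic follows. It assumes the host figures quoted in this issue (60 GB RAM, 32 CPUs) and ignores kube-reserved and eviction thresholds, which also reduce what the kubelet reports as allocatable. The `Resources` type and both helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    memory_gib: float
    cpus: float


def parse_system_reserved(value: str) -> Resources:
    """Parse a string such as 'memory=57Gi,cpu=31' (this sketch handles only Gi and plain CPU counts)."""
    fields = dict(item.split("=", 1) for item in value.split(","))
    memory = fields["memory"]
    if not memory.endswith("Gi"):
        raise ValueError(f"only Gi quantities are handled in this sketch: {memory!r}")
    return Resources(memory_gib=float(memory[:-2]), cpus=float(fields["cpu"]))


def schedulable(host: Resources, reserved: Resources) -> Resources:
    """Host capacity minus system-reserved (kube-reserved and eviction thresholds are ignored)."""
    return Resources(host.memory_gib - reserved.memory_gib, host.cpus - reserved.cpus)


# Assumption: host size as quoted in the issue for the c3.8xlarge instance.
host = Resources(memory_gib=60, cpus=32)
per_node = schedulable(host, parse_system_reserved("memory=57Gi,cpu=31"))
print(per_node)  # Resources(memory_gib=3.0, cpus=1.0)
```

With these inputs, each node ends up with the 3 GB and 1 CPU described at the top of the issue.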

Create kind-pvc.yaml with the following content:

---
apiVersion: v1
kind: Namespace
metadata:
  name: local-storage
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hostpath-provisioner
  namespace: local-storage
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hostpath-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: hostpath-provisioner
subjects:
  - kind: ServiceAccount
    name: hostpath-provisioner
    namespace: local-storage
roleRef:
  kind: ClusterRole
  name: hostpath-provisioner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-hostpath-provisioner
  namespace: local-storage
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "update", "patch"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["list", "watch", "create"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-hostpath-provisioner
  namespace: local-storage
subjects:
  - kind: ServiceAccount
    name: hostpath-provisioner
    namespace: local-storage
roleRef:
  kind: Role
  name: leader-locking-hostpath-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hostpath-provisioner
  namespace: local-storage
  labels:
    app: hostpath-provisioner
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hostpath-provisioner
  template:
    metadata:
      labels:
        app: hostpath-provisioner
    spec:
      containers:
        - name: hostpath-provisioner
          image: mauilion/hostpath-provisioner:dev
          imagePullPolicy: "IfNotPresent"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: pv-volume
              mountPath: /tmp/hostpath-provisioner
      serviceAccountName: hostpath-provisioner
      volumes:
      - name: pv-volume
        hostPath:
          path: /tmp/hostpath-provisioner
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
reclaimPolicy: Retain
provisioner: example.com/hostpath

Run `kind create cluster --config error-conf.yaml`.

Anything else we need to know?:

You can find the logs (both from stderr and Kind with the --retain flag) attached to the issue.

Environment:

kind version: (use kind version): kind v0.20.0 go1.20.4 linux/amd64

Runtime info: (use docker info or podman info):


Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 18
  Running: 18
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 20.10.25
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.19.0-1025-aws
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 58.93GiB
 Name:
 ID: X2S4:IQKZ:RMHB:LE42:EGFA:DJZU:FXC5:CSOB:24AQ:AZWE:NS5T:LEG3
 Docker Root Dir: /mnt/docker-data
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Attached files: stderr.log, YAML configs.zip

- OS (e.g. from `/etc/os-release`):

PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

- Kubernetes version: (use `kubectl version`):

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-20T02:11:13Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1


- Any proxies or other special environment settings?: The Docker images directory has been changed to `/mnt/docker-data`
[KindErrorLogs.zip](https://github.com/kubernetes-sigs/kind/files/12366508/KindErrorLogs.zip)

StickBrush commented 1 year ago

The stderr logs seem to have been removed from the issue for some reason. You can find them here: stderr.log

aojea commented 1 year ago

This is most probably a lack of resources in your environment, typically inotify watches. See https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
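
As a quick way to rule out the inotify limits mentioned above, the sketch below reads the current values from `/proc/sys/fs/inotify` and compares them with the values suggested on the linked known-issues page (524288 watches, 512 instances). Treat those thresholds as a starting point rather than a guarantee for an 18-node cluster. The function name is hypothetical.

```python
from pathlib import Path

# Values suggested on the kind known-issues page; larger clusters may need more.
RECOMMENDED = {"max_user_watches": 524288, "max_user_instances": 512}


def inotify_shortfalls(proc_dir: Path = Path("/proc/sys/fs/inotify")) -> dict[str, tuple[int, int]]:
    """Return {limit: (current, recommended)} for every limit below the recommendation."""
    low = {}
    for name, wanted in RECOMMENDED.items():
        path = proc_dir / name
        if not path.exists():  # not a Linux host, or the file is unavailable: skip it
            continue
        current = int(path.read_text().strip())
        if current < wanted:
            low[name] = (current, wanted)
    return low


if __name__ == "__main__":
    for name, (current, wanted) in inotify_shortfalls().items():
        print(f"fs.inotify.{name} = {current}; consider: sudo sysctl fs.inotify.{name}={wanted}")
```

If nothing is printed, both limits are already at or above the suggested values and the cause is likely elsewhere.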

StickBrush commented 1 year ago

I had already tried that. Sadly, it makes no difference: I get the same error. According to stderr, it fails to load the CRI socket information: the node keeps returning a 404 and the kubelet health endpoint refuses the connection. I think it might be related to some sort of timeout.

BenTheElder commented 1 year ago

There are a lot of dimensions that you can exhaust resources on besides inotify, e.g. disk I/O.

kind isn't really optimized for this. What's your use case? Something like kubemark or kwok may be a better fit.

StickBrush commented 1 year ago

I'm a researcher, and this is essentially a test for a framework we have been building to orchestrate different applications in distributed systems. The framework takes different inputs to automatically generate the necessary deployments, services, pods, and other Kubernetes elements, and applies the configuration to the cluster. One of the key parts of the evaluation is measuring the QoS that clients (which are also pods running in the same cluster) obtain from the applications, and that requires the pods to actually execute both the applications and the client software. That's also why I use the custom storage provider: to gather all the results after each test and analyze them.

liangyuanpeng commented 1 year ago

One suggestion is to create the cluster several times with different configurations to find the trigger point of the problem:

10 nodes with the default configuration
18 nodes with the default configuration
10 nodes with storage
18 nodes with storage
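
The four cases listed above could be generated rather than written by hand, which also avoids copy-paste slips in long configs. The Python sketch below renders a kind config for a given node count, with or without the storage mounts from the reproduction steps. It leaves out the kubeadmConfigPatches, and the output file names are hypothetical.

```python
# Mount section copied from the reproduction config, indented to sit under a node entry.
MOUNTS = """\
  extraMounts:
  - hostPath: ./kind-pvc.yaml
    containerPath: /kind/manifests/default-storage.yaml
  - hostPath: /tmp/hostpath-provisioner
    containerPath: /tmp/hostpath-provisioner"""


def kind_config(workers: int, with_storage: bool) -> str:
    """Render a kind config with one control-plane node and `workers` worker nodes."""
    lines = ["kind: Cluster", "apiVersion: kind.x-k8s.io/v1alpha4", "nodes:"]
    for role in ["control-plane"] + ["worker"] * workers:
        lines.append(f"- role: {role}")
        if with_storage:
            lines.append(MOUNTS)
    return "\n".join(lines) + "\n"


# 10 and 18 nodes in total, each without and with the storage mounts.
configs = {
    f"kind-{total}-{'storage' if storage else 'default'}.yaml": kind_config(total - 1, storage)
    for total in (10, 18)
    for storage in (False, True)
}

for name, text in configs.items():
    print(name, text.count("- role:"), "nodes")
```

Each rendered string can then be written to a file and passed to `kind create cluster --config <file>` to see which case first fails.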

StickBrush commented 1 year ago

I solved the issue. The kubeadm error message simply didn't match the actual problem (which is an issue for kubeadm, not kind).

If you look at the kind YAML config I uploaded, you'll find that the kubeletExtraArgs in the kubeadmConfigPatches of the 11th node are `system-reserved: memory=57Gi,cpu=31ç`. kubeadm couldn't parse that ç, so the cluster couldn't start with more than 10 nodes.
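
A stray character like this is easy to miss in a long, repetitive config. As a minimal sketch, the Python function below reports every non-ASCII character in a text along with its line and column. It assumes the config is meant to be pure ASCII, which may not hold if comments legitimately contain other characters. The function name is hypothetical.

```python
def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line number, column, character) for each non-ASCII character, both 1-indexed."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


# Sample taken from the faulty node entry in the config.
sample = "kubeletExtraArgs:\n  system-reserved: memory=57Gi,cpu=31ç\n"
for line_no, col, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: unexpected character {ch!r}")
```

To check a real file, the same function can be applied to the result of reading `error-conf.yaml` as UTF-8 text.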

I'm currently running a 25-node cluster without issues after fixing that, so I'm closing the issue. In any case, thank you all for your technical support!

kubernetes-sigs / kind

Cannot create fairly large clusters #3330