The stderr logs seem to have been removed from the issue for some reason. You can find them here: stderr.log
This is most probably a lack of resources in your environment, typically inotify watches: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
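For reference, the workaround documented on that page is to raise the host's inotify limits, e.g.:

    # Values suggested on the kind known-issues page; add them to /etc/sysctl.conf to persist across reboots.
    sudo sysctl fs.inotify.max_user_watches=524288
    sudo sysctl fs.inotify.max_user_instances=512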
I had already tried that out. Sadly, it does not make a difference; I get the same error. According to stderr, it fails to load the CRI socket information because the node keeps returning a 404 and the kubelet health endpoint refuses the connection. I think it might be related to some sort of timeout.
There are a lot of dimensions that you can exhaust resources on besides inotify, e.g. disk I/O.
kind isn't really optimized for this; what's your use case? Something like kubemark or kwok may be a better fit.
I'm a researcher, and this is essentially a test for a framework we have been building to orchestrate different applications in distributed systems. The framework takes different inputs to automatically generate the necessary deployments, services, pods, and other Kubernetes elements, and applies the configuration to the cluster. One of the key parts of the evaluation is to measure the QoS that clients (which are also pods running in the same cluster) obtain from the applications, and that requires the pods to be truly executing both the applications and the client software. That's also why I use the custom storage provider: to gather all the results after each test and analyze them.
A suggestion would be to try creating the cluster multiple times to narrow down the point at which the problem triggers.
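A rough sketch of that approach as a throwaway script (the node counts and the generated config below are placeholders, not the actual error-conf.yaml):

    # Create throwaway clusters of increasing size to find the node count where creation starts failing.
    for n in 5 10 15 18; do
      {
        echo "kind: Cluster"
        echo "apiVersion: kind.x-k8s.io/v1alpha4"
        echo "nodes:"
        echo "- role: control-plane"
        for i in $(seq 1 $((n - 1))); do echo "- role: worker"; done
      } > "/tmp/kind-${n}-nodes.yaml"
      kind create cluster --name "probe-${n}" --config "/tmp/kind-${n}-nodes.yaml" \
        && echo "creation succeeded with ${n} nodes" \
        || echo "creation failed at ${n} nodes"
      kind delete cluster --name "probe-${n}"
    done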
I solved the issue; it was just that the kubeadm error message didn't really match the actual problem (which is an issue for kubeadm, not kind).
If you look at the kind YAML config I uploaded, you'll find that the kubeletExtraArgs in the kubeadmConfigPatches of the 11th node are system-reserved: memory=57Gi,cpu=31ç. kubeadm couldn't really parse that stray ç, so the cluster couldn't start with more than 10 nodes.
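In case it helps anyone hitting something similar, here is a quick way to spot stray non-ASCII characters in a config like that (assumes GNU grep, as shipped on Ubuntu):

    # Print any line of the config containing a non-ASCII byte (the stray "ç" in this case).
    grep -nP '[^\x00-\x7F]' error-conf.yaml
    # The intended value was: system-reserved: memory=57Gi,cpu=31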
I'm currently running a 25-node cluster without issues after fixing that, so I should be closing the issue. In any case, thank you all for your technical support!
What happened:
Kind crashed when I tried to create a "large" cluster for evaluating a research artifact, although there should not be any capacity issues. The cluster has 18 nodes in total (1 control-plane node, 17 workers), with each node limited to 3 GB RAM and 1 CPU. I am running Kind on an AWS c3.8xlarge instance (32 CPUs, 60 GB RAM) with Ubuntu. Kind starts creating the cluster, but it fails at the "Joining worker nodes" step.
What you expected to happen:
I expected the cluster to be created normally.
How to reproduce it (as minimally and precisely as possible):
1. Create error-conf.yaml (contents attached to the issue in YAML configs.zip).
2. Create kind-pvc.yaml (contents attached to the issue in YAML configs.zip).
3. Run kind create cluster --config error-conf.yaml
Anything else we need to know?:
You can find the logs (both from stderr and from Kind with the --retain flag) attached to the issue as stderr.log, along with the cluster configs in YAML configs.zip.
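Roughly, the retained Kind logs were gathered with something like the following (the exact invocation is not shown in the issue, so this is illustrative):

    # Keep the node containers around on failure, then dump their logs.
    kind create cluster --config error-conf.yaml --retain
    kind export logs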
Environment:
kind version (from kind version): kind v0.20.0 go1.20.4 linux/amd64

Runtime info (from docker info):
Server:
 Containers: 18
  Running: 18
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 20.10.25
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version:
 runc version:
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.19.0-1025-aws
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 32
 Total Memory: 58.93GiB
 Name:
 ID: X2S4:IQKZ:RMHB:LE42:EGFA:DJZU:FXC5:CSOB:24AQ:AZWE:NS5T:LEG3
 Docker Root Dir: /mnt/docker-data
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

OS (from /etc/os-release):
PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Kubernetes version (from kubectl version):
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-20T02:11:13Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1