Open vie-serendipity opened 3 months ago
We need the `kind export logs` output, but please check https://kind.sigs.k8s.io/docs/user/known-issues/ first.
This is almost always a resource exhaustion issue on the host, such as https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files
You can tell because only the additional clusters fail to come up; clusters don't otherwise interact with each other.
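If you haven't already, the fix from that page is roughly the following (the values are the ones suggested in the kind docs; adjust for your host and persist them via sysctl config if they help):

```bash
# raise inotify limits (values suggested on the kind known-issues page)
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
```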
I adjusted the inotify resource settings, but it doesn't work.
The following are the files from `kind export logs`. These files are from the first kind cluster; I didn't find info about the second one.
kubernetes-version.txt
serial.log
kind-version.txt
podman-info.txt
alternatives.log
containerd.log
images.log
inspect.json
journal.log
kubelet.log
kube-apiserver-kind-control-plane_kube-system_kube-apiserver.log
> The following are the files from `kind export logs`. These files are from the first kind cluster; I didn't find info about the second one.
The logs of the working cluster are not going to help much; you need to run the second cluster with the --retain flag so you can get the logs from it.
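For example (the cluster name here is just a placeholder):

```bash
# --retain keeps the node containers around even if cluster creation fails,
# so their logs can still be collected afterwards
kind create cluster --name cluster2 --retain
kind export logs --name cluster2
```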
The 4.x kernel raises some concern. Is this an older release of Amazon Linux?
The output from `podman info` could also have a lot of useful details.
> The following are the files from `kind export logs`. These files are from the first kind cluster; I didn't find info about the second one.
that command is per cluster, it takes a --name flag (most do)
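i.e. something like the following (the name is whatever you passed to `kind create cluster`; the output directory is optional):

```bash
kind get clusters                       # list the cluster names kind knows about
kind export logs --name <second-cluster-name> ./second-cluster-logs
```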
@stmcginnis It's an instance from another cloud provider.
The following is the output of `podman info`:
```yaml
host:
  arch: amd64
  buildahVersion: 1.33.8
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - ioasids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.10-1.al8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: 4e7fcd323c640c05f88ca82c36a94c971aee0c4c'
  cpuUtilization:
    idlePercent: 94.71
    systemPercent: 1.9
    userPercent: 3.39
  cpus: 4
  databaseBackend: sqlite
  distribution:
    distribution: alinux
    version: "3"
  eventLogger: file
  freeLocks: 2044
  hostname: iZbp1f4a36z9etbvhabv0uZ
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.134-16.3.al8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 4750659584
  memTotal: 16222420992
  networkBackend: cni
  networkBackendInfo:
    backend: cni
    dns:
      package: podman-plugins-4.9.4-3.0.1.al8.x86_64
      path: /usr/libexec/cni/dnsname
      version: |-
        CNI dnsname plugin
        version: 1.4.0-dev
        commit: unknown
        CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
    package: containernetworking-plugins-1.4.0-2.0.1.al8.x86_64
    path: /usr/libexec/cni
  ociRuntime:
    name: runc
    package: runc-1.1.12-1.0.1.al8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.12
      spec: 1.0.2-dev
      go: go1.20.12
      libseccomp: 2.5.2
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.3-1.al8.x86_64
    version: |-
      slirp4netns version 1.2.3
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 68h 26m 51.00s (Approximately 2.83 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 2
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 41881894912
  graphRootUsed: 11787603968
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 26
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.9.4-rhel
  Built: 1719197857
  BuiltTime: Mon Jun 24 10:57:37 2024
  GitCommit: ""
  GoVersion: go1.21.9 (Red Hat 1.21.9-1.0.1.al8)
  Os: linux
  OsArch: linux/amd64
  Version: 4.9.4-rhel
```
@BenTheElder Thanks.
Because kubelet.log is too big, I'm only attaching the start and end. First 20 lines:
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.126758 3025491 server.go:417] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.127228 3025491 server.go:837] "Client rotation is on, will bootstrap in background"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.129878 3025491 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.130921 3025491 container_manager_linux.go:822] "CPUAccounting not enabled for process" pid=3025491
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.130934 3025491 container_manager_linux.go:825] "MemoryAccounting not enabled for process" pid=3025491
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.130938 3025491 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134317 3025491 container_manager_linux.go:266] "Container manager verified user specified cgroup-root exists" cgroupRoot=[kubelet]
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134358 3025491 container_manager_linux.go:271] "Creating Container Manager object based on Node Config" nodeConfig={RuntimeCgroupsName:/system.slice/containerd.service SystemCgroupsName: KubeletCgroupsName: KubeletOOMScoreAdj:-999 ContainerRuntime: CgroupsPerQOS:true CgroupRoot:/kubelet CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[]} QOSReserved:map[] CPUManagerPolicy:none CPUManagerPolicyOptions:map[] TopologyManagerScope:container CPUManagerReconcilePeriod:10s ExperimentalMemoryManagerPolicy:None ExperimentalMemoryManagerReservedMemory:[] PodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms TopologyManagerPolicy:none ExperimentalTopologyManagerPolicyOptions:map[]}
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134382 3025491 topology_manager.go:136] "Creating topology manager with policy per scope" topologyPolicyName="none" topologyScopeName="container"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134393 3025491 container_manager_linux.go:302] "Creating device plugin manager"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.134422 3025491 state_mem.go:36] "Initialized new in-memory state store"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137015 3025491 kubelet.go:405] "Attempting to sync node with API server"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137036 3025491 kubelet.go:298] "Adding static pod path" path="/etc/kubernetes/manifests"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137062 3025491 kubelet.go:309] "Adding apiserver pod source"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137082 3025491 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.137829 3025491 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.1" apiVersion="v1"
Aug 17 23:07:21 test-control-plane kubelet[3025491]: W0817 23:07:21.138195 3025491 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://test-control-plane:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 17 23:07:21 test-control-plane kubelet[3025491]: E0817 23:07:21.138253 3025491 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://test-control-plane:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 17 23:07:21 test-control-plane kubelet[3025491]: I0817 23:07:21.138281 3025491 server.go:1168] "Started kubelet"
Last 20 lines:
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.769639 3319819 kubelet_network_linux.go:63] "Initialized iptables rules." protocol=IPv6
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.769671 3319819 status_manager.go:207] "Starting to sync pod status with apiserver"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.769701 3319819 kubelet.go:2257] "Starting kubelet main sync loop"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.769776 3319819 kubelet.go:2281] "Skipping pod synchronization" err="[container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: W0818 02:44:21.772576 3319819 reflector.go:533] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.RuntimeClass: Get "https://test-control-plane:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.772673 3319819 reflector.go:148] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://test-control-plane:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 10.89.0.35:6443: connect: connection refused
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.842358 3319819 kubelet_node_status.go:70] "Attempting to register node" node="test-control-plane"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.843107 3319819 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://test-control-plane:6443/api/v1/nodes\": dial tcp 10.89.0.35:6443: connect: connection refused" node="test-control-plane"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853126 3319819 cpu_manager.go:214] "Starting CPU manager" policy="none"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853150 3319819 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853171 3319819 state_mem.go:36] "Initialized new in-memory state store"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853360 3319819 state_mem.go:88] "Updated default CPUSet" cpuSet=""
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853377 3319819 state_mem.go:96] "Updated CPUSet assignments" assignments=map[]
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853386 3319819 policy_none.go:49] "None policy: Start"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.853990 3319819 memory_manager.go:169] "Starting memorymanager" policy="None"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.854016 3319819 state_mem.go:35] "Initializing new in-memory state store"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: I0818 02:44:21.854196 3319819 state_mem.go:75] "Updated machine memory state"
Aug 18 02:44:21 test-control-plane kubelet[3319819]: E0818 02:44:21.861255 3319819 kubelet.go:1480] "Failed to start ContainerManager" err="failed to initialize top level QOS containers: root container [kubelet kubepods] doesn't exist"
Aug 18 02:44:21 test-control-plane systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Aug 18 02:44:21 test-control-plane systemd[1]: kubelet.service: Failed with result 'exit-code'.
This looks like https://github.com/kubernetes/kubernetes/issues/122955
@aojea Thanks, I'll go take a look.
I tried to create two clusters using kind, but something unexpected happened. I've looked through some of the existing issues, but none seem to be quite the same as mine.
The first cluster works fine; the second one runs into problems.
Error Log
The following is the log from creating the second cluster. Actually, everything seems fine, but I don't know why the apiserver never becomes ready.
Local environment
Memory: 16Gi
kind version: 0.22.0
uname -r: 5.10.134-16.3.al8.x86_64
docker version