kmesh-net / kmesh

High Performance ServiceMesh Data Plane Based on Programmable Kernel
https://kmesh.net
Apache License 2.0

Can't start kmesh using kind #689

Closed derekwin closed 1 month ago

derekwin commented 1 month ago

What happened:

  1. Start kmesh by pulling the prebuilt image:

kubectl describe pod -n kmesh-system kmesh-rzbr2

QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  83s                default-scheduler  0/2 nodes are available: 1 Insufficient memory.
  Normal   Scheduled         82s                default-scheduler  Successfully assigned kmesh-system/kmesh-rzbr2 to c1-worker
  Normal   Pulled            36s (x4 over 81s)  kubelet            Container image "ghcr.io/kmesh-net/kmesh:latest" already present on machine
  Normal   Created           36s (x4 over 81s)  kubelet            Created container kmesh
  Normal   Started           35s (x4 over 81s)  kubelet            Started container kmesh
  Warning  BackOff           4s (x6 over 78s)   kubelet            Back-off restarting failed container kmesh in pod kmesh-rzbr2_kmesh-system(e5e79cb7-57e4-4d68-ba10-68160ca0f93c)

kubectl logs -f -n kmesh-system kmesh-rzbr2

time="2024-08-05T12:41:17Z" level=info msg="FLAG: --bpf-fs-path=\"/sys/fs/bpf\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --cgroup2-path=\"/mnt/kmesh_cgroup2\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --cni-etc-path=\"/etc/cni/net.d\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --conflist-name=\"\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --enable-bpf-log=\"true\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --enable-bypass=\"false\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --enable-mda=\"false\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --enable-secret-manager=\"false\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --help=\"false\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --mode=\"workload\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="FLAG: --plugin-cni-chained=\"true\"" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="kmesh start with Normal" subsys=pkg/bpf
time="2024-08-05T12:41:17Z" level=info msg="bpf Start successful" subsys=manager
time="2024-08-05T12:41:17Z" level=info msg="start kmesh manage controller successfully" subsys=controller
time="2024-08-05T12:41:17Z" level=info msg="service node ztunnel~10.244.1.4~kmesh-rzbr2.kmesh-system~kmesh-system.svc.cluster.local connect to discovery address istiod.istio-system.svc:15012" subsys=controller/config
time="2024-08-05T12:41:17Z" level=info msg="Clean kmesh_version map and bpf prog" subsys=pkg/bpf
time="2024-08-05T12:41:17Z" level=error msg="create client and stream failed, create workload stream failed, DeltaAggregatedResources failed, rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.96.61.9:15012: connect: connection refused\"" subsys=main
Error: create client and stream failed, create workload stream failed, DeltaAggregatedResources failed, rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.96.61.9:15012: connect: connection refused"
kmesh exit
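The error above means kmesh dialed istiod's xDS port (15012) before istiod was serving on it. A generic way to guard a dependent startup like this is a small retry wrapper; the `wait_for` helper below is an illustrative sketch (not part of kmesh), and the `nc` probe in the trailing comment assumes in-cluster DNS for the istiod service:

```shell
#!/bin/sh
# wait_for: retry a command until it succeeds or the attempt budget runs out.
# Usage: wait_for <max_attempts> <sleep_seconds> <command...>
wait_for() {
    max=$1
    pause=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "gave up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$pause"
    done
    return 0
}

# Hypothetical usage before launching kmesh: block until istiod answers on
# its xDS port so the first stream does not hit "connection refused".
# wait_for 30 10 nc -z istiod.istio-system.svc 15012
```

Note that a retry only helps once istiod can actually become ready; if the istiod pod itself never schedules, the wrapper will just time out.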


  2. Start kmesh from a locally built Docker image:
make docker HUB=ghcr.io/kmesh-net TARGET=kmesh TAG=latest

kubectl describe pod -n kmesh-system kmesh-nr9rc

QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     :NoSchedule op=Exists
                 :NoExecute op=Exists
                 CriticalAddonsOnly op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  40s                default-scheduler  0/2 nodes are available: 1 Insufficient memory.
  Normal   Scheduled         38s                default-scheduler  Successfully assigned kmesh-system/kmesh-nr9rc to c1-worker
  Normal   Pulled            20s (x3 over 37s)  kubelet            Container image "ghcr.io/kmesh-net/kmesh:latest" already present on machine
  Normal   Created           20s (x3 over 37s)  kubelet            Created container kmesh
  Normal   Started           19s (x3 over 37s)  kubelet            Started container kmesh
  Warning  BackOff           6s (x3 over 34s)   kubelet            Back-off restarting failed container kmesh in pod kmesh-nr9rc_kmesh-system(2eb84fc3-b6f3-4eb6-8b36-df09efdad58f)

kubectl logs -f -n kmesh-system kmesh-nr9rc

mkdir: cannot create directory '/mnt/kmesh_cgroup2': File exists
kmesh exit
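This second crash is simpler: a setup step runs a plain `mkdir`, which fails with `File exists` when `/mnt/kmesh_cgroup2` survived a previous run of the container. The idempotent form is `mkdir -p`; a minimal demonstration, using a throwaway path rather than kmesh's real mount point:

```shell
#!/bin/sh
set -e
# Demo path; the actual directory in the log above is /mnt/kmesh_cgroup2.
DEMO_DIR=/tmp/kmesh_cgroup2_demo

# mkdir -p succeeds whether or not the directory already exists, so a
# restarted container does not crash on leftover state from the last run.
mkdir -p "$DEMO_DIR"
mkdir -p "$DEMO_DIR"    # second call is a no-op and still exits 0

# A plain mkdir on the same path would now fail with "File exists".
echo "directory ready: $DEMO_DIR"
```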


**Environment**:
kernel version

Linux localhost.localdomain 6.4.0-10.1.0.20.oe2309.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 25 19:01:14 CST 2023 x86_64 x86_64 x86_64 GNU/Linux

os version

NAME="openEuler"
VERSION="23.09"
ID="openEuler"
VERSION_ID="23.09"
PRETTY_NAME="openEuler 23.09"
ANSI_COLOR="0;31"

docker version

Client: Docker Engine - Community
 Version:           26.1.4
 API version:       1.43 (downgraded from 1.45)
 Go version:        go1.21.11
 Git commit:        5650f9b
 Built:             Wed Jun 5 11:32:04 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.5
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.6
  Git commit:       a61e2b4
  Built:            Fri Jul 21 20:38:05 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.33
  GitCommit:        d2d58213f83a351ca8f528a95fbd145f5654e957
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

kind version

kind v0.19.0 go1.20.4 linux/amd64

istioctl version: 1.22.1

cluster info:

NAME               STATUS   ROLES           AGE   VERSION
c1-control-plane   Ready    control-plane   54m   v1.28.0
c1-worker          Ready    <none>          54m   v1.28.0

NAMESPACE            NAME                                       READY   STATUS             RESTARTS         AGE   IP           NODE               NOMINATED NODE   READINESS GATES
istio-system         istio-cni-node-6l7l2                       1/1     Running            0                54m   172.18.0.3   c1-worker          <none>           <none>
istio-system         istio-cni-node-prms9                       1/1     Running            0                54m   172.18.0.2   c1-control-plane   <none>           <none>
istio-system         istiod-7df4b86f44-njrkw                    0/1     Pending            0                30m   <none>       <none>             <none>           <none>
istio-system         ztunnel-b9swg                              1/1     Running            0                54m   10.244.1.3   c1-worker          <none>           <none>
istio-system         ztunnel-qzfml                              1/1     Running            0                54m   10.244.0.5   c1-control-plane   <none>           <none>
kmesh-system         kmesh-mfnmf                                0/1     CrashLoopBackOff   10 (3m56s ago)   30m   10.244.0.7   c1-control-plane   <none>           <none>
kmesh-system         kmesh-nr9rc                                0/1     CrashLoopBackOff   10 (4m13s ago)   30m   10.244.1.6   c1-worker          <none>           <none>
kube-system          coredns-5dd5756b68-mzcbh                   1/1     Running            0                55m   10.244.0.3   c1-control-plane   <none>           <none>
kube-system          coredns-5dd5756b68-n9h7w                   1/1     Running            0                55m   10.244.0.2   c1-control-plane   <none>           <none>
kube-system          etcd-c1-control-plane                      1/1     Running            0                55m   172.18.0.2   c1-control-plane   <none>           <none>
kube-system          kindnet-528kg                              1/1     Running            0                55m   172.18.0.3   c1-worker          <none>           <none>
kube-system          kindnet-x4dkv                              1/1     Running            0                55m   172.18.0.2   c1-control-plane   <none>           <none>
kube-system          kube-apiserver-c1-control-plane            1/1     Running            0                55m   172.18.0.2   c1-control-plane   <none>           <none>
kube-system          kube-controller-manager-c1-control-plane   1/1     Running            0                55m   172.18.0.2   c1-control-plane   <none>           <none>
kube-system          kube-proxy-ghkm4                           1/1     Running            0                55m   172.18.0.3   c1-worker          <none>           <none>
kube-system          kube-proxy-kfvkp                           1/1     Running            0                55m   172.18.0.2   c1-control-plane   <none>           <none>
kube-system          kube-scheduler-c1-control-plane            1/1     Running            0                55m   172.18.0.2   c1-control-plane   <none>           <none>
local-path-storage   local-path-provisioner-6f8956fb48-g22w7    1/1     Running            0                55m   10.244.0.4   c1-control-plane   <none>           <none>

NAMESPACE      NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                                 AGE   SELECTOR
default        kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                                 55m   <none>
istio-system   istiod       ClusterIP   10.96.61.9   <none>        15010/TCP,15012/TCP,443/TCP,15014/TCP   54m   app=istiod,istio=pilot
kube-system    kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP                  55m   k8s-app=kube-dns

helm install kmesh

NAME: kmesh
LAST DEPLOYED: Mon Aug 5 20:59:40 2024
NAMESPACE: kmesh-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

NAME          READY   STATUS              RESTARTS   AGE
kmesh-mfnmf   0/1     ContainerCreating   0          1s
kmesh-nr9rc   0/1     Pending             0          1s

Okabe-Rintarou-0 commented 1 month ago

0/2 nodes are available: 1 Insufficient memory.

The pod failed to schedule: the target node does not have enough free memory.

hzxuzhonghu commented 1 month ago

dial tcp 10.96.61.9:15012: connect: connection refused"

It means istiod is not ready yet, so kmesh has nothing to connect to.

hzxuzhonghu commented 1 month ago

istio-system istiod-7df4b86f44-njrkw 0/1 Pending 0 30m

derekwin commented 1 month ago

The root cause was that the host machine's memory (4 GB) was insufficient, which kept the istiod pod from scheduling and caused kmesh to crash on connect. After increasing the host machine's memory to 8 GB, the Istio mesh and kmesh started up properly.
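For anyone hitting the same `Insufficient memory` scheduling failure, it helps to compare each node's allocatable memory with the workload's requests. The sketch below converts the quantity strings kubectl prints (`Ki`/`Mi`/`Gi`) to MiB so they can be compared; the commented `kubectl` usage and the 8 GiB threshold (mirroring the fix above) are illustrative assumptions:

```shell
#!/bin/sh
# to_mib: convert a Kubernetes memory quantity (e.g. "3927796Ki") to MiB.
# Handles the binary suffixes kubectl commonly prints; a bare number is
# treated as bytes.
to_mib() {
    case $1 in
        *Ki) echo $(( ${1%Ki} / 1024 )) ;;
        *Mi) echo "${1%Mi}" ;;
        *Gi) echo $(( ${1%Gi} * 1024 )) ;;
        *)   echo $(( $1 / 1024 / 1024 )) ;;
    esac
}

# Hypothetical check: does the worker node have at least 8 GiB allocatable?
# alloc=$(kubectl get node c1-worker -o jsonpath='{.status.allocatable.memory}')
# [ "$(to_mib "$alloc")" -ge 8192 ] || echo "less than 8Gi allocatable on c1-worker"
```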