Closed: rlnrln closed this issue 7 years ago.
The last "connection refused" is:
Feb 28 14:21:53 ip-10-128-171-159 kubelet[735]: E0228 14:21:53.111447 735 docker_manager.go:2201] Failed to setup network for pod "m1-stage-2548681241-b1r9w_default(de7bdd6e-fdc0-11e6-bc48-02ecf0186f8f)" using network plugins "cni": unable to allocate IP address: Post http://127.0.0.1:6784/ip/20e61a9927959f0cfe54e64fba57ab528eae69748bbcaa30afe58bd38bed2679: dial tcp 127.0.0.1:6784: getsockopt: connection refused; Skipping pod
and the weave-kube container starts right after that:
INFO: 2017/02/28 14:21:53.110489 Command line options: map[status-addr:0.0.0.0:6782 http-addr:127.0.0.1:6784 ipalloc-range:10.32.0.0/12 name:4e:79:89:47:38:6f nickname:ip-10-128-171-159 no-dns:true port:6783 docker-api: datapath:datapath ipalloc-init:consensus=4]
To clarify, the issue is that the pod sits in the "ContainerCreating" state, and I cannot see anything in those logs to explain why. I'm going to edit the issue description, since "node can't connect to 127.0.0.1:6784" is just a transient state before everything gets going.
After much investigating via Slack, the problem was traced here:
# curl 'http://127.0.0.1:6784/status'
Version: 1.8.2 (version 1.9.1 available - please upgrade!)
Service: router
Protocol: weave 1..2
Name: 66:2b:6a:ca:34:88(ip-10-128-152-185)
Encryption: disabled
PeerDiscovery: enabled
Targets: 4
Connections: 4 (3 established, 1 failed)
Peers: 4 (with 12 established connections)
TrustedSubnets: none
Service: ipam
Status: waiting for IP range grant from peers
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
The "waiting for IP range grant from peers" status indicates that Weave Net's IPAM believes that all the IP address space is owned by other nodes in the cluster, but actually none of those nodes are able to be contacted at the moment.
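This state is easy to detect programmatically. Below is a minimal sketch: the sample text is abridged from the status output above (on a healthy node the IPAM Status line typically reads "ready" instead), and the grep pattern is just the phrase shown there.

```shell
# Abridged sample of `curl -s http://127.0.0.1:6784/status` output (from above).
status='Service: ipam
       Status: waiting for IP range grant from peers
        Range: 10.32.0.0/12'

# A node in this state owns no part of the allocation range, so every CNI
# IP request blocks and pods stay in ContainerCreating.
if echo "$status" | grep -q 'waiting for IP range grant'; then
  ipam_stuck=true
else
  ipam_stuck=false
fi
echo "ipam_stuck=$ipam_stuck"
```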
So the underlying cause is #2797
Thank you, Bryan, for all the assistance in tracking down this issue yesterday.
Here's my workaround. Big red warning: only run this against peers that are genuinely gone for good.
% for i in $(curl -s 'http://127.0.0.1:6784/status/ipam' | grep 'unreachable\!$' | sort -k2 -n -r | awk -F'(' '{print $2}' | sed 's/).*//'); do echo curl -X DELETE 127.0.0.1:6784/peer/$i; done
65536 IPs taken over from ip-10-128-184-15
32768 IPs taken over from ip-10-128-159-154
32768 IPs taken over from ip-10-128-170-84
...
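To unpack that one-liner, here is an annotated sketch run against hypothetical sample output (the peer names and counts below are illustrative, not from this cluster): it keeps only the peers marked unreachable in `/status/ipam`, sorts them by how many addresses they hold, extracts the nickname inside the parentheses, and prints (but does not run, thanks to the leading `echo`) the DELETE that would reclaim each peer's range.

```shell
# Hypothetical /status/ipam lines; real output comes from
# `curl -s http://127.0.0.1:6784/status/ipam`.
sample='66:2b:6a:ca:34:88(ip-10-128-152-185)    65536 IPs (01.6% of total)
7a:50:21:d1:73:0f(ip-10-128-184-15)     65536 IPs (01.6% of total) - unreachable!
9e:44:c1:b2:aa:31(ip-10-128-159-154)    32768 IPs (00.8% of total) - unreachable!'

# Keep only unreachable peers, largest allocation first, then strip
# everything but the nickname inside the parentheses.
peers=$(echo "$sample" \
  | grep 'unreachable!$' \
  | sort -k2 -n -r \
  | awk -F'(' '{print $2}' \
  | sed 's/).*//')

# Print the reclaim commands for review; drop the echo only once you are
# certain each peer is permanently gone (reclaiming a live peer's range
# leads to duplicate IP allocations).
for i in $peers; do
  echo curl -X DELETE "127.0.0.1:6784/peer/$i"
done
```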
Thanks, @rlnrln. Going to close this as a duplicate.
Please help me: I am trying to use the Weave networking solution in my k8s cluster, but I am getting the errors below, and the pods are not running. (I am following this page for the k8s setup but am stuck at the pod-networking step.) https://github.com/mmumshad/kubernetes-the-hard-way/blob/master/README.md
vagrant@master-1:~$ kubectl get pods -n kube-system
NAME                       READY   STATUS              RESTARTS   AGE
coredns-69cbb76ff8-nql5k   0/1     ContainerCreating   0          17h
coredns-69cbb76ff8-p44xg   0/1     ContainerCreating   0          17h
weave-net-2vdr7            1/2     CrashLoopBackOff    22         17h
weave-net-pqrx8            1/2     CrashLoopBackOff    23         17h
============================
===============
See my kubelet logs on worker-1:
kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2022-01-09 11:20:30 UTC; 17h ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 17379 (kubelet)
    Tasks: 13 (limit: 2360)
   CGroup: /system.slice/kubelet.service
           └─17379 /usr/local/bin/kubelet --config=/var/lib/kubelet/kubelet-config.yaml --image-pull-progress-deadline=2m --kubeconfig=/var/lib
Jan 10 04:35:53 worker-1 kubelet[17379]: E0110 04:35:53.787040 17379 pod_workers.go:190] Error syncing pod ed5aa617-713c-11ec-9aa3-0229e0c3be
Jan 10 04:36:05 worker-1 kubelet[17379]: E0110 04:36:05.613166 17379 pod_workers.go:190] Error syncing pod 9acb983f-713c-11ec-9aa3-0229e0c3be
Jan 10 04:36:08 worker-1 kubelet[17379]: I0110 04:36:08.687387 17379 kuberuntime_manager.go:415] No ready sandbox for pod "coredns-69cbb76ff8
Jan 10 04:36:08 worker-1 kubelet[17379]: W0110 04:36:08.691641 17379 cni.go:302] CNI failed to retrieve network namespace path: cannot find n
Jan 10 04:36:08 worker-1 kubelet[17379]: weave-cni: unable to release IP address: Delete "http://127.0.0.1:6784/ip/b1858edb33e7a3f91fdc10f72a42
Jan 10 04:36:08 worker-1 kubelet[17379]: E0110 04:36:08.722868 17379 cni.go:345] Error deleting kube-system_coredns-69cbb76ff8-nql5k/b1858edb
Jan 10 04:36:08 worker-1 kubelet[17379]: E0110 04:36:08.723621 17379 remote_runtime.go:119] StopPodSandbox "b1858edb33e7a3f91fdc10f72a42b18db
Jan 10 04:36:08 worker-1 kubelet[17379]: E0110 04:36:08.723837 17379 kuberuntime_manager.go:815] Failed to stop sandbox {"docker" "b1858edb33
Jan 10 04:36:08 worker-1 kubelet[17379]: E0110 04:36:08.724008 17379 kuberuntime_manager.go:610] killPodWithSyncResult failed: failed to "Kil
Jan 10 04:36:08 worker-1 kubelet[17379]: E0110 04:36:08.724147 17379 pod_workers.go:190] Error syncing pod ed5aa617-713c-11ec-9aa3-0229e0c3be
==============================
vagrant@master-1:~$ kubectl describe pod weave-net-2vdr7 -n kube-system
Name: weave-net-2vdr7
Namespace: kube-system
Priority: 2000001000
PriorityClassName: system-node-critical
Node: worker-1/192.168.5.21
Start Time: Sun, 09 Jan 2022 11:09:30 +0000
Labels: controller-revision-hash=6fd65954f8
name=weave-net
pod-template-generation=3
Annotations:
cni-bin:
Type: HostPath (bare host directory volume)
Path: /opt
HostPathType:
cni-bin2:
Type: HostPath (bare host directory volume)
Path: /home
HostPathType:
cni-conf:
Type: HostPath (bare host directory volume)
Path: /etc
HostPathType:
dbus:
Type: HostPath (bare host directory volume)
Path: /var/lib/dbus
HostPathType:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
machine-id:
Type: HostPath (bare host directory volume)
Path: /etc/machine-id
HostPathType: FileOrCreate
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
weave-net-token-68jgk:
Type: Secret (a volume populated by a Secret)
SecretName: weave-net-token-68jgk
Optional: false
QoS Class: Burstable
Node-Selectors:
Normal Scheduled 17h default-scheduler Successfully assigned kube-system/weave-net-2vdr7 to worker-1
Normal Pulled 17h kubelet, worker-1 Container image "ghcr.io/weaveworks/launcher/weave-kube:2.8.1" already present on machine
Normal Created 17h kubelet, worker-1 Created container
Normal Started 17h kubelet, worker-1 Started container
Normal Pulled 17h kubelet, worker-1 Container image "ghcr.io/weaveworks/launcher/weave-npc:2.8.1" already present on machine
Normal Started 17h kubelet, worker-1 Started container
Normal Created 17h kubelet, worker-1 Created container
Normal Pulled 17h (x3 over 17h) kubelet, worker-1 Container image "ghcr.io/weaveworks/launcher/weave-kube:2.8.1" already present on machine
Normal Started 17h (x3 over 17h) kubelet, worker-1 Started container
Normal Created 17h (x3 over 17h) kubelet, worker-1 Created container
Warning Unhealthy 17h (x9 over 17h) kubelet, worker-1 Readiness probe failed: Get http://127.0.0.1:6784/status: dial tcp 127.0.0.1:6784: connect: connection refused
Warning BackOff 17h (x30 over 17h) kubelet, worker-1 Back-off restarting failed container
Normal Created 17h (x3 over 17h) kubelet, worker-1 Created container
Normal Started 17h (x3 over 17h) kubelet, worker-1 Started container
Normal Pulled 16h (x9 over 17h) kubelet, worker-1 Container image "ghcr.io/weaveworks/launcher/weave-kube:2.8.1" already present on machine
Warning BackOff 16h (x158 over 17h) kubelet, worker-1 Back-off restarting failed container
Warning Unhealthy 112s (x43 over 17h) kubelet, worker-1 Readiness probe failed: Get http://127.0.0.1:6784/status: dial tcp 127.0.0.1:6784: connect: connection refused
I don't understand what else to check. Can anybody please help me? I want to set up a k8s cluster on my laptop first so I can practise for and clear the CKA exam.
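One thing worth checking directly on worker-1 is whether anything is actually listening on 6784. A minimal bash sketch, assuming only that the weave router serves its HTTP API on 127.0.0.1:6784 (the same endpoint the readiness probe above hits):

```shell
# Probe the weave router's local HTTP API using bash's /dev/tcp redirection;
# "down" here matches the readiness probe's "connection refused" above.
if (echo > /dev/tcp/127.0.0.1/6784) 2>/dev/null; then
  weave_api=up
else
  weave_api=down
fi
echo "weave API on 127.0.0.1:6784 is $weave_api"
```

If the port is down, the weave container itself is crashing, so the next step is to read its previous crash log with `kubectl -n kube-system logs weave-net-2vdr7 -c weave --previous` and look at the final lines before it exits.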
Kubernetes 1.5.2, created with Kops 1.5.0 alpha 4.
Complete logs from recently rebooted node.
kubelet logs
kube-weave logs
Scheduling deployment of prometheus: