Closed: edoziw closed this issue 1 year ago
update: It seems the master node is running (I can kubectl in its ssh session, and curl the kube-vip IP for ${apiserver_endpoint}); it's just that the kube-vip ARP broadcast is failing to reach the other machines:
I'll continue to investigate. Let me know if you have any ideas or troubleshooting tips for what could be stopping the ARP from reaching k0 and my dev machines.
Thanks
ssh km
# find the kube-vip container and dump its logs
container_id="$(crictl ps -a | grep kube-vip | awk '{print $1}')"
crictl logs "${container_id}"
# check the VIP from the master itself (-D prints timestamps)
ping -D 192.168.30.222
curl -v -k https://192.168.30.222:6443/cacerts
output
time="2023-10-05T10:06:31Z" level=info msg="Starting kube-vip.io [v0.6.2]"
time="2023-10-05T10:06:31Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[true], Services:[false]"
time="2023-10-05T10:06:31Z" level=info msg="No interface is specified for VIP in config, auto-detecting default Interface"
time="2023-10-05T10:06:31Z" level=info msg="prometheus HTTP server started"
time="2023-10-05T10:06:31Z" level=info msg="kube-vip will bind to interface [eth0]"
time="2023-10-05T10:06:31Z" level=info msg="Starting Kube-vip Manager with the ARP engine"
time="2023-10-05T10:06:31Z" level=info msg="Beginning cluster membership, namespace [kube-system], lock name [plndr-cp-lock], id [km]"
I1005 10:06:31.269258 1 leaderelection.go:245] attempting to acquire leader lease kube-system/plndr-cp-lock...
I1005 10:06:31.342413 1 leaderelection.go:255] successfully acquired lease kube-system/plndr-cp-lock
time="2023-10-05T10:06:31Z" level=info msg="Node [km] is assuming leadership of the cluster"
time="2023-10-05T10:06:31Z" level=info msg="Gratuitous Arp broadcast will repeat every 3 seconds for [192.168.30.222]"
### ping output
PING 192.168.30.222 (192.168.30.222) 56(84) bytes of data.
[1696504936.602201] 64 bytes from 192.168.30.222: icmp_seq=1 ttl=64 time=0.201 ms
### curl output
...
HTTP/2 200
... cert output ... ok
ssh k0
# the same checks from the worker node, with timeouts
ping -D 192.168.30.222 -w 2
curl -v -k https://192.168.30.222:6443/cacerts --connect-timeout 3
output
PING 192.168.30.222 (192.168.30.222) 56(84) bytes of data.
--- 192.168.30.222 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1029ms
### curl output
* Trying 192.168.30.222:6443...
* Connection timed out after 3001 milliseconds
* Closing connection 0
curl: (28) Connection timed out after 3001 milliseconds
user error:
After some diagnosis I found a symptom: kube-vip's ARP broadcast was not being registered by the worker node.
ssh <worker node ip>
sudo tcpdump -ni eth0 arp
# gratuitous ARP broadcasts of 192.168.30.222 d8:3a:dd:34:c6:62 were being received
# manual set failed (note: "ether" needs the -H flag, as in arp -H ether ...;
# without it arp parses "ether" as a hostname, hence the error below)
arp -v ether -i eth0 -s 192.168.30.222 d8:3a:dd:34:c6:62
ether: Unknown host
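In hindsight, the kernel's own routing view makes the failure obvious: an off-subnet VIP is sent via the default gateway, so the worker never consults ARP for it at all. A quick check (a sketch assuming the interface is eth0, as on these nodes):
# an on-link VIP prints "... dev eth0" with no "via";
# an off-subnet VIP like 192.168.30.222 on a 192.168.0.x network
# prints "via <gateway>", so the gratuitous ARP entry is never used
ip route get 192.168.30.222
ip -4 addr show dev eth0   # confirm the node's actual subnet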
See the tldr section in the initial comment.
Sorry for the confusion.
Maybe add a note in README.md or the troubleshooting docs to make sure metal_lb_ip_range and apiserver_endpoint have IPs that are valid on the flannel_iface network, or assert it in the play somewhere? A rough sketch of such an assertion is below.
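Something like this could work (an untested sketch, not taken from this repo; it assumes flannel_iface is set for the host and that the ansible.utils collection is available):
- name: Assert apiserver_endpoint is on the flannel_iface subnet
  ansible.builtin.assert:
    that:
      - apiserver_endpoint | ansible.utils.ipaddr(iface_subnet)
    fail_msg: >-
      apiserver_endpoint {{ apiserver_endpoint }} is not inside the
      {{ flannel_iface }} subnet {{ iface_subnet }}
  vars:
    iface_ipv4: "{{ ansible_facts[flannel_iface]['ipv4'] }}"
    iface_subnet: "{{ iface_ipv4.network }}/{{ iface_ipv4.netmask }}"
A similar check could cover the first and last addresses of metal_lb_ip_range.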
Hi Tim, thanks for your videos, repos, and docs!!
ssh <master_ip>
I was using apiserver_endpoint: "192.168.30.222" instead of apiserver_endpoint: "192.168.0.222". The same problem existed in metal_lb_ip_range; the correct value for me was
metal_lb_ip_range: "192.168.0.200-192.168.0.210"
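Put together, the corrected values in inventory/pi-cluster/group_vars/all.yml (both must sit on the nodes' 192.168.0.x subnet):
# group_vars/all.yml
apiserver_endpoint: "192.168.0.222"
metal_lb_ip_range: "192.168.0.200-192.168.0.210"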
Description
Problem similar (??) to the post by @AshDevFr in https://github.com/techno-tim/k3s-ansible/discussions/372
First I checked https://github.com/techno-tim/k3s-ansible/discussions/20, but I wasn't able to figure out how to fix my problem.
On a set of two Raspberry Pis (hosts.ini, hardware, and OS below):
Expected Behavior
after ansible-playbook site.yml completes, all nodes join the cluster and the API server is reachable at apiserver_endpoint
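(For example, both of these should succeed from a machine on the LAN once the play finishes; the VIP below is the corrected apiserver_endpoint:)
curl -vk https://192.168.0.222:6443/cacerts --connect-timeout 3
kubectl get nodes   # with the kubeconfig copied from the master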
Current Behavior
When running the playbook site.yml, it hangs at the
k3s_agent : Enable and check K3s service
task. I tried a bunch of times to reset and re-run, but without success. I've checked all the nodes and I have eth0 everywhere. Also, my token is correct.
Steps to Reproduce
Context (variables)
Operating system:
master: Raspbian GNU/Linux 11 (bullseye), 192.168.0.100, eth0
node: Raspbian GNU/Linux 11 (bullseye), 192.168.0.184, eth0
Hardware:
master: Raspberry Pi 4 Model B Rev 1.5; disk avail: 50G; mem: 7.6Gi
node: Raspberry Pi 2 Model B Rev 1.1; disk avail: 4430932 blocks (4.3G); mem: 944104 KiB (921Mi) (I know this is small, but the master should still start)
Variables Used
inventory/pi-cluster/group_vars/all.yml
diff inventory/pi-cluster/group_vars/all.yml inventory/sample/group_vars/all.yml
Note: I also tried:
Hosts
ansible.cfg
inventory/pi-cluster/hosts.ini
Logs
On the ops host
$ ansible-playbook reset.yml
$ ansible-playbook site.yml
On the master node
sudo journalctl -u k3s > /tmp/k3s.log
grep -E --line-number 'arting Light' /tmp/k3s.log   # to find the last start log
tail -n +2277270 /tmp/k3s.log > /tmp/k3s-1642.log
k3s-1642.log
On the worker node (I know the times don't match; I didn't feel like finding the tail of the master logs again, but these are the same symptoms as on Oct 03 for the worker)