Did you get the latest? Also, are you sure you are using the right interface name in your variables? Can you show them here?
THX for your tutorials and help.
Yes, I got the latest.
Interface is eth0
k3s_version: v1.23.4+k3s1
ansible_user: FraG
systemd_dir: /etc/systemd/system
flannel_iface: "eth0"
apiserver_endpoint: "192.168.0.190"
k3s_token: "++++++++++"
extra_server_args: "--no-deploy servicelb --no-deploy traefik --write-kubeconfig-mode 644 default-not-ready-toleration-seconds=30 --kube-apiserver-arg default-unreachable-toleration-seconds=30 --kube-controller-arg node-monitor-period=20s --kube-controller-arg node-monitor-grace-period=20s --kubelet-arg node-status-update-frequency=5s"
extra_agent_args: "--kubelet-arg node-status-update-frequency=5s"
kube_vip_tag_version: "v0.4.2"
metal_lb_speaker_tag_version: "v0.12.1"
metal_lb_controller_tag_version: "v0.12.1"
metal_lb_ip_range: "192.168.0.180-192.168.0.189"
I don't see anything odd. I would try removing all server args except the required ones, resetting, and trying again.
Expand your hard disks... On all nodes...
Probably should make a note, Tim. You do say this in the video! Thx for your work!
I too ran into this problem. I have double-checked that the hosts.ini file indeed matches up with the IP addresses that are running. I am running 2 physical servers (both x86) with 3 control planes and 2 workers.
Before doing this I actually switched my Ubuntu template to the one from the prior video that Tim did and made sure that both the username/password and the SSH keys are consistent across all VMs.
I am encountering the same issue as well when I try to run this on Raspberry Pis. With Raspberry Pi OS Lite 64-bit I can't get it to work at all, but with Ubuntu I can get 1 master up and running with nodes. When I try with 2 or 3 masters, though, I get stuck with the same issue.
Do all machines have the same time zones and the same SSH keys, and are they able to communicate with each other? Are you using passwordless sudo? If not, you might have to pass in additional flags like --user <user> --ask-become-pass
I would also remove any additional args from your nodes. These can be problematic on slow machines.
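For reference, a full invocation with those flags might look something like the following sketch -- it assumes the repo's default site.yml playbook and an inventory at inventory/my-cluster/hosts.ini, so adjust the paths and user to your setup:

ansible-playbook site.yml -i inventory/my-cluster/hosts.ini --user <user> --ask-pass --ask-become-pass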
I am encountering the same issue on a setup provisioned with Vagrant. Here is the failing task output in verbose mode:
FAILED - RETRYING: [172.20.20.11]: Verify that all nodes actually joined (check k3s-init.service if this fails) (20 retries left). Result was:
{
  "attempts": 1,
  "changed": false,
  "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"],
  "delta": "0:00:02.058195",
  "end": "2022-04-03 19:31:08.887394",
  "invocation": {
    "module_args": {
      "_raw_params": "k3s kubectl get nodes -l \"node-role.kubernetes.io/master=true\" -o=jsonpath=\"{.items[*].metadata.name}\"",
      "_uses_shell": false,
      "argv": null,
      "chdir": null,
      "creates": null,
      "executable": null,
      "removes": null,
      "stdin": null,
      "stdin_add_newline": true,
      "strip_empty_ends": true,
      "warn": false
    }
  },
  "msg": "non-zero return code",
  "rc": 1,
  "retries": 21,
  "start": "2022-04-03 19:31:06.829199",
  "stderr": "time=\"2022-04-03T19:31:06Z\" level=info msg=\"Acquiring lock file /var/lib/rancher/k3s/data/.lock\"\nThe connection to the server localhost:8080 was refused - did you specify the right host or port?",
  "stderr_lines": [
    "time=\"2022-04-03T19:31:06Z\" level=info msg=\"Acquiring lock file /var/lib/rancher/k3s/data/.lock\"",
    "The connection to the server localhost:8080 was refused - did you specify the right host or port?"
  ],
  "stdout": "",
  "stdout_lines": []
}
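The retry message points at k3s-init.service, so on the node that refuses to join I'm going to inspect that unit directly. Something like this should surface the underlying error (assuming systemd and sudo access on the node):

sudo systemctl status k3s-init.service
sudo journalctl -u k3s-init.service --no-pager | tail -n 50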
Expand your hard disks... On all nodes...
Probably should make a note, Tim. You do say this in the video! Thx for your work!
How big should the hard disk be? At the moment it is 86% free.
Do all machines have the same time zones and the same SSH keys, and are they able to communicate with each other? Are you using passwordless sudo? If not, you might have to pass in additional flags like
--user <user> --ask-become-pass
I would also remove any additional args from your nodes. These can be problematic on slow machines.
Yes, yes, yes, yes.
I have tried it with only the necessary args and ran into the same issue again.
Can you please paste your all.yaml along with the OS you are running these on? It's hard to tell what you are using vs. what you copied and pasted from the docs.
Ok, so I was able to solve my issue. To be able to post my full all.yaml here, I changed my k3s token from "K10908c1e28096f6665800342ac1bd8962df701dca5519d73b878a89d1921b432b4" to "mynotsosecrettoken", and this fixed my issue. So now I can run 2 masters and 2 workers running Ubuntu on Raspberry Pis.
I have also done a reset and verified that the old token was causing the issue. And by doing another reset and changing it to the new token I was able to successfully run 2 masters and 2 workers again.
After some digging in the logs I found this error line, so I was a bit unlucky with the token I had set:
Apr 04 03:04:03 master01 k3s[5136]: time="2022-04-04T03:04:03+02:00" level=fatal msg="failed to normalize token; must be in format K10<CA-HASH>::<USERNAME>:<PASSWORD> or <PASSWORD>"
k3s_version: v1.23.4+k3s1
# this is the user that has ssh access to these machines
ansible_user: pi
systemd_dir: /etc/systemd/system
# interface which will be used for flannel
flannel_iface: "eth0"
# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.90.30"
# k3s_token is required so masters can talk together securely
# this token should be alphanumeric only
k3s_token: "mynotsosecrettoken"
# change these to your liking, the only required one is --no-deploy servicelb
extra_server_args: "--no-deploy servicelb --no-deploy traefik --write-kubeconfig-mode 644"
extra_agent_args: ""
# image tag for kube-vip
kube_vip_tag_version: "v0.4.3"
# image tag for metal lb
metal_lb_speaker_tag_version: "v0.12.1"
metal_lb_controller_tag_version: "v0.12.1"
# metallb ip range for load balancer
metal_lb_ip_range: "192.168.90.80-192.168.90.81"
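For reference, a simple way to generate a purely alphanumeric token (just a sketch, assuming openssl is installed on the machine you run the playbook from) is:

openssl rand -hex 16

and then paste the output into k3s_token.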
I had some odd disk storage issues on one of my physical devices, but coming out of that I'm still experiencing the problem. I do have some initial success with the first master server:
and then I get this message, which I don't quite know how to read:
All three masters are now engaging and there is a "change", but the fact that it's writing text out to stderr seems like a potentially bad sign. This is then followed by the slow beat of an unhappy service:
Diagnostics at this point are:
Final output is:
My all.yml file is:
---
k3s_version: v1.23.5-rc5+k3s1
# this is the user that has ssh access to these machines
ansible_user: ken
become: true
systemd_dir: /etc/systemd/system
# interface which will be used for flannel
flannel_iface: "eth0"
# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.100.200"
# k3s_token is required so masters can talk together securely
# this token should be alphanumeric only
k3s_token: "xxxxxxxxxxxxxxxx"
# change these to your liking, the only required one is --no-deploy servicelb
extra_server_args: "--no-deploy servicelb --no-deploy traefik --write-kubeconfig-mode 644 --kube-api-server-arg default-not-ready-toleration-seconds=30 --kube-apiserver-arg default-unreachable-toleration-seconds=30 --kube-controller-arg node-monitor-period=20s --kube-controller-arg node-monitor-grace-period=20s --kubelet-arg node-status-update-frequency=5s"
extra_agent_args: "--kubelet-arg node-status-update-frequency=5s"
# image tag for kube-vip
kube_vip_tag_version: "v0.4.3"
# image tag for metal lb
metal_lb_speaker_tag_version: "v0.12.1"
metal_lb_controller_tag_version: "v0.12.1"
# metallb ip range for load balancer
metal_lb_ip_range: "192.168.100.80-192.168.100.89"
Oh, I also traversed all nodes -- master and worker -- and ran k3s check-status, and all came back with a "pass" status.
Not surprisingly, trying to check on the nodes in the cluster failed, as the service is not running:
I can -- however -- manually start the masters with sudo k3s server & and they do stay up, as this 404 error shows:
In the long trail of logs that comes from starting these servers, the only errors I'm seeing are the following:
Could this be indicative of a missing network dependency needed to support websockets?
Finally, note that in this current state I can reach the active node directly at 192.168.100.204, but the load-balancing VIP seems to be up too, as I get the same results hitting it at 192.168.100.200.
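For the record, the check I'm using is just an unauthenticated curl against what I assume is the default k3s API port (6443); any HTTP response, even an error, at least shows that something is listening on the node and on the VIP:

curl -k https://192.168.100.204:6443
curl -k https://192.168.100.200:6443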
Now I find this very odd ... even though I set the configuration as you did in your video ... I am not able to run the kubectl commands without sudo:
Even more concerning, each of the masters is aware of only itself rather than the cluster at large.
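As a workaround for the sudo part -- a sketch assuming k3s's default kubeconfig location of /etc/rancher/k3s/k3s.yaml -- copying the kubeconfig to my own user lets kubectl run unprivileged:

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown "$USER" ~/.kube/config
export KUBECONFIG=~/.kube/config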
Regarding my issue running on Raspberry Pi OS Lite 64-bit: I had to manually add "cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory" to the cmdline.txt file, so it looks like Ansible is not adding it as expected.
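For reference, what I did was append the flags to the end of the existing single line and reboot. Roughly like this, assuming the stock /boot/cmdline.txt path on Raspberry Pi OS (newer images may use /boot/firmware/cmdline.txt instead):

sudo sed -i '$ s/$/ cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory/' /boot/cmdline.txt
sudo reboot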
@yankeeinlondon please try without all of the extra args and be sure you have enough disk space on these nodes.
Also, this is turning more into a discussion than a bug report :)
Hello! First of all, thank you for this guide! I have learned so many new things, but now I am stuck and do not know how to proceed.
I am trying to get it up and running on Raspberry Pis: 3 masters and 4 workers.
Error message after launching the playbook:
What can I do? Where do I have to look? What could the error be?
Thanks for any help that you can give me.