[Closed] AkihiroSuda closed this 9 months ago
Do you want me to ping some rocky devs? I think they might be able to provide insight.
Probably not yet, until VXLAN works for me on Rocky
Okay I won't! But if they could be of help here (getting it working) let me know and I can.
WIP: this seems to somehow make VXLAN functional
(sysctl values are from https://qiita.com/tom7/items/1bc7f4e568b20c306845)
# Execute inside `nsenter -t $(pgrep dockerd) -n -U` before running `make up`
# VRF
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.ipv4.tcp_l3mdev_accept=1
sysctl -w net.ipv4.udp_l3mdev_accept=1
sysctl -w net.ipv4.conf.default.rp_filter=0
sysctl -w net.ipv4.conf.all.rp_filter=0
# Inspired by Cumulus
sysctl -w net.ipv4.conf.default.arp_accept=0
sysctl -w net.ipv4.conf.default.arp_announce=2
sysctl -w net.ipv4.conf.default.arp_filter=0
sysctl -w net.ipv4.conf.default.arp_ignore=1
sysctl -w net.ipv4.conf.default.arp_notify=1
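To sanity-check that the writes took effect in the current namespace, something like this can read the values back (a sketch of mine, not part of the PR; it reads /proc directly and skips keys that don't exist, since the l3mdev knobs are only present when the kernel is built with CONFIG_NET_L3_MASTER_DEV):

```shell
# Read back the sysctl values in the current network namespace.
# Keys are translated to /proc/sys paths; missing keys are skipped.
for key in net.ipv4.ip_forward net.ipv4.tcp_l3mdev_accept \
           net.ipv4.udp_l3mdev_accept \
           net.ipv4.conf.default.rp_filter net.ipv4.conf.all.rp_filter; do
  path="/proc/sys/$(printf '%s' "$key" | tr . /)"
  [ -r "$path" ] && printf '%s = %s\n' "$key" "$(cat "$path")"
done
```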
Woot! So just to clarify - if I run this on the host nodes (not in containers) right before make up, this should work?
I can try this tonight (after you confirm the above!) It would be so great to get this working on rocky because our networking is good there, but we haven't figured out ubuntu yet.
It turns out that net.ipv4.conf.default.rp_filter is set to 1 (strict) on Rocky 9.
This has to be 0 (disabled) or 2 (loose) in the rootless dockerd's network namespace. (Setting this value for the node container isn't enough).
This value may still remain 1 on the host.
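For anyone checking their own setup: the value that matters is the one inside the rootless dockerd's network namespace, not the host's. A sketch (the helper function and the nsenter invocation are mine, not part of Usernetes; it assumes a running rootless dockerd when used for real):

```shell
# Interpret an rp_filter value (0/1/2 are the only kernel-defined modes).
rp_filter_mode() {
  case "$1" in
    0) echo "disabled (ok)" ;;
    1) echo "strict (breaks VXLAN here)" ;;
    2) echo "loose (ok)" ;;
    *) echo "unknown" ;;
  esac
}
# On a real host, feed it the value from dockerd's namespace, e.g.:
#   rp_filter_mode "$(nsenter -t "$(pgrep -o dockerd)" -n -U -- \
#       cat /proc/sys/net/ipv4/conf/default/rp_filter)"
rp_filter_mode 1   # → strict (breaks VXLAN here)
```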
Now this is ready for testing.
Excellent! So should I test this branch as it is now, no changes to my rocky base images, or do we need further changes?
No further change is expected to be needed
Awesome! My rocky image is building now and I should be able to bring up a testing cluster after dinner. Will send you an update when I do! 🎉
Confirmed that this works on AlmaLinux 9.2 too, of course
hey @AkihiroSuda! Congrats on your award today, you and your contributions are amazing and we so appreciate you!
I was running into some issues (related to this one, but on Ubuntu) and wanted to post what I learned for some future person. First, I was still getting a dbus error with the make up command:
cat: /sys/fs/cgroup/user.slice/user-501043911.slice/user@501043911.service/cgroup.controllers: No such file or directory
Failed to connect to bus: No such file or directory
[INFO] systemd not detected, dockerd-rootless.sh needs to be started manually:
And the fix was to rebuild my base image with apt-get upgrade added to update the kernel (that worked!). Then I was getting an error about net.ipv4.conf.default.rp_filter, specifically that it was still 1, even though the rootless init script did create this file to set it to 2:
$ cat /etc/sysctl.d/99-usernetes.conf
net.ipv4.conf.default.rp_filter = 2
I had already run ./init-host/init-host.rootless.sh, and here was the full error:
[INFO] Detected container engine type: docker
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
[ERROR] sysctl value "net.ipv4.conf.default.rp_filter" must be 0 (disabled) or 2 (loose) in the container engine's network namespace
make: *** [Makefile:60: check-preflight] Error 1
(sidenote) no matter how many times I run this, I always see this warning and I haven't figured out why that's the case yet:
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
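(Continuing the sidenote: the preflight check presumably looks at logind's per-user linger marker. This sketch of mine checks the same thing, assuming the standard /var/lib/systemd/linger location that loginctl enable-linger writes to:)

```shell
# Lingering is recorded by systemd-logind as a per-user file under
# /var/lib/systemd/linger; enable-linger creates it, disable removes it.
linger_enabled() {
  [ -e "/var/lib/systemd/linger/$1" ]
}
if linger_enabled "$(id -un)"; then
  echo "lingering enabled"
else
  echo "lingering not enabled; run: sudo loginctl enable-linger $(id -un)"
fi
```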
But I determined that it was apparently still set to 1 on my host:
$ grep [01] /proc/sys/net/ipv4/conf/*/rp_filter|egrep "default|all"
/proc/sys/net/ipv4/conf/all/rp_filter:1
So I did:
$ sudo vim /etc/sysctl.conf
vsochat_gmail_com@usernetes-compute-001:/opt/usernetes$ sudo sysctl -p
net.ipv4.conf.default.rp_filter = 2
(changing it to 2) and restarted docker:
systemctl --user restart docker.service
And then the make up worked! But I wonder why that wasn't fixed to start? Now I have a control plane!
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-7wstg 1/1 Running 0 23m
kube-system coredns-5dd5756b68-ccwtd 1/1 Running 0 23m
kube-system coredns-5dd5756b68-m7c7v 1/1 Running 0 23m
kube-system etcd-u7s-usernetes-compute-001 1/1 Running 0 23m
kube-system kube-apiserver-u7s-usernetes-compute-001 1/1 Running 0 23m
kube-system kube-controller-manager-u7s-usernetes-compute-001 1/1 Running 0 23m
kube-system kube-proxy-gzxg8 1/1 Running 0 23m
kube-system kube-scheduler-u7s-usernetes-compute-001 1/1 Running 0 23m
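As an aside on the sysctl.conf edit above: a drop-in under /etc/sysctl.d may be cleaner than editing /etc/sysctl.conf directly (an assumption on my part; the filename below is hypothetical). This sketch only generates the file contents and writes to a temp dir so it is safe to run anywhere:

```shell
# Generate a sysctl drop-in that makes the rp_filter fix persistent.
# On a real host the target would be /etc/sysctl.d/99-rp-filter.conf,
# applied with `sudo sysctl --system` followed by a dockerd restart.
out="$(mktemp -d)/99-rp-filter.conf"
printf '%s\n' \
  'net.ipv4.conf.default.rp_filter = 2' \
  'net.ipv4.conf.all.rp_filter = 2' > "$out"
cat "$out"
```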
For the worker node, my power went out and I didn't get to test it fully, but when I ran the script to bring up the worker it seemed to hang:
./Makefile.d/check-preflight.sh
[INFO] Detected container engine type: docker
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
docker compose up --build -d
[+] Building 0.2s (9/9) FINISHED docker:default
=> [node internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 809B 0.0s
=> [node internal] load .dockerignore 0.0s
=> => transferring context: 75B 0.0s
=> [node internal] load metadata for docker.io/kindest/node:v1.28.0 0.2s
=> [node 1/4] FROM docker.io/kindest/node:v1.28.0@sha256:b7a4cad12c197af 0.0s
=> [node internal] load build context 0.0s
=> => transferring context: 84B 0.0s
=> CACHED [node 2/4] RUN arch="$(uname -m | sed -e s/x86_64/amd64/ -e s/ 0.0s
=> CACHED [node 3/4] RUN apt-get update && apt-get install -y --no-insta 0.0s
=> CACHED [node 4/4] ADD Dockerfile.d/u7s-entrypoint.sh / 0.0s
=> [node] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:ef1a52ff46bc2c33546f1db882bb04667aecb3e532c5b 0.0s
=> => naming to docker.io/library/usernetes-node 0.0s
[+] Running 1/0
✔ Container usernetes-node-1 Running 0.0s
docker compose exec -e U7S_HOST_IP=10.10.0.3 -e U7S_NODE_NAME=u7s-usernetes-compute-003 -e U7S_NODE_SUBNET=10.100.5.0/24 node sh -euc '$(cat /usernetes/join-command)'
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
I think the above was running make -C /opt/usernetes up kubeadm-join with the copied-over join-command.
But I didn't see the node with kubectl get nodes. What should I try to debug next? I had to bring my cluster down from my phone when my power went off in case it was an all-day thing and I was burning cloud monies. :laughing:
Congrats on your award today, you and your contributions are amazing and we so appreciate you!
Thank you
But I wonder why that wasn't fixed to start?
Because the sysctl value of the dockerd process is propagated to the container.
But I didn't see the node with kubectl get nodes. What should I try to debug next?
Any error from kubeadm-join?
I had to bring my cluster down from my phone when my power went off in case it was an all day thing and I was burning cloud monies. 😆
I'd suggest using local VMs for an exercise
e.g., with https://lima-vm.io/ :
limactl start --network=lima:user-v2 --name=vm0 template://rockylinux-9
limactl start --network=lima:user-v2 --name=vm1 template://rockylinux-9
oh neat - I am not familiar with this tool. I'll try this out after a meeting / later this evening and give you an update!
Okay I installed lima and QEMU and created two rocky VMs - and I don't know enough basics to even get a ping working from one VM to the other. I do see there are templates:
And namely some for k8s and k3s - is there any reason there isn't a template for usernetes? Is it that a template == one VM? It seems like if one person has stepped through this process of using lima (and knows how to do it), it would be logical to provide a template for a control plane and then N workers for someone else to easily deploy.
Any error from kubeadm-join?
Will bring up a cluster now and look into this! I've been working for months on these terraform (now OpenTofu) templates and it feels daunting to start from scratch with a VM tool I've never used before. I'm hoping I'm close with the tofu configs on GCP to have something working more quickly.
okay here is the error from kubeadm-join:
docker compose exec -e U7S_HOST_IP=10.10.0.5 -e U7S_NODE_NAME=u7s-usernetes-compute-002 -e U7S_NODE_SUBNET=10.100.153.0/24 node sh -euc '$(cat /usernetes/join-command)'
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-11-09T04:50:45Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/containerd/containerd.sock: connect: no such file or directory\""
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
make: *** [Makefile:112: kubeadm-join] Error 1
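The error says kubeadm can't reach containerd's socket inside the node container, so the first thing worth checking is whether that socket exists at all. A sketch (the helper is mine; the socket path comes straight from the error message):

```shell
# Check whether a CRI runtime socket exists at the given path.
check_cri_socket() {
  if [ -S "$1" ]; then echo "present"; else echo "missing"; fi
}
check_cri_socket /var/run/containerd/containerd.sock
# Inside the node container containerd runs under systemd, so something
# like `docker compose exec node systemctl status containerd` (hedged:
# exact service name assumed) would show why it isn't up.
```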
If I shell in (or just run again from the outside) it hangs here:
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
For the control plane (which appears to work), here is what I see in make logs:
Nov 09 04:56:43 u7s-usernetes-compute-001 kubelet[1013]: E1109 04:56:43.260899 1013 container_manager_linux.go:509] "Failed to ensure process in container with oom score" err="failed to apply oom score -999 to PID 1013: write /proc/1013/oom_score_adj: permission denied"
And for the worker node (hanging) I see:
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.432440353Z" level=warning msg="The image docker.io/kindest/local-path-helper:v20230510-486859a6 is not unpacked."
Nov 09 04:50:45 u7s-usernetes-compute-002 systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Nov 09 04:50:45 u7s-usernetes-compute-002 systemd[1]: Finished Update UTMP about System Runlevel Changes.
Nov 09 04:50:45 u7s-usernetes-compute-002 systemd[1]: Startup finished in 199ms.
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444757902Z" level=info msg="Start event monitor"
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444784147Z" level=info msg="Start snapshots syncer"
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444793703Z" level=info msg="Start cni network conf syncer for default"
Nov 09 04:50:45 u7s-usernetes-compute-002 containerd[181]: time="2023-11-09T04:50:45.444799589Z" level=info msg="Start streaming server"
But I don't see the node is registered:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
u7s-usernetes-compute-001 Ready control-plane 11m v1.28.0
This did work once for me, when it was in the middle of development! I wish I knew what changed :/ I could try going back to Rocky since that works now, but I had thought Ubuntu was a more sound option.
The hanging terminal finally timed out:
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.10.0.3:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
To see the stack trace of this error execute with --v=5 or higher
root@u7s-usernetes-compute-002:/usernetes#