Closed AkihiroSuda closed 1 year ago
gcloud compute networks create mtu9k --mtu=8896
okay found the setting in terraform - testing now.
ack, it's timing out again on:
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
This happens maybe 2/3 times, so something is up!
yeah, not getting through either of these steps now with this change. :/ I wonder if this is still an issue with Google networking. I think my next step needs to be to create a terraform setup for aws. I have a lot on my queue with 2 talks but I'll find time somewhere!
We are still debugging the ubuntu setup. What appears to be happening is that we don't have basic networking: even with a configuration that works on rocky, on ubuntu I can open a little webserver on some port, and curl -k <address> reports no route to host. I've started debugging - trying to remove docker entirely and NFS, and still no go. I'm not super great with networking but I'll keep reading and trying to understand why it's not working. I'm especially puzzled because it was working before, I think before a change here, but I don't remember the details. Will keep you updated for sure!
FYI I'm trying to support Rocky, but VXLAN doesn't seem to work even with local Lima VMs:
net.ipv4.conf.default.rp_filter seems to be set to 1 (strict) on GCP's Ubuntu image; that might be the reason for the issue on GCP.
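For reference, the kernel defines three rp_filter values: 0 (no source validation), 1 (strict) and 2 (loose). A minimal sketch for reading the host's current default follows; the helper name rp_filter_mode is hypothetical and not part of Usernetes.

```shell
#!/bin/sh
# Hypothetical helper (not part of Usernetes): map an rp_filter value to its meaning.
rp_filter_mode() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "loose" ;;
    *) echo "unknown" ;;
  esac
}

# Print the host's default setting, if the proc file is readable.
f=/proc/sys/net/ipv4/conf/default/rp_filter
if [ -r "$f" ]; then
  v=$(cat "$f")
  echo "net.ipv4.conf.default.rp_filter = $v ($(rp_filter_mode "$v"))"
fi
```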
Oh! I can test this too. Is it possible to change it, and if so, how?
Confirmed that VXLAN is functional on GCP with https://github.com/rootless-containers/usernetes/commit/462ccf05dd4931d664ff7cbb3325123a29246dee
Is it possible to change it, and if so, how?
(Also you have to run systemctl --user restart docker.service)
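A minimal sketch of the change, assuming the drop-in file name and value that appear in the sysctl --system output later in this thread (/etc/sysctl.d/99-usernetes.conf, value 2). The function name and its directory argument are hypothetical, added so the step can be tried against a scratch directory first.

```shell
#!/bin/sh
# Sketch only: write a sysctl drop-in that sets loose reverse-path filtering.
# The directory argument is a hypothetical parameter for dry runs;
# the real target would be /etc/sysctl.d (which needs root).
write_rp_filter_conf() {
  dir="${1:-/etc/sysctl.d}"
  printf 'net.ipv4.conf.default.rp_filter = 2\n' > "$dir/99-usernetes.conf"
}

# Dry run against a temporary directory (no root needed).
tmp=$(mktemp -d)
write_rp_filter_conf "$tmp"
cat "$tmp/99-usernetes.conf"
rm -rf "$tmp"
```

On a real host this would be run as root with the default directory, followed by sudo sysctl --system and then systemctl --user restart docker.service, which this thread reports is needed before the rootless daemon sees the new value.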
I'm not sure it's sticking - I see:
$ make up
./Makefile.d/check-preflight.sh
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
[ERROR] sysctl value "net.ipv4.conf.default.rp_filter" must be 0 (disabled) or 2 (loose) in the daemon's network namespace
make: *** [Makefile:57: check-preflight] Error 1
And in the output of sysctl --system I see it at the end:
* Applying /etc/sysctl.d/99-usernetes.conf ...
net.ipv4.conf.default.rp_filter = 2
* Applying /etc/sysctl.conf ...
But I still get that message. I checked the file reported to run after, but it's commented out (so I suspect should not have influence).
$ cat /etc/sysctl.conf |grep ipv4
#net.ipv4.conf.default.rp_filter=1
#net.ipv4.conf.all.rp_filter=1
#net.ipv4.tcp_syncookies=1
#net.ipv4.ip_forward=1
#net.ipv4.conf.all.accept_redirects = 0
# net.ipv4.conf.all.secure_redirects = 1
#net.ipv4.conf.all.send_redirects = 0
#net.ipv4.conf.all.accept_source_route = 0
#net.ipv4.conf.all.log_martians = 1
Am I missing a detail? I ran the commands from the README on my own, ran into this bug, and then ran the init scripts you prepared, with no luck.
Ah this is interesting!
$ sysctl -n net.ipv4.conf.default.rp_filter
2
$ docker run --rm --net=host busybox sysctl -n net.ipv4.conf.default.rp_filter
1
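The two commands above can be folded into one check. This is a sketch only; compare_rp_filter is a hypothetical helper, and the sample values are the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the daemon's network namespace
# sees the same rp_filter value as the host.
compare_rp_filter() {
  host="$1"
  daemon="$2"
  if [ "$host" = "$daemon" ]; then
    echo "match: $host"
  else
    echo "mismatch: host=$host daemon=$daemon"
    return 1
  fi
}

# Values observed above: 2 on the host, 1 inside the daemon's namespace.
compare_rp_filter 2 1 || echo "restart the rootless daemon and re-check"
```

With Docker available, the live values could be passed in as compare_rp_filter "$(sysctl -n net.ipv4.conf.default.rp_filter)" "$(docker run --rm --net=host busybox sysctl -n net.ipv4.conf.default.rp_filter)".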
Doh, this fixed it, I think I put it in the wrong spot in my script!
systemctl --user restart docker.service
Trying again!
okay (for the ubuntu setup) it's still hanging here:
✔ Container usernetes-node-1 Running 0.0s
docker compose exec -e U7S_HOST_IP=10.10.0.2 -e U7S_NODE_NAME=u7s-usernetes-compute-002 -e U7S_NODE_SUBNET=10.100.153.0/24 node kubeadm join 10.10.0.4:6443 --token t8ub7m.rfjcdt2jdh24miia --discovery-token-ca-cert-hash sha256:8c3067d686064b134b6f0a604623f13e73fa46e6aa3c0ee44bd9b57b8147213c
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
For that sysctl value on the worker node, it's also 2/2 (good). I think the networking issue on ubuntu is still not fixed, e.g., with python3 -m http.server 9999 running and all tcp ports open in the firewall, from another instance:
$ curl -k 10.10.0.2:9999
curl: (7) Failed to connect to 10.10.0.2 port 9999 after 0 ms: No route to host
Going to try rocky instead.
okay will need to figure out how to install rootless docker on rocky - the default script says unsupported distribution. When I download the script and add rocky to the list:
$ ./install-docker.sh
# Executing docker install script, commit: e5543d473431b782227f8908005543bb4389b8de
+ sudo -E sh -c 'yum install -y -q yum-utils'
Installed:
yum-utils-4.0.21-19.el8_8.noarch
+ sudo -E sh -c 'yum-config-manager --add-repo https://download.docker.com/linux/rocky/docker-ce.repo'
Adding repo from: https://download.docker.com/linux/rocky/docker-ce.repo
Status code: 404 for https://download.docker.com/linux/rocky/docker-ce.repo (IP: 99.84.160.77)
Error: Configuration of repo failed
how to install rootless docker on rocky
That worked! Next issue is that this is missing (I'm going through the other make steps now).
[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
It doesn't hang at the hugetlb though, which means the networking is working and that's great! I can confirm that too by starting up a little web server and doing curl -k to hit it.
/proc/sys/net/bridge/bridge-nf-call-iptables does not exist
You need to modprobe br_netfilter
https://github.com/rootless-containers/usernetes/blob/4f81b6e34d331e27ef0b427ed4a7cb819b8d42cb/init-host/init-host.root.sh#L18-L22
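A quick way to see whether the modules are in place is to look at /proc/modules and at the proc file kubeadm complained about. This is a sketch; module_loaded is a hypothetical helper, and its optional second argument (the file to read) is an assumption added for testing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module is listed in /proc/modules.
# Modules built into the kernel are not listed there, so a miss is not conclusive.
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

for m in ip_tables br_netfilter vxlan; do
  if module_loaded "$m"; then
    echo "$m: loaded"
  else
    echo "$m: not listed"
  fi
done

# This is the file the kubeadm preflight check looks for.
if [ -e /proc/sys/net/bridge/bridge-nf-call-iptables ]; then
  echo "bridge-nf-call-iptables present"
else
  echo "bridge-nf-call-iptables missing"
fi
```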
This sequence:
sudo modprobe ip_tables
sudo modprobe br_netfilter
sudo modprobe vxlan
sudo systemctl restart systemd-modules-load.service
# Run init host scripts (I'm not sure if we should skip the first or clone in image build and run there?)
sudo ./init-host/init-host.root.sh
./init-host/init-host.rootless.sh
always ends with a warning that it's disabled:
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
On the host:
$ sudo sysctl -a | grep iptables
net.bridge.bridge-nf-call-iptables = 1
But I don't see anything in the container:
docker run --rm --net=host busybox sysctl -a | grep iptables
And I did try:
systemctl --user restart docker.service
But the above is still empty.
You may need modprobe bridge too?
okay tried that - no change.
VXLAN doesn't seem to work on GCP, while it works on AWS and Azure
Likely to be related to MTU.
Version: Usernetes gen2-v20230906.0, Rootless Docker 24.0.6, on Ubuntu 22.04.
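For the MTU hypothesis, the arithmetic is small enough to sketch. This assumes VXLAN over IPv4 (50 bytes of encapsulation overhead) and GCP's default VPC MTU of 1460; vxlan_inner_mtu is a hypothetical helper, and 8896 is the --mtu value used for the mtu9k network earlier in the thread.

```shell
#!/bin/sh
# VXLAN over IPv4 costs 50 bytes of the underlay MTU:
# outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14).
VXLAN_OVERHEAD=50

# Hypothetical helper: largest inner MTU that fits in a given underlay MTU.
vxlan_inner_mtu() {
  echo $(( $1 - VXLAN_OVERHEAD ))
}

vxlan_inner_mtu 1460   # GCP default VPC MTU (assumption stated above)
vxlan_inner_mtu 8896   # the mtu9k network created with --mtu=8896
```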