rootless-containers / usernetes

Kubernetes without the root privileges
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless
Apache License 2.0

VXLAN doesn't seem to work on GCP (while works on AWS and Azure); probably related to MTU #300

Closed AkihiroSuda closed 1 year ago

AkihiroSuda commented 1 year ago

VXLAN doesn't seem to work on GCP, while it works on AWS and Azure

$ kubectl taint nodes --all node-role.kubernetes.io/control-plane-
$ ./hack/test-smoke.sh 
[INFO] Waiting for nodes to be ready
node/u7s-suda-tmp-1 condition met
node/u7s-suda-tmp-2 condition met
[INFO] Creating StatefulSet "dnstest" and headless Service "dnstest"
service/dnstest created
statefulset.apps/dnstest created
[INFO] Waiting for 3 replicas to be ready
Waiting for 3 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 2 pods to be ready...
Waiting for 1 pods to be ready...
Waiting for 1 pods to be ready...
partitioned roll out complete: 3 new pods have been updated...
[INFO] Connecting to dnstest-{0,1,2}.dnstest.default.svc.cluster.local
If you don't see a command prompt, try pressing enter.
wget: bad address 'dnstest-0.dnstest.default.svc.cluster.local'
pod "dnstest-shell" deleted
pod default/dnstest-shell terminated (Error)

Likely to be related to MTU.

Version: Usernetes gen2-v20230906.0, Rootless Docker 24.0.6, on Ubuntu 22.04.

aojea commented 1 year ago

gcloud compute networks create mtu9k --mtu=8896

https://cloud.google.com/vpc/docs/mtu
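
For context on why MTU is a plausible suspect: VXLAN over IPv4 adds 50 bytes of encapsulation, so the overlay interface needs an MTU at least that much smaller than the VPC's. A minimal sketch of the arithmetic follows. The helper name is hypothetical, and 1460 is assumed here as GCP's default VPC MTU, as documented on the page linked above.

```shell
# vxlan_inner_mtu UNDERLAY_MTU: print the largest MTU usable inside a
# VXLAN-over-IPv4 tunnel (hypothetical helper, not part of Usernetes).
# Overhead: outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14) = 50.
vxlan_inner_mtu() {
  echo $(( $1 - 50 ))
}

vxlan_inner_mtu 1460   # assumed GCP default VPC MTU -> prints 1410
vxlan_inner_mtu 8896   # the mtu9k network created above -> prints 8846
```

If the overlay MTU is larger than this, large packets can be dropped while small ones still get through.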

vsoch commented 1 year ago

okay found the setting in terraform - testing now. [image]

vsoch commented 1 year ago

ack, it's timing out again on:

[preflight] Running pre-flight checks
    [WARNING SystemVerification]: missing optional cgroups: hugetlb

This happens maybe 2/3 times, so something is up!

vsoch commented 1 year ago

yeah, not getting through either of these steps now with this change. :/ I wonder if this is still an issue with Google networking. I think my next step needs to be to create a terraform setup for aws. I have a lot in my queue with 2 talks but I'll find time somewhere!

vsoch commented 1 year ago

We are still debugging the ubuntu setup - what appears to be happening is that we don't have basic networking (e.g., even with a configuration that works on rocky, on ubuntu I can open a little webserver on some port, and curl -k <address> reports no route to host). I've started debugging - trying to remove docker entirely and NFS, and still no go. I'm not super great with networking but I'll keep reading and trying to understand why it's not working. I'm especially puzzled because it was working before, I think before a change here, but I don't remember the details. Will keep you updated for sure!

AkihiroSuda commented 1 year ago

FYI I'm trying to support Rocky, but VXLAN doesn't seem to work even with local Lima VMs:

AkihiroSuda commented 1 year ago

net.ipv4.conf.default.rp_filter seems to be set to 1 (strict) on GCP's Ubuntu image, which might be the reason for the issue on GCP.

vsoch commented 1 year ago

Oh! I can test this too. Is it possible to change it, and if so, how?

AkihiroSuda commented 1 year ago

Confirmed that VXLAN is functional on GCP with https://github.com/rootless-containers/usernetes/commit/462ccf05dd4931d664ff7cbb3325123a29246dee 🎉

Is it possible to change it, and if so, how?

https://github.com/rootless-containers/usernetes/blob/462ccf05dd4931d664ff7cbb3325123a29246dee/hack/init-host.root.sh#L24-L30

(Also you have to run systemctl --user restart docker.service)
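
A minimal sketch of what the change amounts to, assuming a drop-in at /etc/sysctl.d/99-usernetes.conf and the loose value 2. The helper name write_rp_filter_conf is hypothetical, and the linked script remains the authoritative version.

```shell
# write_rp_filter_conf PATH: write a sysctl drop-in that sets reverse-path
# filtering to loose mode (2). Hypothetical helper for illustration only.
write_rp_filter_conf() {
  printf '%s\n' 'net.ipv4.conf.default.rp_filter = 2' > "$1"
}

# On a real host, as root (path is an assumption):
#   write_rp_filter_conf /etc/sysctl.d/99-usernetes.conf
#   sysctl --system
# Then, as the rootless Docker user, so the daemon picks up the new value:
#   systemctl --user restart docker.service
```

Loose mode (2) is one of the two values the preflight check accepts, the other being 0 (disabled).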

vsoch commented 1 year ago

I'm not sure it's sticking - I see:

$ make up
./Makefile.d/check-preflight.sh
[WARNING] systemd lingering is not enabled. Run `sudo loginctl enable-linger $(whoami)` to enable it, otherwise Kubernetes will exit on logging out.
[WARNING] Kernel module "ip6_tables" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "ip6table_nat" does not seem loaded? (negligible if built-in to the kernel)
[WARNING] Kernel module "iptable_nat" does not seem loaded? (negligible if built-in to the kernel)
[ERROR] sysctl value "net.ipv4.conf.default.rp_filter" must be 0 (disabled) or 2 (loose) in the daemon's network namespace
make: *** [Makefile:57: check-preflight] Error 1

And in the output of sysctl --system I see it at the end:

* Applying /etc/sysctl.d/99-usernetes.conf ...
net.ipv4.conf.default.rp_filter = 2
* Applying /etc/sysctl.conf ...

But I still get that message. I checked the file reported to run after, but it's commented out (so I suspect should not have influence).

$ cat /etc/sysctl.conf |grep ipv4
#net.ipv4.conf.default.rp_filter=1
#net.ipv4.conf.all.rp_filter=1
#net.ipv4.tcp_syncookies=1
#net.ipv4.ip_forward=1
#net.ipv4.conf.all.accept_redirects = 0
# net.ipv4.conf.all.secure_redirects = 1
#net.ipv4.conf.all.send_redirects = 0
#net.ipv4.conf.all.accept_source_route = 0
#net.ipv4.conf.all.log_martians = 1

Am I missing a detail? I ran the commands from the README on my own, ran into this bug, and then ran the init scripts you prepared, with no luck.

vsoch commented 1 year ago

Ah this is interesting!

$ sysctl -n net.ipv4.conf.default.rp_filter
2
$ docker run --rm --net=host busybox sysctl -n net.ipv4.conf.default.rp_filter
1

vsoch commented 1 year ago

Doh, this fixed it, I think I put it in the wrong spot in my script!

systemctl --user restart docker.service

Trying again!
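
The check that exposed the stale value can be sketched as a small script: the host and the rootless daemon's network namespace are queried separately, and only the daemon's value matters to the preflight check. Both helper names are hypothetical. The two query commands are the ones used earlier in this thread.

```shell
# rp_filter_ok VALUE: succeed if VALUE is 0 (disabled) or 2 (loose),
# the two values the preflight check accepts.
rp_filter_ok() {
  case "$1" in
    0|2) return 0 ;;
    *)   return 1 ;;
  esac
}

# report_rp_filter HOST_VALUE DAEMON_VALUE: print a one-line verdict based
# on the value seen inside the daemon's network namespace.
report_rp_filter() {
  if rp_filter_ok "$2"; then
    echo "ok: host=$1 daemon=$2"
  else
    echo "bad: host=$1 daemon=$2 (restart docker.service?)"
    return 1
  fi
}

# On a real host:
#   host=$(sysctl -n net.ipv4.conf.default.rp_filter)
#   daemon=$(docker run --rm --net=host busybox sysctl -n net.ipv4.conf.default.rp_filter)
#   report_rp_filter "$host" "$daemon"
```

A host value of 2 with a daemon value of 1 is the situation shown above, where the sysctl was applied but the daemon had not been restarted yet.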

vsoch commented 1 year ago

okay (for the ubuntu setup) it's still hanging here:

 ✔ Container usernetes-node-1  Running                                                                                        0.0s
docker compose exec -e U7S_HOST_IP=10.10.0.2 -e U7S_NODE_NAME=u7s-usernetes-compute-002 -e U7S_NODE_SUBNET=10.100.153.0/24 node kubeadm join 10.10.0.4:6443 --token t8ub7m.rfjcdt2jdh24miia --discovery-token-ca-cert-hash sha256:8c3067d686064b134b6f0a604623f13e73fa46e6aa3c0ee44bd9b57b8147213c 
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: missing optional cgroups: hugetlb

For that net.ipv4.conf.default.rp_filter value on the worker node, it's also 2/2 (good). I think the networking issue on ubuntu is still not fixed, e.g., with python3 -m http.server 9999 running and the firewall open on all tcp ports, from another instance:

$ curl -k 10.10.0.2:9999
curl: (7) Failed to connect to 10.10.0.2 port 9999 after 0 ms: No route to host

Going to try rocky instead.

vsoch commented 1 year ago

okay will need to figure out how to install rootless docker on rocky - the default script says unsupported distribution. When I download the script and add rocky to the list:

$ ./install-docker.sh 
# Executing docker install script, commit: e5543d473431b782227f8908005543bb4389b8de
+ sudo -E sh -c 'yum install -y -q yum-utils'

Installed:
  yum-utils-4.0.21-19.el8_8.noarch                                              

+ sudo -E sh -c 'yum-config-manager --add-repo https://download.docker.com/linux/rocky/docker-ce.repo'
Adding repo from: https://download.docker.com/linux/rocky/docker-ce.repo
Status code: 404 for https://download.docker.com/linux/rocky/docker-ce.repo (IP: 99.84.160.77)
Error: Configuration of repo failed

AkihiroSuda commented 1 year ago

how to install rootless docker on rocky

https://github.com/rootless-containers/usernetes/blob/4f81b6e34d331e27ef0b427ed4a7cb819b8d42cb/init-host/init-host.root.sh#L32-L40

vsoch commented 1 year ago

That worked! Next issue is that this is missing (I'm going through the other make steps now).

[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist

It doesn't hang at the hugetlb warning though, which means the networking is working and that's great! I can confirm that too by starting up a little web server and hitting it with curl -k.

AkihiroSuda commented 1 year ago

/proc/sys/net/bridge/bridge-nf-call-iptables does not exist

You need to modprobe br_netfilter https://github.com/rootless-containers/usernetes/blob/4f81b6e34d331e27ef0b427ed4a7cb819b8d42cb/init-host/init-host.root.sh#L18-L22
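
A rough sketch of checking whether the module is loaded before retrying kubeadm. The helper name module_loaded is hypothetical and is not taken from the linked script.

```shell
# module_loaded NAME [MODULES_FILE]: succeed if NAME is listed as a loaded
# module. MODULES_FILE defaults to /proc/modules; built-in modules never
# appear there, so a failure is not conclusive. Hypothetical helper.
module_loaded() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# On a real host:
#   module_loaded br_netfilter || sudo modprobe br_netfilter
#   test -e /proc/sys/net/bridge/bridge-nf-call-iptables && echo "bridge sysctls present"
```

The second command checks for the exact file the kubeadm preflight error complained about.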

vsoch commented 1 year ago

This sequence:

sudo modprobe ip_tables
sudo modprobe br_netfilter 
sudo modprobe vxlan 
sudo systemctl restart systemd-modules-load.service 

# Run init host scripts (I'm not sure if we should skip the first or clone in image build and run there?)
sudo ./init-host/init-host.root.sh 
./init-host/init-host.rootless.sh

Always ends with a warning that it's disabled:

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

vsoch commented 1 year ago

On the host:

$ sudo sysctl -a | grep iptables
net.bridge.bridge-nf-call-iptables = 1

But I don't see anything in the container:

docker run --rm --net=host busybox sysctl -a | grep iptables

And I did try:

systemctl --user restart docker.service

But the above is still empty.

AkihiroSuda commented 1 year ago

You may need modprobe bridge too?

vsoch commented 1 year ago

okay tried that - no change.