rancher / rke2

https://docs.rke2.io/
Apache License 2.0

failed to get CA certs: Get "https://127.0.0.1:6444/cacerts" #5651

Closed: kurborg closed this issue 7 months ago

kurborg commented 7 months ago

Environmental Info: RKE2 Version: v1.26.15+rke2r1

Node(s) CPU architecture, OS, and Version: Rocky 8.6

Cluster Configuration: 1 server, 3 agents. Airgapped network. Do I need to set up certs for an airgapped installation?

Describe the bug: I'm trying to run rke2 on 4 nodes on an airgapped network to host a webserver container. The rke2-server.service is running fine, but when I run rke2-agent.service on the other nodes, using the node-token from the primary node in /etc/rancher/rke2/config.yaml, I get the error:

failed to get CA certs: Get "https://127.0.0.1:6444/cacerts"

I'm able to curl -vks https://SERVERIP:9345/ping with the expected output. rke2-server is running fine and shows up when I run kubectl get nodes. The server and agents are on separate machines, not the same server.

kurborg commented 7 months ago
● rke2-agent.service - Rancher Kubernetes Engine v2 (agent)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-03-28 13:39:14 EDT; 2min 30s ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 861436 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 861434 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 861431 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 861437 (rke2)
    Tasks: 86
   Memory: 4.5G
   CGroup: /system.slice/rke2-agent.service
           ├─861437 /usr/local/bin/rke2 agent
           ├─861460 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           ├─861513 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox2 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --node-ip=203.0.113.254 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/var/lib/rancher/rke2/agent/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
           └─861626 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id e796b7d2594362be074497cc3c5366a6ebaa4594e019778241eeca3427275f1a -address /run/k3s/containerd/containerd.sock

Mar 28 13:41:34 silverbox2 rke2[861437]: time="2024-03-28T13:41:34-04:00" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 203.0.113.254:9345: connect: connection refused"
Mar 28 13:41:34 silverbox2 rke2[861437]: time="2024-03-28T13:41:34-04:00" level=error msg="Remotedialer proxy error" error="dial tcp 203.0.113.254:9345: connect: connection refused"
Mar 28 13:41:39 silverbox2 rke2[861437]: time="2024-03-28T13:41:39-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": EOF"
Mar 28 13:41:39 silverbox2 rke2[861437]: time="2024-03-28T13:41:39-04:00" level=info msg="Connecting to proxy" url="wss://203.0.113.254:9345/v1-rke2/connect"
Mar 28 13:41:39 silverbox2 rke2[861437]: time="2024-03-28T13:41:39-04:00" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 203.0.113.254:9345: connect: connection refused"
Mar 28 13:41:39 silverbox2 rke2[861437]: time="2024-03-28T13:41:39-04:00" level=error msg="Remotedialer proxy error" error="dial tcp 203.0.113.254:9345: connect: connection refused"
Mar 28 13:41:44 silverbox2 rke2[861437]: time="2024-03-28T13:41:44-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: failed to get CA certs: Get \"https://127.0.0.1:6444/cacerts\": read tcp 127.0.0.1:41722->127.0.0.1:6444: read: connection reset by peer"
Mar 28 13:41:44 silverbox2 rke2[861437]: time="2024-03-28T13:41:44-04:00" level=info msg="Connecting to proxy" url="wss://203.0.113.254:9345/v1-rke2/connect"
Mar 28 13:41:44 silverbox2 rke2[861437]: time="2024-03-28T13:41:44-04:00" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 203.0.113.254:9345: connect: connection refused"
Mar 28 13:41:44 silverbox2 rke2[861437]: time="2024-03-28T13:41:44-04:00" level=error msg="Remotedialer proxy error" error="dial tcp 203.0.113.254:9345: connect: connection refused"

Per the instructions I set up a dummy route:

sudo ip link add dummy0 type dummy
sudo ip link set dummy0 up
sudo ip addr add 203.0.113.254/31 dev dummy0
sudo ip route add default via 203.0.113.255 dev dummy0 metric 1000

But without the dummy route I get an error: apiserver exited: unable to find suitable network address. error='no default routes found in "/proc/net/route" or "/proc/net/ipv6_route"'
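
For reference, here is how I've been checking which interface holds the default route, and therefore which address RKE2 auto-detects as the node IP; <server-ip> below is just a placeholder for whatever address the agent uses to reach the server:

ip route show default        # which interface/gateway carries the default route
ip -4 addr show              # addresses assigned to each interface
ip route get <server-ip>     # source address and interface used to reach the server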

brandond commented 7 months ago

level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 203.0.113.254:9345: connect: connection refused"

Your agent can't connect to the server. Are you sure the server is running, and the correct ports are open?

kurborg commented 7 months ago

Ahh the infamous brandond quick to answer, thank you!

I have disabled firewalld and when I run netstat -tlpn I see rke2 server is listening on ::9345

My server is running on 192.168.1.110 on the LAN

Yes, I'm sure the rke2-server.service is running. I changed the mode to debug and added insecure-skip-tls-verify: true to /etc/rancher/rke2/config.yaml before running systemctl enable --now rke2-server.service.

I'm getting the following output:

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-03-28 16:01:39 EDT; 17min ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 3968734 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 3968732 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 3968729 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 3968737 (rke2)
    Tasks: 277
   Memory: 7.4G
   CGroup: /system.slice/rke2-server.service
           ├─3968737 /usr/local/bin/rke2 server
           ├─3968760 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           ├─3969175 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox1 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --node-ip=203.0.113.254 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
           ├─3969272 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id adce39022def66759fff9a1cbc7bac98ccf3ca49aa4da57e335f7695ba663284 -address /run/k3s/containerd/containerd.sock
           ├─3969372 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 063046cb319a554dfe131b24964374b271de07af3f52eca103d690bbab0e7cdf -address /run/k3s/containerd/containerd.sock
           ├─3969527 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id c82b4b7265a8240e7f7d7bcabfbc6627f1305c015c05e7fe8abd236d5b828c44 -address /run/k3s/containerd/containerd.sock
           ├─3969528 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8cd025fdcc33d93f73e5a16155692addacbf4da46a89ef01058749392c924b0a -address /run/k3s/containerd/containerd.sock
           ├─3969715 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a7fd55b9b05fb53893a0c76565651bc2fbe7cb7dcb64643aaf7f9862fdb1db1 -address /run/k3s/containerd/containerd.sock
           ├─3969937 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id e63c3fc79a74052b5eb3e9cc461f2049e68b4b177b5040c0fd8a4f13187422e0 -address /run/k3s/containerd/containerd.sock
           ├─3970609 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a47965818c120d24472f815af78bc06a0f8d2831a7fa5f2cdcc062c66cda1db -address /run/k3s/containerd/containerd.sock
           ├─3972030 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id f481dc15bfb2c693f17a758bc0ddc4987b640973b41e475dd124489137571691 -address /run/k3s/containerd/containerd.sock
           ├─3972242 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ed7569406f0e02fd72b1bd05713a4b077a0057f2baeff7ff14754d1cd46cdd8e -address /run/k3s/containerd/containerd.sock
           ├─3973829 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 4782c1409ecde8930cbfb71ebfe5dc22e4451e65212226274df3f7e2b7f19d6a -address /run/k3s/containerd/containerd.sock
           ├─3973993 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id a0be2e1153ae71823cf1f9ff2be9c8898fe34f03bb9a3a175e005eda6a0f4e77 -address /run/k3s/containerd/containerd.sock
           ├─3974508 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ff67ce7eb9d47b82bc93e41d77f0adc465266fee72c854dcc7f6f88d4907a736 -address /run/k3s/containerd/containerd.sock
           └─3975287 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id d58f102fc6297722b22b46da8c9cdf49de60a60e5366963cd6a652142db97f7c -address /run/k3s/containerd/containerd.sock

Mar 28 16:18:38 silverbox1 rke2[3968737]: time="2024-03-28T16:18:38-04:00" level=debug msg="Node silverbox1-3211dcb1 is not changing etcd status condition"
Mar 28 16:18:39 silverbox1 rke2[3968737]: time="2024-03-28T16:18:39-04:00" level=debug msg="Wrote ping"
Mar 28 16:18:44 silverbox1 rke2[3968737]: time="2024-03-28T16:18:44-04:00" level=debug msg="Wrote ping"
Mar 28 16:18:49 silverbox1 rke2[3968737]: time="2024-03-28T16:18:49-04:00" level=debug msg="Wrote ping"
Mar 28 16:18:53 silverbox1 rke2[3968737]: time="2024-03-28T16:18:53-04:00" level=debug msg="Node silverbox1-3211dcb1 is not changing etcd status condition"
Mar 28 16:18:54 silverbox1 rke2[3968737]: time="2024-03-28T16:18:54-04:00" level=debug msg="Wrote ping"
Mar 28 16:18:59 silverbox1 rke2[3968737]: time="2024-03-28T16:18:59-04:00" level=debug msg="Wrote ping"
Mar 28 16:19:04 silverbox1 rke2[3968737]: time="2024-03-28T16:19:04-04:00" level=debug msg="Wrote ping"
Mar 28 16:19:08 silverbox1 rke2[3968737]: time="2024-03-28T16:19:08-04:00" level=debug msg="Node silverbox1-3211dcb1 is not changing etcd status condition"
Mar 28 16:19:09 silverbox1 rke2[3968737]: time="2024-03-28T16:19:09-04:00" level=debug msg="Wrote ping"

Now when I try curl -k 192.168.1.110:6443 I get "Client sent an HTTP request to an HTTPS server".

brandond commented 7 months ago

I see rke2 server is listening on ::9345
My server is running on 192.168.1.110 on the LAN
level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 203.0.113.254:9345: connect: connection refused"

Are the server and agent both on the same LAN? Did you set the --node-external-ip on the server to 203.0.113.254, or is that just the address that you used as the --server address when joining the agent to the cluster? The agent is using that as the server address, and is unable to connect to it. Do you have any idea why that might be?

sudo ip addr add 203.0.113.254/31 dev dummy0
sudo ip route add default via 203.0.113.255 dev dummy0 metric 1000

Wait, you added a dummy/blackhole route to the server? No wonder the agent can't reach it. What exactly are you trying to do here?

You need a default route for kube-proxy's NAT rules to work properly, but you definitely can't black-hole traffic between nodes by sending it to the dummy interface.

brandond commented 7 months ago

Oh, I think I understand what you're trying to do. You're attempting to follow the instructions for a full-airgap single-node environment, where the node has no connected physical interfaces. That is not your situation. What you're doing causes the nodes to pick their dummy interfaces as their primary interface, and use that for inter-node traffic, because that's the interface with the default route. That of course does not work because it's a dummy interface that goes nowhere.

I would probably just configure a default route that points to an unused IP on the network segment shared by the nodes. That should cause the various networking components to choose that interface, and allow kube-proxy to work properly.
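
Roughly something like this, where eno1 and 192.168.1.254 are placeholders for your nodes' physical interface and any unused address on the subnet they share:

sudo ip route add default via 192.168.1.254 dev eno1   # gateway is an unused IP on the shared LAN, not another node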

kurborg commented 7 months ago

Yes that's exactly what I was doing.

So the default route I should configure shouldn't point at the IP of the primary node or an agent node, but rather at an unused IP on the network. I will give that a shot!

sudo ip addr del 203.0.113.254/31 dev dummy0
sudo ip route del default via 203.0.113.255 dev dummy0
sudo ip link set dummy0 down
sudo ip link delete dummy0

sudo ip route add default via 192.168.1.150 dev rke2-lan

My /etc/rancher/rke2/config.yaml on the agents looks like this; is this valid?

insecure-skip-tls-verify: true # is this even needed or just on the server node's config?
server: https://192.168.1.110:9345 # IP address of the server node
token: <token from server node>

brandond commented 7 months ago

That looks good.

The issue is that servers advertise their primary and external IP address to clients, and they will try to connect to those addresses directly, once they have bootstrapped into the cluster via the --server address. If the primary IP is on the dummy interface, that will fail.
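
If you'd rather not rely on auto-detection, you can also pin the address a server advertises in its /etc/rancher/rke2/config.yaml; as a sketch, using the LAN address you mentioned:

node-ip: 192.168.1.110   # address the other nodes should use to reach this server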

kurborg commented 7 months ago

Hey Brandon,

So since rke2-lan isn't a physical interface, I created one with:

ip link add rke2lan type dummy
ip link set rke2lan up
ip addr add 192.168.1.150 dev rke2lan #Unused IP address on my subnet
ip route add default via 192.168.1.150 dev rke2lan

But now my main rke2-server.service is getting errors. Am I just supposed to set up this ip route on the agent nodes?

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2024-03-28 16:01:39 EDT; 3 days ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 3968734 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 3968732 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 3968729 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 3968737 (rke2)
    Tasks: 321
   Memory: 7.6G
   CGroup: /system.slice/rke2-server.service
           ├─3968737 /usr/local/bin/rke2 server
           ├─3968760 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           ├─3969175 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox1 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --node-ip=203.0.113.254 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
           ├─3969272 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id adce39022def66759fff9a1cbc7bac98ccf3ca49aa4da57e335f7695ba663284 -address /run/k3s/containerd/containerd.sock
           ├─3969372 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 063046cb319a554dfe131b24964374b271de07af3f52eca103d690bbab0e7cdf -address /run/k3s/containerd/containerd.sock
           ├─3969527 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id c82b4b7265a8240e7f7d7bcabfbc6627f1305c015c05e7fe8abd236d5b828c44 -address /run/k3s/containerd/containerd.sock
           ├─3969528 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8cd025fdcc33d93f73e5a16155692addacbf4da46a89ef01058749392c924b0a -address /run/k3s/containerd/containerd.sock
           ├─3969715 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a7fd55b9b05fb53893a0c76565651bc2fbe7cb7dcb64643aaf7f9862fdb1db1 -address /run/k3s/containerd/containerd.sock
           ├─3969937 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id e63c3fc79a74052b5eb3e9cc461f2049e68b4b177b5040c0fd8a4f13187422e0 -address /run/k3s/containerd/containerd.sock
           ├─3970609 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a47965818c120d24472f815af78bc06a0f8d2831a7fa5f2cdcc062c66cda1db -address /run/k3s/containerd/containerd.sock
           ├─3972030 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id f481dc15bfb2c693f17a758bc0ddc4987b640973b41e475dd124489137571691 -address /run/k3s/containerd/containerd.sock
           ├─3972242 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ed7569406f0e02fd72b1bd05713a4b077a0057f2baeff7ff14754d1cd46cdd8e -address /run/k3s/containerd/containerd.sock
           ├─3973829 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 4782c1409ecde8930cbfb71ebfe5dc22e4451e65212226274df3f7e2b7f19d6a -address /run/k3s/containerd/containerd.sock
           ├─3973993 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id a0be2e1153ae71823cf1f9ff2be9c8898fe34f03bb9a3a175e005eda6a0f4e77 -address /run/k3s/containerd/containerd.sock
           ├─3974508 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ff67ce7eb9d47b82bc93e41d77f0adc465266fee72c854dcc7f6f88d4907a736 -address /run/k3s/containerd/containerd.sock
           └─3975287 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id d58f102fc6297722b22b46da8c9cdf49de60a60e5366963cd6a652142db97f7c -address /run/k3s/containerd/containerd.sock

Apr 01 10:26:05 silverbox1 rke2[3968737]: {"level":"warn","ts":"2024-04-01T10:26:05.504117-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ed4540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 203.0.113.254:2379: i/o timeout\""}
Apr 01 10:26:05 silverbox1 rke2[3968737]: time="2024-04-01T10:26:05-04:00" level=error msg="Failed to get recorded learner progress from etcd: context deadline exceeded"
Apr 01 10:26:17 silverbox1 rke2[3968737]: {"level":"warn","ts":"2024-04-01T10:26:17.833253-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ed4540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 203.0.113.254:2379: i/o timeout\""}
Apr 01 10:26:17 silverbox1 rke2[3968737]: {"level":"info","ts":"2024-04-01T10:26:17.833351-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
Apr 01 10:26:20 silverbox1 rke2[3968737]: {"level":"warn","ts":"2024-04-01T10:26:20.504821-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ed4540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 203.0.113.254:2379: i/o timeout\""}
Apr 01 10:26:20 silverbox1 rke2[3968737]: time="2024-04-01T10:26:20-04:00" level=error msg="Failed to get recorded learner progress from etcd: context deadline exceeded"
Apr 01 10:26:32 silverbox1 rke2[3968737]: {"level":"warn","ts":"2024-04-01T10:26:32.835183-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ed4540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 203.0.113.254:2379: i/o timeout\""}
Apr 01 10:26:32 silverbox1 rke2[3968737]: {"level":"info","ts":"2024-04-01T10:26:32.835299-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/client.go:210","msg":"Auto sync endpoints failed.","error":"context deadline exceeded"}
Apr 01 10:26:35 silverbox1 rke2[3968737]: {"level":"warn","ts":"2024-04-01T10:26:35.504983-0400","logger":"etcd-client","caller":"v3@v3.5.9-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000ed4540/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 203.0.113.254:2379: i/o timeout\""}
Apr 01 10:26:35 silverbox1 rke2[3968737]: time="2024-04-01T10:26:35-04:00" level=error msg="Failed to get recorded learner progress from etcd: context deadline exceeded"
kurborg commented 7 months ago

Since I have a physical ethernet connection on the nodes, both server and agents, should I just set the default route on that connection, using the IP address of each respective node on the LAN? Forgive me, I'm trying to educate myself on this at the same time as implementing it, so maybe I'm not understanding completely.

Thank you for your patience and guidance on this

kurborg commented 7 months ago

@brandond Hey Brandon, so I've been trying some things and I set a default route for the ethernet connection I have on the LAN. Note that I also had to turn firewalld back on to allow Harbor to be deployed, but I opened all the ports I suspect rke2 is using:

firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=10250/tcp
systemctl restart firewalld
firewall-cmd --list-ports
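
The RKE2 docs also list other inbound ports between nodes; I'm not sure yet which of these my setup actually needs, but opening them would look like:

firewall-cmd --permanent --add-port=9345/tcp          # rke2 supervisor API (agent join)
firewall-cmd --permanent --add-port=2379-2381/tcp     # etcd client/peer/metrics (server nodes only)
firewall-cmd --permanent --add-port=30000-32767/tcp   # NodePort service range
firewall-cmd --reload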

I'm getting errors for tls: bad certificate and this server is not a member of the etcd cluster... any ideas? Anything is helpful.

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; disabled; vendor preset: disabled)
   Active: activating (start) since Wed 2024-04-03 14:36:36 EDT; 2min 25s ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 3802937 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill (code=exited, status=0/SUCCESS)
  Process: 3802966 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 3802965 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 3802963 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 3802967 (rke2)
    Tasks: 232
   Memory: 7.7G
   CGroup: /system.slice/rke2-server.service
           ├─2430798 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id b9aa7f3f43583a933f6e68371adb8d630ed5c10b7a74f38d1b20ce2add051727 -address /run/k3s/containerd/containerd.sock
           ├─3802967 /usr/local/bin/rke2 server
           ├─3803014 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           ├─3803440 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox1 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --node-ip=192.168.1.110 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
           ├─3969527 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id c82b4b7265a8240e7f7d7bcabfbc6627f1305c015c05e7fe8abd236d5b828c44 -address /run/k3s/containerd/containerd.sock
           ├─3969528 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8cd025fdcc33d93f73e5a16155692addacbf4da46a89ef01058749392c924b0a -address /run/k3s/containerd/containerd.sock
           ├─3969715 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a7fd55b9b05fb53893a0c76565651bc2fbe7cb7dcb64643aaf7f9862fdb1db1 -address /run/k3s/containerd/containerd.sock
           ├─3970609 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a47965818c120d24472f815af78bc06a0f8d2831a7fa5f2cdcc062c66cda1db -address /run/k3s/containerd/containerd.sock
           ├─3972030 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id f481dc15bfb2c693f17a758bc0ddc4987b640973b41e475dd124489137571691 -address /run/k3s/containerd/containerd.sock
           ├─3972242 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ed7569406f0e02fd72b1bd05713a4b077a0057f2baeff7ff14754d1cd46cdd8e -address /run/k3s/containerd/containerd.sock
           ├─3973829 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 4782c1409ecde8930cbfb71ebfe5dc22e4451e65212226274df3f7e2b7f19d6a -address /run/k3s/containerd/containerd.sock
           ├─3973993 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id a0be2e1153ae71823cf1f9ff2be9c8898fe34f03bb9a3a175e005eda6a0f4e77 -address /run/k3s/containerd/containerd.sock
           ├─3974508 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ff67ce7eb9d47b82bc93e41d77f0adc465266fee72c854dcc7f6f88d4907a736 -address /run/k3s/containerd/containerd.sock
           └─3975287 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id d58f102fc6297722b22b46da8c9cdf49de60a60e5366963cd6a652142db97f7c -address /run/k3s/containerd/containerd.sock

Apr 03 14:38:51 silverbox1 rke2[3802967]: time="2024-04-03T14:38:51-04:00" level=warning msg="Failed to list nodes with etcd role: runtime core not ready"
Apr 03 14:38:51 silverbox1 rke2[3802967]: time="2024-04-03T14:38:51-04:00" level=info msg="Cluster-Http-Server 2024/04/03 14:38:51 http: TLS handshake error from 127.0.0.1:48040: remote error: tls: bad certificate"
Apr 03 14:38:51 silverbox1 rke2[3802967]: time="2024-04-03T14:38:51-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Apr 03 14:38:52 silverbox1 rke2[3802967]: time="2024-04-03T14:38:52-04:00" level=info msg="Defragmenting etcd database"
Apr 03 14:38:52 silverbox1 rke2[3802967]: time="2024-04-03T14:38:52-04:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [silverbox1-3211dcb1=https://203.0.113.254:2380], expect: silverbox1-3211dcb1=https://192.168.1.110:2380"
Apr 03 14:38:56 silverbox1 rke2[3802967]: time="2024-04-03T14:38:56-04:00" level=debug msg="Wrote ping"
Apr 03 14:38:56 silverbox1 rke2[3802967]: time="2024-04-03T14:38:56-04:00" level=info msg="Cluster-Http-Server 2024/04/03 14:38:56 http: TLS handshake error from 127.0.0.1:48198: remote error: tls: bad certificate"
Apr 03 14:38:56 silverbox1 rke2[3802967]: time="2024-04-03T14:38:56-04:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
Apr 03 14:38:57 silverbox1 rke2[3802967]: time="2024-04-03T14:38:57-04:00" level=info msg="Defragmenting etcd database"
Apr 03 14:38:57 silverbox1 rke2[3802967]: time="2024-04-03T14:38:57-04:00" level=info msg="Failed to test data store connection: this server is a not a member of the etcd cluster. Found [silverbox1-3211dcb1=https://203.0.113.254:2380], expect: silverbox1-3211dcb1=https://192.168.1.110:2380"
brandond commented 7 months ago

this server is a not a member of the etcd cluster. Found [silverbox1-3211dcb1=https://203.0.113.254:2380], expect: silverbox1-3211dcb1=https://192.168.1.110:2380"

You changed the node IP. You can't change the node IP on etcd cluster members; you need to do a --cluster-reset to reset the etcd cluster back to a single member with the current IP, or delete the node from the cluster and rejoin it with the current IP, depending on whether you have other cluster members up and running.
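
On a single-server cluster the reset is roughly the following, run on the server node (take an etcd snapshot first if the data matters):

systemctl stop rke2-server
rke2 server --cluster-reset
systemctl start rke2-server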

kurborg commented 7 months ago

Yes you were definitely right @brandond

Now that I've set the default routes on eno1, which corresponds to each respective node's IP address on the LAN, it seems to be working for the server... but I'm getting what seems to be an endless hang on the rke2-agent, with no errors in the output. Progress? Any ideas on what I can look at next?

Running kubectl get nodes on the master node just returns the rke2 server, but not the agent I'm attempting to add.
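
In case it helps, this is where I've been looking while the agent hangs:

journalctl -u rke2-agent -f                            # live agent logs, on the agent node
tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log   # kubelet log, on the agent node
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes -o wide   # on the server node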

● rke2-server.service - Rancher Kubernetes Engine v2 (server)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; disabled; vendor preset: disabled)
   Active: active (running) since Wed 2024-04-03 15:04:04 EDT; 7h ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 3817572 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-server.service | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill (code=exited, status=0/SUCCESS)
  Process: 3821995 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 3821994 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 3821992 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 3821996 (rke2)
    Tasks: 179
   Memory: 7.4G
   CGroup: /system.slice/rke2-server.service
           ├─3821996 /usr/local/bin/rke2 server
           ├─3822213 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           ├─3822625 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox1 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --node-ip=192.168.1.110 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
           ├─3822705 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 09bd7cb4c6633227d2d4e63cad1bb5c966a2e953921dbb4489f5b21394ee162a -address /run/k3s/containerd/containerd.sock
           ├─3823145 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id ae95b7561d9fcb9890964d714a46683746f95972149dc4fe3285415628c572a6 -address /run/k3s/containerd/containerd.sock
           ├─3824055 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 764751fd021e05a8e7a8b6b8c364dd8140d8d83d654a137953b128586bd86866 -address /run/k3s/containerd/containerd.sock
           ├─3969527 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id c82b4b7265a8240e7f7d7bcabfbc6627f1305c015c05e7fe8abd236d5b828c44 -address /run/k3s/containerd/containerd.sock
           └─3969715 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 8a7fd55b9b05fb53893a0c76565651bc2fbe7cb7dcb64643aaf7f9862fdb1db1 -address /run/k3s/containerd/containerd.sock

Apr 03 22:55:33 silverbox1 rke2[3821996]: time="2024-04-03T22:55:33-04:00" level=debug msg="Node silverbox1-6fd13853 is not changing etcd status condition"
Apr 03 22:55:35 silverbox1 rke2[3821996]: time="2024-04-03T22:55:35-04:00" level=debug msg="Wrote ping"
Apr 03 22:55:40 silverbox1 rke2[3821996]: time="2024-04-03T22:55:40-04:00" level=debug msg="Wrote ping"
Apr 03 22:55:45 silverbox1 rke2[3821996]: time="2024-04-03T22:55:45-04:00" level=debug msg="Wrote ping"
Apr 03 22:55:48 silverbox1 rke2[3821996]: time="2024-04-03T22:55:48-04:00" level=debug msg="Node silverbox1-6fd13853 is not changing etcd status condition"
Apr 03 22:55:50 silverbox1 rke2[3821996]: time="2024-04-03T22:55:50-04:00" level=debug msg="Wrote ping"
Apr 03 22:55:55 silverbox1 rke2[3821996]: time="2024-04-03T22:55:55-04:00" level=debug msg="Wrote ping"
Apr 03 22:56:00 silverbox1 rke2[3821996]: time="2024-04-03T22:56:00-04:00" level=debug msg="Wrote ping"
Apr 03 22:56:03 silverbox1 rke2[3821996]: time="2024-04-03T22:56:03-04:00" level=debug msg="Node silverbox1-6fd13853 is not changing etcd status condition"
Apr 03 22:56:05 silverbox1 rke2[3821996]: time="2024-04-03T22:56:05-04:00" level=debug msg="Wrote ping"
● rke2-agent.service - Rancher Kubernetes Engine v2 (agent)
   Loaded: loaded (/usr/local/lib/systemd/system/rke2-agent.service; disabled; vendor preset: disabled)
   Active: activating (start) since Wed 2024-04-03 22:41:53 EDT; 15min ago
     Docs: https://github.com/rancher/rke2#readme
  Process: 986990 ExecStopPost=/bin/sh -c systemd-cgls /system.slice/rke2-agent.service | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill (code=exited, status=0/SUCCESS)
  Process: 987000 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  Process: 986999 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  Process: 986997 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
 Main PID: 987001 (rke2)
    Tasks: 84
   Memory: 4.6G
   CGroup: /system.slice/rke2-agent.service
           ├─983206 /var/lib/rancher/rke2/data/v1.26.14-rke2r1-b13f824cc417/bin/containerd-shim-runc-v2 -namespace k8s.io -id 9823f27371e7b153726ccb89a034a6f2159d09509c444271b6ae55eb3e93cd8e -address /run/k3s/containerd/containerd.sock
           ├─987001 /usr/local/bin/rke2 agent
           ├─987019 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
           └─987077 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox2 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --node-ip=192.168.1.111 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/var/lib/rancher/rke2/agent/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key

Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Imported docker.io/rancher/mirrored-ingress-nginx-kube-webhook-certgen:v20230312-helm-chart-4.5.2-28-g66a760794"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Imported docker.io/rancher/rke2-cloud-provider:v1.26.3-build20230406"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Imported docker.io/rancher/mirrored-sig-storage-snapshot-validation-webhook:v6.2.2"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Imported docker.io/rancher/hardened-flannel:v0.24.2-build20240122"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Imported images from /var/lib/rancher/rke2/agent/images/rke2-images.linux-amd64.tar in 22.797861029s"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Getting list of apiserver endpoints from server"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Updated load balancer rke2-agent-load-balancer default server address -> 192.168.1.110:9345"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Connecting to proxy" url="wss://192.168.1.110:9345/v1-rke2/connect"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Running kubelet --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=silverbox2 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-ip=192.168.1.111 --node-labels= --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/var/lib/rancher/rke2/agent/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key"
Apr 03 22:42:17 silverbox2 rke2[987001]: time="2024-04-03T22:42:17-04:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack-tcp-timeout-established=0s --healthz-bind-address=127.0.0.1 --hostname-override=silverbox2 --kubeconfig=/var/lib/rancher/rke2/agent/kubeproxy.kubeconfig --proxy-mode=iptables"
kurborg commented 7 months ago

@brandond

Both nodes are up and running now!!! I had turned firewalld back on on the rke2-server node to allow a docker-compose application to work (it isn't reachable unless firewalld is running), so switching that back off, then restarting rke2-server, then restarting rke2-agent seems to have done the trick.
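
For anyone else who ends up here, the sequence that got things healthy for me was roughly:

systemctl stop firewalld             # on the server node (or open every required port instead)
systemctl restart rke2-server        # on the server node
systemctl restart rke2-agent         # on each agent node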

Thank you very much for your help in debugging this, I appreciate your quick responses.