adamfowleruk opened this issue 8 months ago
This issue also prevents Zarf from successfully initializing on OpenShift, where 127.0.0.1 is not allowed; using one of the node's own IP addresses with the NodePort does work.
╭─carroarmato0@neon in ~/Downloads took 184ms
╰─λ oc -n zarf get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
injector 1/1 Running 0 5m20s 10.128.0.145 worker-0 <none> <none>
zarf-docker-registry-6cb7547597-rwm6l 0/1 ImagePullBackOff 0 3m54s 10.130.0.68 worker-2 <none> <none>
╭─carroarmato0@neon in ~/Downloads took 175ms
╰─λ oc -n zarf get pod zarf-docker-registry-6cb7547597-rwm6l -o yaml | grep "image:"
image: 127.0.0.1:32189/library/registry:2.8.3
- image: 127.0.0.1:32189/library/registry:2.8.3
╭─carroarmato0@neon in ~/Downloads took 222ms
╰─λ oc -n zarf get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
zarf-docker-registry NodePort 172.30.166.15 <none> 5000:31999/TCP 5m35s
zarf-injector NodePort 172.30.67.60 <none> 5000:32189/TCP 5m49s
╭─carroarmato0@neon in ~/Downloads took 193ms
╰─λ oc get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
worker-0 Ready control-plane,master,worker 34d v1.27.10+28ed2d7 198.19.0.10 <none> Red Hat Enterprise Linux CoreOS 414.92.202402051952-0 (Plow) 5.14.0-284.52.1.el9_2.x86_64 cri-o://1.27.3-2.rhaos4.14.git03502b6.el9
worker-1 Ready control-plane,master,worker 34d v1.27.10+28ed2d7 198.19.0.11 <none> Red Hat Enterprise Linux CoreOS 414.92.202402051952-0 (Plow) 5.14.0-284.52.1.el9_2.x86_64 cri-o://1.27.3-2.rhaos4.14.git03502b6.el9
worker-2 Ready control-plane,master,worker 34d v1.27.10+28ed2d7 198.19.0.12 <none> Red Hat Enterprise Linux CoreOS 414.92.202402051952-0 (Plow) 5.14.0-284.52.1.el9_2.x86_64 cri-o://1.27.3-2.rhaos4.14.git03502b6.el9
╭─carroarmato0@neon in ~/Downloads took 4m43s
╰─λ oc debug node/worker-2
Starting pod/worker-2-debug-htmnb ...
To use host binaries, run `chroot /host`
Pod IP: 198.19.0.12
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# sudo su -
Last login: Fri Apr 26 09:09:13 UTC 2024
[root@worker-2 ~]# curl -v 127.0.0.1:32189/v2/
* Trying 127.0.0.1:32189...
^C
[root@worker-2 ~]# curl -v 198.19.0.10:32189/v2/
* Trying 198.19.0.10:32189...
* Connected to 198.19.0.10 (198.19.0.10) port 32189 (#0)
> GET /v2/ HTTP/1.1
> Host: 198.19.0.10:32189
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: tiny-http (Rust)
< Date: Fri, 26 Apr 2024 09:15:04 GMT
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Content-Length: 2
<
* Connection #0 to host 198.19.0.10 left intact
[root@worker-2 ~]# iptables -nvL -t nat | grep 32189
79 4740 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 ADDRTYPE match dst-type LOCAL tcp dpt:32189 to:172.30.67.60:5000
I have the same issue on an actual k3s cluster with 4 nodes.
This seems to be a problem in multiple situations, OpenShift being one and CNIs using IPVS being another. It could be fixed by having the mutating webhook use the IP of the node that the pod is scheduled to; this should in theory be fairly simple to do.
If something happens to the node, the pod will be deleted and rescheduled, which would cause the new pod to be mutated again. That means we do not have to worry about the IP changing during the lifetime of the pod.
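For reference, the address that proposal needs is already published on the Node object (and, once the pod is scheduled, on its .status.hostIP). A hypothetical lookup against the worker-2 node from the session above, using standard kubectl jsonpath:

oc get node worker-2 -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'

On this cluster that returns 198.19.0.12, the same class of address the curl test above succeeds against.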
To clarify, you cannot use Kubernetes Service definitions because the entity pulling the image, the CRI (typically containerd), sits outside the cluster at the node level. This is the same reason you can't simply assume TLS trust: you would have to modify the node's TLS trust chain. What happens under the hood is that traffic hitting a NodePort on 127.0.0.1 still routes through to the right place via kube-proxy (typically). Known outliers are IPVS, due to its stance on localhost, and OpenShift policies.
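For anyone checking which of those cases applies to their cluster, two quick node-level probes (these assume an iptables-mode kube-proxy; the KUBE-NODEPORTS chain and the sysctl below are not programmed by IPVS mode or by OVN-Kubernetes):

sysctl net.ipv4.conf.all.route_localnet
iptables -t nat -nL KUBE-NODEPORTS

iptables-mode kube-proxy sets route_localnet=1 so that traffic to 127.0.0.1:<nodeport> can be DNATed off the loopback interface; IPVS mode never wires up localhost NodePorts, which matches the hanging curl seen earlier in this thread.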
We have seen deployments on OpenShift (I am not an expert here), and were told they had to change some policy.
@phillebaba's suggestion is very reasonable, except that the use of 127.0.0.1 takes advantage of a unique posture of containerd (and, I think, CRI-O) that allows 127.0.0.1 to be unencrypted. Using the node IP solves this issue but breaks TLS, as containerd will then require it. This part gets a little fuzzy and needs validation, because containerd has changed (and broken) this behavior for us several times in the past. It's possible the new fallback logic from the last containerd bug fix actually ignores whether the registry is localhost or not, though that certainly wasn't the original design intent from the Docker days, and some might now see it as a new bug if that is the case. By the way, I did experiment with the new containerd config pattern for injecting TLS certs: while it's still scary and requires mounting a containerd config path in a DaemonSet to override, I believe it now supports hot reloading and can be done without touching global config, so it is worth playing with again.
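For anyone wanting to experiment with that containerd config pattern, a minimal sketch of the per-registry drop-in (the 32189 port and the paths are just this thread's example values; containerd's CRI registry config_path must already point at /etc/containerd/certs.d for the file to be honored, and these files are read at pull time rather than at daemon start, which is the hot-reload behavior mentioned above):

mkdir -p "/etc/containerd/certs.d/127.0.0.1:32189"
cat > "/etc/containerd/certs.d/127.0.0.1:32189/hosts.toml" <<'EOF'
server = "https://127.0.0.1:32189"

[host."https://127.0.0.1:32189"]
  ca = "/etc/containerd/certs.d/127.0.0.1:32189/ca.crt"
EOF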
Environment
Device and OS: NUC Extreme 12, Ubuntu 23.10.1
App version: zarf-init v0.32.5
Kubernetes distro being used: Microk8s v1.29.2-strict (Ubuntu strict snap confinement, with the cis-hardening addon applied)
Other:
Steps to reproduce
Expected result
The zarf-init registry rollout should succeed. The Zarf registry should be deployed with the correct IP, obtained by zarf-init describing the pod and extracting its internal IP address (which in Microk8s should actually be an internal Calico IP in 10.2.0.0/16, not a physical IP, localhost or otherwise).
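A quick way to confirm which pod IP that expectation refers to (standard kubectl jsonpath; the zarf namespace matches this report):

microk8s kubectl -n zarf get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.podIP}{"\n"}{end}'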
Actual Result
Logs from the registry pod show that it is trying to communicate with a Zarf component on 127.0.0.1. This is not possible, as that is not an IP the pod is reachable on (nor one the NodePort is exposed on). The correct port is exposed as a NodePort, just not on that IP.
(I note there is another issue logged about the security implications of using NodePort instead of Services, which I agree with.)
Visual Proof (screenshots, videos, text, etc)
None yet. Will provide when I can extract it from the box...
Severity/Priority
High - blocks automated offline, airgapped installation on Microk8s with strict confinement and cis-hardening applied (a highly likely configuration in government environments).
Additional Context
None.