zarf-dev / zarf

DevSecOps for Air Gap & Limited-Connection Systems. https://zarf.dev/
Apache License 2.0

Cannot zarf init on microk8s due to 127.0.0.1 IP hardcoding #2383

Open adamfowleruk opened 6 months ago

adamfowleruk commented 6 months ago

Environment

Device and OS: NUC Extreme 12, Ubuntu 23.10.1
App version: zarf-init v0.32.5
Kubernetes distro being used: Microk8s v1.29.2-strict (Ubuntu strict snap confinement, with the cis-hardening addon applied in Microk8s)
Other:

Steps to reproduce

  1. After microk8s is installed and a default StorageClass is configured...
  2. microk8s config > ~/.kube/config
  3. zarf init --components git-server --confirm (run from the same folder containing zarf-init-amd64-v0.32.5.tar.zst, with the sbom present)
  4. Observe zarf deploy the webhook, then stall while waiting for the registry to become available

Expected result

The zarf-init registry rollout should succeed. The zarf registry should be deployed with the correct IP, obtained by zarf-init describing the pod and extracting its internal IP address (which on Microk8s should actually be an internal Calico IP in 10.2.0.0/16, not a physical IP, localhost or otherwise).

Actual Result

Logs from the registry pod show that it is trying to reach a zarf component on 127.0.0.1. This cannot work, as that is not an IP the pod is accessible on (nor one the NodePort is exposed on). The correct port is exposed as a NodePort, just not on that IP.

(I note there is another issue logged about the security implications of using a NodePort instead of Services, which I agree with.)
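For reference, the mutated image reference and the failing pull can be checked with something along these lines (the pod name is a placeholder, and this assumes the kubeconfig exported in step 2):

    kubectl -n zarf get pods -o wide
    # The image reference should show the hardcoded 127.0.0.1 address
    kubectl -n zarf get pod <registry-pod> -o jsonpath='{.spec.containers[*].image}{"\n"}'
    # Pod events should show the resulting image pull failure
    kubectl -n zarf describe pod <registry-pod>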

Visual Proof (screenshots, videos, text, etc)

None yet. Will provide when I can extract it from the box...

Severity/Priority

High - Blocks automated offline, air-gapped installation on Microk8s with strict confinement and cis-hardening applied (a highly likely configuration in Government environments).

Additional Context


None.

carroarmato0 commented 4 months ago

This issue also prevents Zarf from initializing successfully on OpenShift, where 127.0.0.1 is not allowed, but using one of the node's own IP addresses does work when hitting the NodePort.

 ╭─carroarmato0@neon in ~/Downloads took 184ms
 ╰─λ oc -n zarf get pods -o wide
NAME                                    READY   STATUS             RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
injector                                1/1     Running            0          5m20s   10.128.0.145   worker-0   <none>           <none>
zarf-docker-registry-6cb7547597-rwm6l   0/1     ImagePullBackOff   0          3m54s   10.130.0.68    worker-2   <none>           <none>
 ╭─carroarmato0@neon in ~/Downloads took 175ms
 ╰─λ oc -n zarf get pod zarf-docker-registry-6cb7547597-rwm6l -o yaml | grep "image:"
    image: 127.0.0.1:32189/library/registry:2.8.3
  - image: 127.0.0.1:32189/library/registry:2.8.3
 ╭─carroarmato0@neon in ~/Downloads took 222ms
 ╰─λ oc -n zarf get svc
NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
zarf-docker-registry   NodePort   172.30.166.15   <none>        5000:31999/TCP   5m35s
zarf-injector          NodePort   172.30.67.60    <none>        5000:32189/TCP   5m49s
 ╭─carroarmato0@neon in ~/Downloads took 193ms
 ╰─λ oc get nodes -o wide
NAME       STATUS   ROLES                         AGE   VERSION            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
worker-0   Ready    control-plane,master,worker   34d   v1.27.10+28ed2d7   198.19.0.10   <none>        Red Hat Enterprise Linux CoreOS 414.92.202402051952-0 (Plow)   5.14.0-284.52.1.el9_2.x86_64   cri-o://1.27.3-2.rhaos4.14.git03502b6.el9
worker-1   Ready    control-plane,master,worker   34d   v1.27.10+28ed2d7   198.19.0.11   <none>        Red Hat Enterprise Linux CoreOS 414.92.202402051952-0 (Plow)   5.14.0-284.52.1.el9_2.x86_64   cri-o://1.27.3-2.rhaos4.14.git03502b6.el9
worker-2   Ready    control-plane,master,worker   34d   v1.27.10+28ed2d7   198.19.0.12   <none>        Red Hat Enterprise Linux CoreOS 414.92.202402051952-0 (Plow)   5.14.0-284.52.1.el9_2.x86_64   cri-o://1.27.3-2.rhaos4.14.git03502b6.el9
 ╭─carroarmato0@neon in ~/Downloads took 4m43s
 ╰─λ oc debug node/worker-2
Starting pod/worker-2-debug-htmnb ...
To use host binaries, run `chroot /host`
Pod IP: 198.19.0.12
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# sudo su -
Last login: Fri Apr 26 09:09:13 UTC 2024

[root@worker-2 ~]# curl -v 127.0.0.1:32189/v2/
*   Trying 127.0.0.1:32189...
^C

[root@worker-2 ~]# curl -v 198.19.0.10:32189/v2/
*   Trying 198.19.0.10:32189...
* Connected to 198.19.0.10 (198.19.0.10) port 32189 (#0)
> GET /v2/ HTTP/1.1
> Host: 198.19.0.10:32189
> User-Agent: curl/7.76.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: tiny-http (Rust)
< Date: Fri, 26 Apr 2024 09:15:04 GMT
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Content-Length: 2
< 
* Connection #0 to host 198.19.0.10 left intact
[root@worker-2 ~]# iptables -nvL -t nat | grep 32189
   79  4740 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL tcp dpt:32189 to:172.30.67.60:5000

christianhuening commented 4 months ago

I have the same issue on an actual k3s cluster with 4 nodes.

phillebaba commented 1 month ago

This seems to be a problem in multiple situations, OpenShift being one and CNIs using IPVS being another. It could be fixed by having the mutating webhook use the IP of the node that the pod is scheduled to, which should in theory be fairly simple to do.

If something happens to the node, the pod will be deleted and rescheduled, which would cause the new pod to be mutated again, so we do not have to worry about the IP changing during the lifetime of the pod.
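As an illustration only (not Zarf agent code): once the pod has been scheduled, the address the webhook would substitute for 127.0.0.1 can be resolved roughly like this, with <registry-pod> as a placeholder. Note that spec.nodeName is only populated after scheduling, so the mutation would need to happen at or after that point.

    # Find the node the pod landed on, then that node's InternalIP
    NODE=$(kubectl -n zarf get pod <registry-pod> -o jsonpath='{.spec.nodeName}')
    kubectl get node "$NODE" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'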

jeff-mccoy commented 11 hours ago

To clarify, you cannot use k8s Service definitions because the entity pulling the image, the CRI (typically containerd), sits outside the cluster at the node level. That is the same reason you can't just assume TLS trust: you would have to modify the node's TLS trust chain. What happens under the hood is that if you hit a NodePort on 127.0.0.1, the traffic still routes through to the right node via kube-proxy (typically). Known outliers are IPVS, due to its stance on localhost, and OpenShift policies.

We have seen deployments on OpenShift (I am not an expert here), and were told they had to change some policy.

@phillebaba's suggestion is very reasonable, except that the use of 127.0.0.1 takes advantage of a unique posture of containerd (and, I think, CRI-O) that allows 127.0.0.1 to be unencrypted. By using the node IP you solve this issue but now break TLS, as containerd will require it. This part gets a little fuzzy though and needs validation, because containerd has changed (and broken) this behavior for us several times in the past. It's possible the new fallback logic from the last containerd bug fix actually ignores whether the address is localhost or not, though that certainly wasn't the original design intent from the Docker days, and it might now be seen as a new bug by some if that is the case.

By the way, I did experiment with the new config pattern for containerd to inject TLS certs, and while it's still scary and requires mounting a containerd config path in a daemonset to override it, I believe it now supports hot reloading and can be done without touching the global config, so it is worth playing with again too.
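For anyone who wants to experiment with that config pattern, here is a minimal sketch of what a (hypothetical) daemonset might write on each node, assuming containerd is configured with config_path = "/etc/containerd/certs.d"; the <node-ip> placeholder and the 31999 port are illustrative, borrowing the registry NodePort from the output above:

    mkdir -p /etc/containerd/certs.d/<node-ip>:31999
    cat > /etc/containerd/certs.d/<node-ip>:31999/hosts.toml <<'EOF'
    server = "https://<node-ip>:31999"

    [host."https://<node-ip>:31999"]
      ca = "/etc/containerd/certs.d/<node-ip>:31999/ca.crt"
    EOF

To the best of my understanding, containerd consults these per-registry files at pull time, which is what allows changes to take effect without touching the global config or restarting the daemon.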