jupyter-server / enterprise_gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
https://jupyter-enterprise-gateway.readthedocs.io/en/latest/
Other
615 stars 220 forks source link

KIP cannot use ImagePullSecret when using containerd #1359

Open Buchhold opened 5 months ago

Buchhold commented 5 months ago

Description

This started as a post asking for help in the community forum: https://discourse.jupyter.org/t/kip-containerd-and-imagepullsecrets/23342. It seems it might not be due to misconfiguration on my end, but a bug in EG.

Using a private repository and projects that require authentication, the KIP appears to not use the ImagePullSecret specified in the helm chart. Thus, pulling public images from the repo works, whereas private ones don't. The image puller logs show failure as:

[E 2024-01-11 09:53:42,617 kernel_image_puller.t1] Error executing crictl -r unix:///run/containerd/containerd.sock pull harbor.[...]: time="2024-01-11T09:53:42Z" level=fatal msg="pulling image: rpc error: code = Unknown desc = failed to pull and unpack image \"harbor[...]kernel-spark-python-custom:2024-01-09\": failed to resolve reference \"harbor.[...]kernel-spark-python-custom:2024-01-09\": pulling from host harbor[...] failed with status code [manifests 2024-01-09]: 401 Unauthorized"

This has a significant impact because pulling those images on kernel creation is also problematic. Unless sharing the namespace with EG, the secret resource is not available in the per-kernel namespaces that are created on launch.

Reproduce

What I configured:

global:
  rbac: true
  # ImagePullSecrets for a ServiceAccount, list of secrets in the same namespace
  # to use for pulling any images in pods that reference this ServiceAccount.
  # Must be set for any cluster configured with private docker registry.
  imagePullSecrets: 
    - harbor-registry
  commonLabels: {}
    # app.kubernetes.io/name: [your app name]

# You can optionally create imagePull Secrets
imagePullSecretsCreate:
  enabled: true
  annotations: {}
    # this annotatoin allows to keep secret even if helm release is deleted
    # "helm.sh/resource-policy": "keep"
  secrets: 
    - name: "harbor-registry"
      data: "eyJhdX... (base64 encoded dockerconfigjson)"
[...]
# Kernel Image Puller (daemonset)
kip:
  [...]
  criSocket: /run/containerd/containerd.sock

This put me in a position where:

My custom kernelspec image is successfully pulled from a private project from our harbor resgistry. The secret harbor-registey is created in the enterprise-gatewaynamespace by helm. kernel-image-puller-sa is created in the enterprise-gateway namespace and has the imagePullSecret: harbor-registry The logs of the kip pods show that our of 3 custom kernels in my kernelspec, all three are detected and should be pulled. The two from public projects in the registry are pulled successfully (no secret needed) but the one form the private project isn’t: (Error executing crictl -r unix:///run/containerd/containerd.sock pull [...] failed with status code [manifests 2024-01-09]: 401 Unauthorized") It seems the secret + sa work fine to pull pods when creating kubernetes resources (like the kernelspecs image), but are not used when the KIP pods try to pull via the docker client.

Expected behavior

KIP successfully pulls all images, not only those in projects that don't require auth.

Context

EG image: current elyra/enterprise-gateway:dev

welcome[bot] commented 5 months ago

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively. welcome You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

Buchhold commented 5 months ago

I created a local solution. I am pretty new to Kubernetes and EG so I don't think this solution generalizes perfectly, but maybe you can point me in the right direction to making this a sensible pull request:

Steps I did:

  1. Modify the KIP daemonset template that is part of the enterprise gateway helm chart: 1.1 Mount the existing secrets (already known in the helm chart) as files in the KIP containers under a certain path.
  2. Modify the KIP code and rebuild the KIP image: 2.1 Open and parse all of the mounted secret files in the KIP application code (by iterating through all suitable files in the mount path) and create a map repo->auth 2,2 When pulling, if the image repo is a key in the map created in a), extend the crictl call by --auth (if available otherwise --creds {username]:{password}) right after the pull command.

Relevant Excerpts/Changes:

/etc/kubernetes/helm/enterprise-gateway/templates/daemonset.yaml

[...]
      volumeMounts:
        - name: cri-socket
          mountPath: !!str {{ .Values.kip.criSocket }}  # see env KIP_CRI_ENDPOINT
          readOnly: true
        {{- range .Values.global.imagePullSecrets }}
        - name: {{ printf "ips-%s" . }}
          mountPath: {{printf "/mnt/secrets/%s" . }}
          readOnly: true
        {{- end }}
      volumes:
      - name: cri-socket
        hostPath:
          path: {{ .Values.kip.criSocket }}
      {{- range .Values.global.imagePullSecrets }}
      - name: {{printf "ips-%s" .}}
        secret:
          secretName: {{ . }}
          optional: true
[...]

etc/docker/kernel-image-puller/kernel_image_puller.py:

@@ -42,6 +44,15 @@ class KernelImagePuller:
         self.policy = None
         self.image_fetcher = image_fetcher
         self.load_static_env_values()
+        self.load_image_pull_secrets()
+
+    def load_image_pull_secrets(self):
+        self.registry_to_auth = {}
+        # Todo: add support for more kinds of secrets
+        for f in glob.glob("/mnt/secrets/*/.dockerconfigjson"):
+            with open(f) as fh:
+                conf = json.load(fh)
+                self.registry_to_auth.update(conf.get("auths", {}))

     def load_static_env_values(self):
         """Load the static environment values."""
@@ -245,6 +262,7 @@ class KernelImagePuller:
         """Downloads (pulls) the named image using the configured container runtime."""
         result = True
         absolute_image_name = self.get_absolute_image_name(image_name)
+        registry = absolute_image_name.split("/")[0]
         t0 = time.time()
         if self.container_runtime == KernelImagePuller.DOCKER_CLIENT:
             try:
@@ -252,7 +270,18 @@ class KernelImagePuller:
             except NotFound:
                 result = False
         elif self.container_runtime == KernelImagePuller.CONTAINERD_CLIENT:
             argv = ["crictl", "-r", self.runtime_endpoint, "pull", absolute_image_name]
+            if registry in self.registry_to_auth:
+                auth = self.registry_to_auth[registry]
+                if authstr := auth.get("auth"):
+                    argv = [
+                        "crictl",
+                        "-r",
+                        self.runtime_endpoint,
+                        "pull",
+                        "--auth",
+                        authstr,
+                        absolute_image_name,
+                    ]
+                elif "username" in auth and "password" in auth:
+                    argv = [
+                        "crictl",
+                        "-r",
+                        self.runtime_endpoint,
+                        "pull",
+                        "--creds",
+                        f'{auth["username"]}:{auth["password"]}',
+                        absolute_image_name,
+                    ]
             result = self.execute_cmd(argv)
         else:  # invalid container runtime
             logger.error(f"Invalid container runtime detected: '{self.container_runtime}'!")

(edited to add a fallback for username:password if no "auth" str is available)