jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io

Support podman in k8s #1531

Closed: sgaist closed this pull request 1 year ago

sgaist commented 2 years ago

This pull request is a refactor that allows using alternatives to Docker as the image builder, as requested in jupyterhub/binderhub#1513. In this case, support for Podman is implemented.

Podman is a daemonless, open source alternative to Docker that can be used as a drop-in replacement in many cases. Aside from working directly on the command line, it can run as a service (only on Linux at the time of writing) offering a Docker-compatible API that can be used in a similar fashion. This service mode is what this PR uses to implement the image build support.

The PR implements a rewrite of the DaemonSet that provides the Docker image build service.

The newly introduced imageBuilderType parameter lets users select which image builder to use.

The daemonset name pink is an acronym for "podman in kubernetes".

Configuration example:

service:
  type: ClusterIP

config:
  BinderHub:
    debug: true
    hub_url: "https://my.host.com/jupyter"

    use_registry: true
    image_prefix: "my.registry.com/my-project/my-image-"

    build_docker_host: "unix:///local/fake_var/run/pink/docker.sock"
    build_image: "docker.io/sgaist/repo2podman:latest"

imageBuilderType: pink

pink:
  hostStorageDir: /local/fake_var/lib/pink/storage/
  hostSocketDir: /local/fake_var/run/pink/

imageCleaner:
  host:
    enabled: false

ingress:
  enabled: true
  ingressClassName: nginx
  hosts:
    - my.host.com

One advantage of this new implementation is that it should allow the addition of new build services without requiring core modifications.

With the addition of Podman as a builder option, BinderHub administrators can either use the already provided repo2docker (since Podman offers a Docker-compatible API) or repo2podman, a plugin for repo2docker that uses the Podman client. Depending on the use case, some environment variables might be needed to configure it, so this patch adds support for them through the new extra_envs trait added to the KubernetesBuildExecutor.

As an example of its use:

03-Repo2Podman: |
    class Repo2PodmanBuild(KubernetesBuildExecutor):
        extra_envs = {
            "CONTAINER_HOST": "unix:///path/to/podman.sock",
            "REGISTRY_AUTH_FILE": "/path/to/.docker/config.json",
        }

        def get_r2d_cmd_options(self):
            return ["--engine=podman"] + super().get_r2d_cmd_options()

    c.BinderHub.build_class = Repo2PodmanBuild

This example shows a configuration that will enable the --remote option for Podman, as well as provide the path to the credentials to be used when pushing to the registry.

sgaist commented 2 years ago

Small note: as this is preliminary work, I have not yet written the related documentation.

sgaist commented 2 years ago

Following this month's JupyterHub meeting, @manics and I continued a bit on the podman front, discussing the podman engine for repo2docker as well as this patch's content. One thing I was not aware of is that besides podman being a drop-in replacement for docker in terms of command line interface, it now also implements a docker-compatible API from a service point of view. Therefore I ran a BinderHub deployment test with the pink daemonset enabled, the standard Build class, and the image cleaner configured to watch podman's storage folder.

All these together worked quite nicely.

This means that the environment variable addition is not strictly required for the goal of using podman in place of docker for the pod building the images. However, it does allow people to choose whether they want to go full podman, full docker, or a mix of both (which would be the default).
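For reference, a minimal sketch of that tested combination, expressed with the chart options described in this PR (the host paths, registry prefix, and socket filename are illustrative assumptions, not authoritative defaults):

imageBuilderType: pink

config:
  BinderHub:
    use_registry: true
    image_prefix: "my.registry.com/my-project/my-image-"

pink:
  hostStorageDir: /var/lib/pink/storage/
  hostSocketDir: /var/run/pink/

imageCleaner:
  host:
    enabled: true
    # point the cleaner at podman's storage instead of /var/lib/docker;
    # the exact socket filename depends on the pink daemonset configuration
    dockerSocket: /var/run/pink/docker.sock
    dockerLibDir: /var/lib/pink/storage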

manics commented 2 years ago

The diff between the dind and pink K8s templates is fairly small:

diff --color -Nur helm-chart/binderhub/templates/dind/daemonset.yaml helm-chart/binderhub/templates/pink/daemonset.yaml
--- helm-chart/binderhub/templates/dind/daemonset.yaml  2021-09-18 20:57:59.971717855 +0100
+++ helm-chart/binderhub/templates/pink/daemonset.yaml  2022-09-23 13:29:38.484322020 +0100
@@ -1,20 +1,20 @@
-{{- if .Values.dind.enabled -}}
+{{- if .Values.pink.enabled -}}
 apiVersion: apps/v1
 kind: DaemonSet
 metadata:
-  name: {{ .Release.Name }}-dind
+  name: {{ .Release.Name }}-pink
 spec:
   updateStrategy:
     type: RollingUpdate
   selector:
     matchLabels:
-      name:  {{ .Release.Name }}-dind
+      name:  {{ .Release.Name }}-pink
   template:
     metadata:
       labels:
-        name: {{ .Release.Name }}-dind
+        name: {{ .Release.Name }}-pink
         app: binder
-        component: dind
+        component: dind  # Lying to build so current affinity rules can work
         release: {{ .Release.Name }}
         heritage: {{ .Release.Service }}
     spec:
@@ -28,47 +28,59 @@
         operator: Equal
         value: user
       nodeSelector: {{ .Values.config.BinderHub.build_node_selector | default dict | toJson }}
-      {{- with .Values.dind.initContainers }}
+      {{- with .Values.pink.initContainers }}
       initContainers:
         {{- . | toYaml | nindent 8 }}
       {{- end }}
+      {{- with .Values.pink.daemonset.image.pullSecrets }}
+      imagePullSecrets:
+        {{- . | toYaml | nindent 8 }}
+      {{- end }}
+      securityContext:
+        fsGroup: 1000
+
       containers:
-      - name: dind
-        image: {{ .Values.dind.daemonset.image.name }}:{{ .Values.dind.daemonset.image.tag }}
+      - name: pink
+        image: {{ .Values.pink.daemonset.image.name }}:{{ .Values.pink.daemonset.image.tag }}
+        imagePullPolicy: {{ .Values.pink.daemonset.image.pullPolicy }}
+
         resources:
-          {{- .Values.dind.resources | toYaml | nindent 10 }}
+          {{- .Values.pink.resources | toYaml | nindent 10 }}
         args:
-          - dockerd
-          - --storage-driver={{ .Values.dind.storageDriver }}
-          - -H unix://{{ .Values.dind.hostSocketDir }}/docker.sock
-          {{- with .Values.dind.daemonset.extraArgs }}
-          {{- . | toYaml | nindent 10 }}
-          {{- end }}
+          - podman
+          - system
+          - service
+          - --time=0
+          - unix:///var/run/pink/docker.sock
         securityContext:
           privileged: true
+          runAsUser: 1000  # podman default user id
+
         volumeMounts:
-        - name: dockerlib-dind
-          mountPath: /var/lib/docker
-        - name: rundind
-          mountPath: {{ .Values.dind.hostSocketDir }}
-        {{- with .Values.dind.daemonset.extraVolumeMounts }}
+        - mountPath: /home/podman/.local/share/containers/storage/
+          name: podman-containers
+        - mountPath: /var/run/pink/
+          name: podman-socket
+
+        {{- with .Values.pink.daemonset.extraVolumeMounts }}
         {{- . | toYaml | nindent 8 }}
         {{- end }}
-        {{- with .Values.dind.daemonset.lifecycle }}
+        {{- with .Values.pink.daemonset.lifecycle }}
         lifecycle:
           {{- . | toYaml | nindent 10 }}
         {{- end }}
       terminationGracePeriodSeconds: 30
       volumes:
-      - name: dockerlib-dind
+      - name: podman-containers
         hostPath:
-          path: {{ .Values.dind.hostLibDir }}
-          type: DirectoryOrCreate
-      - name: rundind
+          path: {{ .Values.pink.hostStorageDir }}
+          type: Directory
+      - name: podman-socket
         hostPath:
-          path: {{ .Values.dind.hostSocketDir }}
-          type: DirectoryOrCreate
-      {{- with .Values.dind.daemonset.extraVolumes }}
+          path: {{ .Values.pink.hostSocketDir }}
+          type: Directory
+
+      {{- with .Values.pink.daemonset.extraVolumes }}
       {{- . | toYaml | nindent 6 }}
       {{- end }}
 {{- end }}

so a couple of alternative options are:

Parameterise the dind templates:

Add conditionals to the dind templates, e.g.

        args:
          {{- if eq .Values.containerBuilderPod "dind" }}
          - dockerd
          - --storage-driver={{ .Values.dind.storageDriver }}
          - -H unix://{{ .Values.dind.hostSocketDir }}/docker.sock
          {{- end }}
          {{- if eq .Values.containerBuilderPod "pink" }}
          - podman
          - system
          - service
          - --time=0
          - unix:///var/run/pink/docker.sock
          {{- end }}
          {{- with .Values.dind.daemonset.extraArgs }}
          {{- . | toYaml | nindent 10 }}
          {{- end }}

If we decide on this route, I think deprecating .Values.dind.enabled and replacing it with something like .Values.containerBuilderPod will be more extensible in the future.
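As an illustration, the values migration could look roughly like this (a sketch only; in the merged PR the switch ended up being named imageBuilderType rather than containerBuilderPod):

# before (deprecated):
dind:
  enabled: true

# after (sketch of the proposed switch):
containerBuilderPod: dind  # or "pink" to select the podman-in-kubernetes daemonset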

sgaist commented 2 years ago

I like the idea of a configurable containerBuilderPod.

Should we also consider modifying the dind label used to keep build pods on the same node as the builder pod itself?

manics commented 2 years ago

Maybe we could do this in two PRs? This one for getting podman to work (so a configurable containerBuilderPod with no breaking changes), then a separate PR for renaming the dind label to something more generic, since it's an internal breaking change. In fact, a PR to change the label could be opened now, since it'll be a fairly minor conflict to resolve, and means others can agree/disagree!

sgaist commented 2 years ago

As per your excellent idea, I merged the two templates into a "new one" that covers both cases and should be easy to extend further.

I haven't yet touched the securityContext part, as I wanted to first ensure that the new template would fit the bill.

sgaist commented 1 year ago

@manics I think it's a good idea to do the Pod Security Policy work in a separate PR. With their replacement by the Pod Security admission controller in K8s 1.25, it might be better to concentrate on the latter.

manics commented 1 year ago

@consideRatio Is anything else needed here before merging?

consideRatio commented 1 year ago

Thanks @manics for the ping. @manics, are your unresolved comments ready to be resolved, or do they need to be followed up further?

manics commented 1 year ago

When using pink with its default settings, it is also necessary to override the imageCleaner config:

imageCleaner:
  host:
    dockerSocket: /var/run/pink/podman.sock
    dockerLibDir: /var/lib/pink/storage

Since we're now making a breaking change in this PR, could we get rid of these properties and use whatever is configured in dind/pink?

I've also just noticed that dind uses hostLibDir for its storage directory whereas pink uses hostStorageDir. @consideRatio do you think we should keep this, as it represents the underlying container implementation, or use the same property name as an abstraction?

consideRatio commented 1 year ago

@consideRatio do you think we should keep this as it represents the underlying container implementation, or use the same property name as an abstraction?

Hmm hmm I'm feeling a bit out of my depth here and don't feel ready to opine about it. I think your judgement call will be acceptable no matter what though!

When using pink with its default settings, it is also necessary to override the imageCleaner config:

imageCleaner:
  host:
    dockerSocket: /var/run/pink/podman.sock
    dockerLibDir: /var/lib/pink/storage

Since we're now making a breaking change in this PR could we get rid of these properties and use whatever is configured in dind/pink?

But there was a local situation as well, wasn't there? Or is the imageCleaner stuff only activated as part of using dind/pink? I think if imageCleaner is tightly coupled to the use of dind/pink, then relying on configuration under there is the right call.

sgaist commented 1 year ago

Funny you bring up imageCleaner, as I was about to ask some questions about it.

To the best of my comprehension, the default deployment for BinderHub would be using the host Docker and a host-configured cleaner.

Then you have the option to use dind and that brings two cases to consider:

The former works because everything happens on the same node and thus all the paths are valid on the host. However, there's a side effect: two containers are created in the imageCleaner pod, one for the host and one for dind, which, in the end, are checking the same volumes.

In the second case, only the container watching dind is created.

One thing we could do is refactor the cleaner template logic to follow a pattern similar to the builder daemonset and thus ensure that we have only one container doing the watching and cleanup.

One thing we could do for that is to remove the loop in the containers part of the template.

Maybe rename local to host in the enumeration defined for imageBuilderType?

consideRatio commented 1 year ago

Thanks for thinking about this @sgaist!

When using local, is image building done by the binderhub software in the binder pod, using a local docker daemon? Ideally, the configuration value should clarify that directly from its name. If we update this name, I'd appreciate it if we make the name verbose enough to help a person understand more from the name directly.

I think local means binder_pod_local_docker_daemon, and that the binder pod gets a docker daemon mounted from the host or similar. Do you think that is correct, @sgaist?

Ah, but I get your point about using host over local due to this:

https://github.com/jupyterhub/binderhub/blob/0049297e69dfbc93a6a1079ef85c2b898ebad192/helm-chart/binderhub/templates/image-cleaner.yaml#L38-L41

I think as long as we have imageCleaner's terminology of host, it makes sense to speak of host instead of local. If you want to change that, it would be fine in my mind.

Confusion points are:

Oh, I get some clarity now! If a node has been running in k8s for a long time, and pulls more and more images because it's starting more and more pods of different kinds, it can run out of space. So, image cleaning on the host is not relevant only where you are building pods using some docker daemon; it can be directly relevant just by being a node that starts a lot of pods.

At the same time, we have this official k8s description saying that it's not recommended to have external tools do such things: https://kubernetes.io/docs/concepts/architecture/garbage-collection/#containers-images. Hmmm... I think removal of the imageCleaner host option is worth considering, but it's beyond the scope of this PR. I see that mybinder.org-deploy has imageCleaner.host.enabled=false.
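For context, the kubelet's built-in image garbage collection referenced above is tuned through standard KubeletConfiguration thresholds, e.g. (cluster-level settings shown only for context; this is not something the binderhub chart configures):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# the kubelet starts removing unused images once image filesystem usage exceeds this percentage
imageGCHighThresholdPercent: 85
# and stops removing once usage drops back below this percentage
imageGCLowThresholdPercent: 80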

EDIT: I opened https://github.com/jupyterhub/binderhub/issues/1584 for discussion about this to help us not need to resolve such matters in this PR.

consideRatio commented 1 year ago

@sgaist / @manics, I know ~nothing about Podman. Should it be exposed to image cleaning, and if so, can it be exposed to image cleaning by our current system running https://github.com/jupyterhub/docker-image-cleaner?

manics commented 1 year ago

For the purposes of building and storing images, podman is the same as Docker: if you build an image, it's going to stay around until it's deleted. The socket interface presented by podman is the same as docker's, which is why I wasn't bothered about renaming the socket to podman.sock. As far as the client is concerned (Docker CLI, docker-py, repo2docker), it shouldn't matter, except that if the socket isn't mounted at /var/run/docker.sock the client needs configuration to use the alternative path.

How about we only use imageCleaner.host.* for the local docker host, and if dind/pink are enabled ignore those values and use dind.{hostSocketDir,hostLibDir} / pink.{hostStorageDir,hostSocketDir} instead, since AFAICT there's no reason to run the cleaner with any other paths?
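A rough sketch of that idea in the image-cleaner template, assuming the imageBuilderType switch introduced in this PR (hypothetical template logic, not what the chart currently does):

      volumes:
        - name: image-storage
          hostPath:
            # watch whichever storage directory the active builder writes to
            {{- if eq .Values.imageBuilderType "pink" }}
            path: {{ .Values.pink.hostStorageDir }}
            {{- else if eq .Values.imageBuilderType "dind" }}
            path: {{ .Values.dind.hostLibDir }}
            {{- else }}
            path: {{ .Values.imageCleaner.host.dockerLibDir }}
            {{- end }}
            type: Directory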

sgaist commented 1 year ago

I think we can simplify some things here indeed.

Do we have an agreement on the following for this patch?

consideRatio commented 1 year ago

Agreement from my end! Thanks Simon and Samuel!! / Erik on mobile

manics commented 1 year ago

Sounds fine to me as well, but given the length of this PR how does everyone feel about merging as is, and doing changes in a follow-up?

consideRatio commented 1 year ago

(Not to be blocking a merge as this can be done after merge)

manics commented 1 year ago

Merging!

manics commented 1 year ago

This broke the BinderHub CI upgrade test on main: https://github.com/jupyterhub/binderhub/blob/fb773a0891d0ed84ad397ec4eeabec7b79e2947b/.github/workflows/test.yml#L186

Run . ./ci/common

Installing already released binderhub version 1.0.0-0.dev.git.2920.hc64d689
using old config
Error: INSTALLATION FAILED: execution error at (binderhub/templates/NOTES.txt:194:4): 

#################################################################################
######   BREAKING: The config values passed contained no longer accepted    #####
######             options. See the messages below for more details.        #####
######                                                                      #####
######             To verify your updated config is accepted, you can use   #####
######             the `helm template` command.                             #####
#################################################################################

CHANGED:

dind:
  enabled: true

must as of version 0.3.0 be replace by

imageBuilderType: dind

On the bright side we've now tested the upgrade message :smiley:

consideRatio commented 1 year ago

Thanks for following up @manics!!

sgaist commented 1 year ago

Thanks for the reviews, guys!

@consideRatio I have updated the description and I think it's now complete.

@manics can you double-check it in case I missed something?

consideRatio commented 1 year ago

@sgaist thanks! Can you provide a leading paragraph, one or two sentences, to introduce what podman is in the PR description as well?

Relevant for the paragraph:

sgaist commented 1 year ago

@consideRatio Sure thing! Done in three sentences, but I can rework that if needed.

consideRatio commented 1 year ago

Perfect @sgaist thanks, looks great!