jupyterhub / zero-to-jupyterhub-k8s

Helm Chart & Documentation for deploying JupyterHub on Kubernetes
https://zero-to-jupyterhub.readthedocs.io

Publish ARM64 Docker images #2119

Closed by manics 3 years ago

manics commented 3 years ago

Proposed change

As described in https://discourse.jupyter.org/t/ztjh-on-a-raspberry-pi-k8s-cluster/3043/16 and https://github.com/sakuraiyuta/zero-to-jupyterhub-k8s/commit/d22290b2ad6b0c69a7f682ce7ddc9b2b98a6a473 by @sakuraiyuta it's possible to run Z2JH on ARM systems with some changes. I think we should look at supporting ARM64 in Z2JH.

Work on ARM64 started in TLJH (issue: https://github.com/jupyterhub/the-littlest-jupyterhub/issues/62, WIP PR from @yuvipanda https://github.com/jupyterhub/the-littlest-jupyterhub/pull/674).

Alternative options

Do nothing.

Who would use this feature?

AWS EC2 ARM64 users. Raspberry Pi users.

Suggest a solution

This will require work on the Docker images and the GitHub workflows. Based on https://github.com/sakuraiyuta/zero-to-jupyterhub-k8s/commit/d22290b2ad6b0c69a7f682ce7ddc9b2b98a6a473 I think we could make the following changes:

Docker images

  1. Add a build-arg in all Dockerfiles to allow the architecture to be switched. For example, replace FROM ubuntu:20.04 with FROM $ARCH/ubuntu:20.04, see https://github.com/docker-library/official-images#architectures-other-than-amd64
  2. Allow chartpress to accept command line build arguments, e.g. chartpress --build-arg ARCH={amd64|arm64v8} mirroring Docker's --build-arg
  3. Figure out what to do with singleuser-sample/Dockerfile, since jupyter/base-notebook doesn't support other architectures. E.g. we could build our own image from $ARCH/python, or perhaps ignore it since it's just an example.
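As a sketch of the build-arg idea in step 1 (ARCH here is a hypothetical build argument; amd64/… and arm64v8/… are the per-architecture namespaces on Docker Hub linked above):

```dockerfile
# ARCH picks the Docker Hub namespace for the base image,
# e.g. amd64/ubuntu:20.04 or arm64v8/ubuntu:20.04.
# Hypothetical build-arg, defaulting to amd64.
ARG ARCH=amd64
FROM $ARCH/ubuntu:20.04
# ... rest of the image unchanged ...
```

which would then be built with e.g. docker build --build-arg ARCH=arm64v8 . (or the equivalent chartpress --build-arg from step 2).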

Publish workflow

Option 1: Keep the existing build workflow, add a new workflow for building and publishing arm64v8 Docker images only (chartpress --push but not --publish) since the standard amd64 workflow will publish the chart.

Option 2: Split the build/push image and publish chart steps into separate jobs: an amd64 build image job, an arm64v8 build image job, and a chart publish job that only runs if both the amd64 and arm64v8 jobs succeed. This means the chart will only be published if Docker builds succeed for all supported architectures.
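In GitHub Actions terms, option 2 might look roughly like this (a sketch only; job names and steps are hypothetical, not the actual workflow):

```yaml
jobs:
  build-amd64:
    runs-on: ubuntu-latest
    steps:
      - run: chartpress --push          # build and push amd64 images
  build-arm64v8:
    runs-on: ubuntu-latest
    steps:
      - run: chartpress --push          # build and push arm64v8 images
  publish-chart:
    # needs: makes this job run only if both image jobs succeed
    needs: [build-amd64, build-arm64v8]
    runs-on: ubuntu-latest
    steps:
      - run: chartpress --publish-chart
```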

Test workflow

K3S supports ARM64, so we could duplicate the current test workflow for ARM64 if we wanted to. However, since all ARM64 steps have to run in a single action, this would require either maintaining two separate test workflows or refactoring the existing one to move most steps into a script. Alternatively we could keep things simple, skip the tests to begin with, and just publish the images.

meeseeksmachine commented 3 years ago

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/ztjh-on-a-raspberry-pi-k8s-cluster/3043/18

manics commented 3 years ago

https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/ might make things easier. docker buildx should allow building for multiple architectures simultaneously under the same docker tag, and it might avoid the need to use https://github.com/uraimo/run-on-arch-action for building the images. It would require changing chartpress https://github.com/jupyterhub/chartpress/blob/3b2dc0d1e414cbeda0a336012a2aa8eed5d2bf2d/chartpress.py#L272 to something like docker buildx build --platform linux/arm64/v8,linux/amd64 ...

ubuntu:20.04 is already multi-arch:

podman manifest inspect docker.io/library/ubuntu:20.04
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 943,
            "digest": "sha256:c65d2b75a62135c95e2c595822af9b6f6cf0f32c11bcd4a38368d7b7c36b66f5",
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 943,
            "digest": "sha256:c451115c859850cde443827e764ae243ab630384ed5a93370b5086ba6616a152",
            "platform": {
                "architecture": "arm",
                "os": "linux",
                "variant": "v7"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 943,
            "digest": "sha256:f2141ef6a772e349f9ff08397c2a26da11074512b1f2fe6e77f7e9b2d6561a32",
            "platform": {
                "architecture": "arm64",
                "os": "linux",
                "variant": "v8"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 943,
            "digest": "sha256:601aeafd9c6f28f43c0fd3f1c670de07d157ed7bd5f968e878ceed73b5c321be",
            "platform": {
                "architecture": "ppc64le",
                "os": "linux"
            }
        },
        {
            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
            "size": 943,
            "digest": "sha256:89b7353d42e788609fc51e31863af42edf6f30e1b1d655b79e72ade8c18f7385",
            "platform": {
                "architecture": "s390x",
                "os": "linux"
            }
        }
    ]
}

So we shouldn't need to mess with build-args. If we're only publishing the images but not testing them the changes to the GitHub workflow should also be minimal.
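For anyone wanting to check other images the same way, a manifest list like the one above can be inspected programmatically. A minimal sketch (using an abbreviated copy of the JSON above) that checks whether a manifest list advertises an arm64 variant:

```python
import json

# Abbreviated manifest list, as returned by
# `podman manifest inspect docker.io/library/ubuntu:20.04` above.
manifest_list = json.loads("""
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"platform": {"architecture": "amd64", "os": "linux"}},
        {"platform": {"architecture": "arm64", "os": "linux", "variant": "v8"}}
    ]
}
""")

def supported_architectures(manifest_list):
    """Return the set of (architecture, variant) pairs in a manifest list."""
    return {
        (m["platform"]["architecture"], m["platform"].get("variant"))
        for m in manifest_list["manifests"]
    }

archs = supported_architectures(manifest_list)
print(("arm64", "v8") in archs)  # True: the image is multi-arch
```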

manics commented 3 years ago

I've tried docker buildx build --tag test/z2jh-test --platform linux/amd64,linux/arm64 . and successfully built images for different architectures under the same tag, following the "The simple way with docker buildx" section of https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/

The big difference from docker build is that the image is not added to the Docker host's image list, i.e. it doesn't appear in docker image ls. It doesn't make sense to have an arm64 image available on an amd64 host, so buildx has chosen not to add any image to the local image store. Instead you have to push directly to a registry using --push ... or --output ..., or output the image to a tar file and load it with docker load. If we use a registry this complicates our CI pipeline, since at present we assume K3S has access to locally built images without a local registry.
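For illustration, the two options might look like this (a sketch with a hypothetical registry name and tags, not tested commands):

```shell
# Multi-platform build: must go straight to a registry, since the
# local image store can't hold a multi-arch manifest.
docker buildx build --platform linux/amd64,linux/arm64 \
  --tag registry.example.org/test/z2jh-test --push .

# Single-platform alternative: export to a tar file and load it into
# the local daemon, so K3S can still see a locally built image.
docker buildx build --platform linux/amd64 \
  --output type=docker,dest=z2jh-test.tar --tag test/z2jh-test .
docker load --input z2jh-test.tar
```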

@consideRatio any thoughts on how we can best achieve this? I think adding the option of using buildx and building images for multiple architectures all under one tag is fairly easy to add to chartpress, but the problem is the follow on test workflow.

consideRatio commented 3 years ago

@manics wow, I love the thorough research you do :heart: :tada:!

  1. I'm positive about letting chartpress manage building images with buildx for arm64!
  2. I understand we need an arm64-compatible base image for the jupyterhub/k8s-singleuser-sample image. Ideally we would find a way to build that image without maintaining two separate images, but we could also have a separate one, or simply accept that we haven't tested these images in Z2JH at all, or only up to the point where a user is spawned.
  3. I'm positive about encapsulating a workaround in chartpress for buildx not storing the images after build, if we need that.

Thank you for your thorough work investigating this; it was an old wish of mine from when I got myself an RPi, but I ended up not pushing through to make it happen.

manics commented 3 years ago

I'm glad you're keen :smile: . Here's a 3 part plan that should avoid making too many large changes in one PR:

  1. Ensure the key images are compatible with ARM64
    • Add buildx support to chartpress
    • Update all Z2JH images except for singleuser-sample since in practice people would want their own image
    • Add a simple test job that builds the images on arm64 using chartpress buildx but doesn't publish them.
    • At this stage anyone should be able to run chartpress followed by helm install on ARM64 without my changes (this won't be tested in CI)
  2. Figure out how to publish the images with chartpress. This means helm install --version <published version> should work on ARM64 (this won't be tested in CI)
  3. Decide whether it's worth testing the whole workflow (install K3s and the chart in an ARM64 environment, build a singleuser-sample) or just stop at step 2 and assume that if the images build successfully they'll behave the same as on x86_64.

yuvipanda commented 3 years ago

So I was talking about the TLJH on ARM work on twitter, and have been offered AWS credits for CI/CD with ARM. As an ecosystem, I think the important bits are:

  1. jupyter-docker-stacks (which will also help with our singleuser image)
  2. TLJH (since it installs directly on the VM)
  3. z2jh

Getting CI for these things would be great. At least personally, I want it for TLJH so I'll work on that with GitHub self-hosted runners.

manics commented 3 years ago

AWS EC2 t4g instances are free to everyone until the end of June (750 hours/month): https://aws.amazon.com/ec2/instance-types/t4/

@sakuraiyuta has shown an example of running arm64 in a GitHub workflow: https://github.com/sakuraiyuta/zero-to-jupyterhub-k8s/commit/d22290b2ad6b0c69a7f682ce7ddc9b2b98a6a473

The reason I suggested the 3-stage approach for Z2JH is to avoid massively refactoring the CI at the same time as other changes are made.

manics commented 3 years ago

@yuvipanda See https://github.com/jupyter/docker-stacks/issues/1019#issuecomment-797523060 for some progress on docker-stacks

sakuraiyuta commented 3 years ago

A big thank you to the maintainers for starting to consider supporting the arm64/aarch64 architecture.

@yuvipanda See jupyter/docker-stacks#1019 (comment) for some progress on docker-stacks

Sorry, the docker-stacks GH Action in that comment is still experimental, so it needs some fixes. What needs fixing? ...Ahh, my day job is killing me; I need time to check the code.

As you'd see from the code, that action uses run-on-arch-action, which runs a Docker container for the other architecture and executes the whole build process with as few changes to the Makefile as possible, apart from the Dockerfile. One other point to be careful about: docker pull inside the aarch64 container created by run-on-arch-action resolves images for the Docker host's architecture (AMD64, not ARM64; this behaviour confused me), which is why the architecture needs to be specified explicitly in the Dockerfile. Additionally, the build process is VERY SLOW: it takes about 30 minutes to complete.

As @manics says, the better way is to fix chartpress and use the official docker buildx support.

consideRatio commented 3 years ago

We are getting closer but haven't fully reached the goal yet. The publish CI pipeline failed just recently when the singleuser image was updated:

chartpress --push --publish-chart --builder docker-buildx --platform linux/amd64 --platform linux/arm64 --extra-message 'jupyterhub/zero-to-jupyterhub-k8s#2143 Merge pull request #2143 from jupyterhub/vuln-scan-singleuser-sample'
error: multiple platforms feature is currently not supported for docker driver. Please switch to a different driver (eg. "docker buildx create --use")

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/runs/2331388629?check_suite_focus=true

I think we are missing the following in the publish workflow.

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v1

I've opened #2144 to address this.

consideRatio commented 3 years ago

I went through all the images we make use of in the Helm chart and checked that they are compatible with arm64.

manics commented 3 years ago

Other than the singleuser-sample image, which depends on https://github.com/jupyter/docker-stacks/issues/1019, this is now done and available in the latest dev version! You can test with @sakuraiyuta's notebook images:

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm upgrade --cleanup-on-fail --install jh jupyterhub/jupyterhub --version 0.11.1-n393.h2aa513d9 --set singleuser.image.name=sakuraiyuta/base-notebook,singleuser.image.tag=latest,proxy.service.type=NodePort

manics commented 3 years ago

Closing, follow-up issues created.