nestybox / sysbox

An open-source, next-generation "runc" that empowers rootless containers to run workloads such as Systemd, Docker, Kubernetes, just like VMs.
Apache License 2.0

Gitlab runner Kubernetes executor #516

Closed gdespreslaberge closed 2 years ago

gdespreslaberge commented 2 years ago

Hi,

We're trying to get Sysbox running on our GitLab runners. We pushed GitLab to support runtimeClassName in the runners, and now that it's integrated we're trying to get it working, but we're running into some issues.

The runner's TOML config:

    [[runners]]
      name = "Linux Sysbox (https://square/runners)"
      [runners.feature_flags]
        FF_KUBERNETES_HONOR_ENTRYPOINT = true
      [runners.kubernetes]
        helper_image = "gitlab/gitlab-runner:alpine3.14-bleeding"

The CI Job:

default:
  image: 
    name: registry.nestybox.com/nestybox/ubuntu-bionic-systemd-docker
  retry: 2
  tags:
  - square-linux-k8s-sysbox

stages:
  - test

test-job:
  stage: test  
  script:
    - ps aux
    - systemctl status docker
    - docker ps
    - docker build .

The initial issue was that we don't get a shell, because systemd is a blocking process. We also tried starting a custom script via a service file, as suggested here, but that didn't work either: for some reason Docker and the service won't start at init (though they can be started manually).

I've exhausted all the solutions I could think of. We would really like to use Sysbox in our prod environment; could you give us a hand?

Thanks

PS: If I deploy a simple pod with kubectl apply -f, everything works as intended:

apiVersion: v1
kind: Pod
metadata:
  name: docker-test
  annotations:
    io.kubernetes.cri-o.userns-mode: "auto:size=65536"
spec:
  runtimeClassName: sysbox-runc
  containers:
  - name: doocker-test
    image: registry.nestybox.com/nestybox/ubuntu-bionic-systemd-docker

ctalledo commented 2 years ago

Hi @gdespreslaberge, thanks for trying Sysbox.

Basic question: why use the nestybox/ubuntu-bionic-systemd-docker image for the CI pipeline? Why not use an image without systemd? I ask because the nestybox/ubuntu-bionic-systemd-docker image is normally used for running development environments in containers, whereas for CI jobs people often use leaner images that carry only the components the job needs (e.g., Alpine + Docker).

gdespreslaberge commented 2 years ago

Hi @ctalledo, we're using it because we're trying to provide our users with a rootless Docker-in-Docker environment. Other Docker images require privileged containers and/or mounting the Docker socket.

ctalledo commented 2 years ago

> we are using it because we are trying to provide our users with a rootless docker in docker environment.

Oh, but you don't need that particular image.

For example, you could use this image that has Alpine + Docker (i.e., no systemd); the only caveat is you have to start dockerd inside the container manually (e.g., dockerd > /var/log/dockerd.log 2>&1 &).

Or you could even use Docker's official Docker-in-Docker (dind) image, as shown in this blog post (see section "GitLab Runner Deploys Jobs in System Containers"). This is probably the most natural way for deploying Docker-in-Docker with GitLab + Sysbox.

In general, when using Sysbox as the container runtime, you can choose any image and it will run "rootless" (i.e., root user in the container = unprivileged user on the host). You can install Docker or whatever other software you wish inside that container and things should work fine.
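
To see the "root in the container = unprivileged user on the host" mapping for yourself, you can inspect the container's UID map. Below is a minimal POSIX sh sketch, assuming the standard /proc/self/uid_map format of one "<inside-uid> <outside-uid> <length>" range per line; the function name container_root_host_uid is hypothetical and not part of Sysbox.

```shell
# Hypothetical helper: print the host UID that UID 0 inside this container
# maps to. Reads a uid_map-format file ($1, default /proc/self/uid_map),
# where each line is "<uid-inside-ns> <uid-outside-ns> <range-length>".
# Exits non-zero if no range covers UID 0.
container_root_host_uid() {
    map_file="${1:-/proc/self/uid_map}"
    awk '$1 <= 0 && 0 < $1 + $3 { print $2 - $1; found = 1; exit }
         END { if (!found) exit 1 }' "$map_file"
}
```

Inside a Sysbox container this should print a nonzero host UID (the exact value depends on how the runtime allocates the range, e.g. via the io.kubernetes.cri-o.userns-mode annotation). In a container without a user namespace, the map is "0 0 4294967295" and it prints 0.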

gdespreslaberge commented 2 years ago

I did read the blog post before starting this project (great write-up, btw), but it seems aimed at the Docker executor. Here's what I've tried:

      [runners.kubernetes]
        privileged = false
        runtime_class_name = "sysbox-runc"
        [[runners.kubernetes.volumes.empty_dir]]
        name = "dind-storage"
        mount_path = "/var/lib/docker"
        [[runners.kubernetes.volumes.host_path]]
        name = "hostpath-modules"
        mount_path = "/lib/modules"
        read_only = true
        host_path = "/lib/modules"
        [[runners.kubernetes.volumes.host_path]]
        name = "hostpath-cgroup"
        mount_path = "/sys/fs/cgroup"
        host_path = "/sys/fs/cgroup"
        [[runners.kubernetes.volumes.host_path]]
        name = "docker-sock"
        mount_path = "/var/run/docker.sock"
        host_path = "/var/run/docker.sock"
        [runners.kubernetes.pod_annotations]
        "io.kubernetes.cri-o.userns-mode"="auto:size=65536"
        [runners.kubernetes.node_selector]
        sysbox-install="yes"

I also modified my CI to use docker:dind. Here's the error from the pod:

Signature ok
subject=CN = docker:dind client
Getting CA Private Key
/certs/client/cert.pem: OK
mount: permission denied (are you root?)
Could not mount /sys/kernel/security.
AppArmor detection and --privileged mode might break.
time="2022-03-23T13:40:47.376868802Z" level=info msg="Starting up"
time="2022-03-23T13:40:47.378567291Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy

It seems it's unable to mount properly because it's not privileged.
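
The last line of that log (device or resource busy) is consistent with the docker-sock host_path volume in the config above: dockerd tries to replace /var/run/docker.sock with its own socket, and a path that is a bind-mount target cannot be removed. A minimal sketch for checking this from inside the job container, assuming a Linux /proc/self/mountinfo where field 5 is the mount point; is_mount_point is a hypothetical helper name. On Alpine-based images /var/run is typically a symlink to /run, so the sketch checks both paths.

```shell
# Hypothetical helper: succeed if $1 appears as a mount point in a
# mountinfo-format file ($2, default /proc/self/mountinfo).
# Field 5 of each mountinfo line is the mount point; paths containing
# spaces are escaped as \040 in that file and are not handled here.
is_mount_point() {
    target="$1"
    info_file="${2:-/proc/self/mountinfo}"
    awk -v t="$target" '$5 == t { found = 1; exit }
                        END { exit (found ? 0 : 1) }' "$info_file"
}

# /var/run may be a symlink to /run, so check both spellings.
for sock in /run/docker.sock /var/run/docker.sock; do
    if is_mount_point "$sock"; then
        echo "$sock is a mount target; dockerd will not be able to replace it"
    fi
done
```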

ctalledo commented 2 years ago

Hi @gdespreslaberge, some comments:

       [[runners.kubernetes.volumes.empty_dir]]
       name = "dind-storage"
       mount_path = "/var/lib/docker"
       [[runners.kubernetes.volumes.host_path]]
       name = "hostpath-modules"
       mount_path = "/lib/modules"
       read_only = true
       host_path = "/lib/modules"
       [[runners.kubernetes.volumes.host_path]]
       name = "hostpath-cgroup"
       mount_path = "/sys/fs/cgroup"
       host_path = "/sys/fs/cgroup"
       [[runners.kubernetes.volumes.host_path]]
       name = "docker-sock"
       mount_path = "/var/run/docker.sock"
       host_path = "/var/run/docker.sock"

Based on this, try the following simplified config with the docker:dind image:

      [runners.kubernetes]
        privileged = false
        runtime_class_name = "sysbox-runc"
        [runners.kubernetes.pod_annotations]
        "io.kubernetes.cri-o.userns-mode"="auto:size=65536"
        [runners.kubernetes.node_selector]
        sysbox-install="yes"

You should then be able to tell GitLab to exec docker commands inside that pod.

I don't have a setup with K8s + GitLab, so I am not able to repro your scenario, but let me know if you hit any problems and I'll be happy to help.

gdespreslaberge commented 2 years ago

Sorry I couldn't try this on Friday; I gave it a shot this morning. Here's the new TOML for the runners:

      [runners.kubernetes]
        helper_image = "gitlab/gitlab-runner-helper:alpine3.14-x86_64-bleeding"
        privileged = false
        cpu_request = "0.5"
        cpu_limit = "2"
        memory_request= "3G"
        memory_limit= "5G"
        runtime_class_name = "sysbox-runc"
        [runners.kubernetes.pod_annotations]
        "io.kubernetes.cri-o.userns-mode"="auto:size=65536"
        [runners.kubernetes.node_selector]
        sysbox-install="yes"

Then the simple CI:

default:
  image: 
    name: docker:dind
  retry: 2

stages:
  - test

test-job:
  stage: test  
  script:
    - docker ps
    - docker build .

And getting the same error:

Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: square-linux-runners-sysbox
Using Kubernetes executor with image docker:dind ...
Using attach strategy to execute scripts...
Preparing environment
00:03
Waiting for pod linux-runners-sysbox/runner-fdjz3nfm-project-125593-concurrent-0vdhq4 to be running, status is Pending
    ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
    ContainersNotReady: "containers with unready status: [build helper]"
    ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-fdjz3nfm-project-125593-concurrent-0vdhq4 via gitlab-runner-gitlab-runner-8447fb6cb5-bmmp6...
Getting source from Git repository
00:01
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/gdespreslaberge/sysbox-runners/.git/
Created fresh repository.
Checking out 9a679d46 as main...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:01
$ docker ps
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: command terminated with exit code 1

Here's an anonymized pastebin of the generated pod's YAML, in case it helps with debugging: https://pastebin.com/VhmQhVcs

Thank you so much for your help!

ctalledo commented 2 years ago

Hi @gdespreslaberge, apologies for the late reply, been a bit busy.

Looking at the pod's YAML, it seems Docker is not starting inside the docker:dind container because GitLab is overriding the container's entrypoint with its own command:

  - command:
    - sh
    - -c
    - "if [ -x /usr/local/bin/bash ]; then\n\texec /usr/local/bin/bash \nelif [ -x
      /usr/bin/bash ]; then\n\texec /usr/bin/bash \nelif [ -x /bin/bash ]; then\n\texec
      /bin/bash \nelif [ -x /usr/local/bin/sh ]; then\n\texec /usr/local/bin/sh \nelif
      [ -x /usr/bin/sh ]; then\n\texec /usr/bin/sh \nelif [ -x /bin/sh ]; then\n\texec
      /bin/sh \nelif [ -x /busybox/sh ]; then\n\texec /busybox/sh \nelse\n\techo shell
      not found\n\texit 1\nfi\n\n"

Please try the following CI YAML instead:

default:
  image: 
    name: ghcr.io/nestybox/alpine-docker
  retry: 2

stages:
  - test

test-job:
  stage: test  
  script:
    - dockerd > /var/log/dockerd.log 2>&1 &
    - sleep 5
    - docker ps
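
The fixed sleep 5 works, but it can race a slow dockerd start. Below is a minimal POSIX sh sketch of a polling alternative; wait_for_cmd is a hypothetical helper, and the usage lines assume the image ships both dockerd and the docker CLI (as ghcr.io/nestybox/alpine-docker does here). Since GitLab generally runs a job's script lines in a single shell, the function would need to be defined in an earlier script entry, e.g. a multi-line YAML block.

```shell
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after roughly $1 seconds.
wait_for_cmd() {
    timeout_s="$1"; shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            echo "timed out after ${timeout_s}s waiting for: $*" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Usage inside the CI job (assumes dockerd and the docker CLI are installed):
#   dockerd > /var/log/dockerd.log 2>&1 &
#   wait_for_cmd 30 docker info || { cat /var/log/dockerd.log; exit 1; }
#   docker ps
```

Polling docker info rather than sleeping means the job proceeds as soon as the daemon answers, and dumps the daemon log if it never does.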

Let me know how that goes; if it doesn't work, please paste the generated pod's YAML again. Thanks!

gdespreslaberge commented 2 years ago

That indeed worked! Thank you very much for your help.

ctalledo commented 2 years ago

> That indeed worked! Thank you very much for your help.

Great @gdespreslaberge!

Are we good to close this issue or do you need any further assistance?