Support for rootless docker?

9maf4you commented 1 year ago

Nomad version

Nomad v1.6.1 BuildDate 2023-07-21T13:49:42Z Revision 515895c7690cdc72278018dc5dc58aca41204ccc

Operating system and Environment details

Rocky Linux release 8.7 (Green Obsidian)

cni_path = "/opt/cni/bin"
cni_config_dir = "/opt/cni/config"

ls /opt/cni/config
mynet.conflist

mynet.conflist was taken from the docs; only the name was changed.

Issue

I'm trying to launch our rootless containers (Docker) with Consul Connect. According to the docs, the bridge configuration is a prerequisite for Consul Connect, which raises the question of how to configure Nomad and CNI appropriately. Rootless Docker uses slirp4netns to set up networking, which means a slirp4netns CNI plugin would have to exist for this to work.

Anyway, I've tried some crazy configurations in the hope that something just wasn't documented. It seems to me that this is not supported, but I may have missed something.

Reproduction steps: network.mode = cni/mynet

./nomad init -short

set network.mode = "cni/mynet"

Expected Result

Nomad spins up a container

Actual Result

No containers are running

Job file (if appropriate)

job "example" {
  group "cache" {
    network {
      mode = "cni/mynet"
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      identity {
        env  = true
        file = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Nomad Server logs (if appropriate)

2023-08-23T16:30:45.245Z [WARN]  client.alloc_runner.runner_hook: failed to configure network: alloc_id=1c80c047-4cd4-f4a9-1ee6-9f1be51e592c error="plugin type=\"loopback\" failed (add): unknown FS magic on \"/run/user/1007/docker/netns/eeb2f2825056\": 1021994" attempt=3
2023-08-23T16:30:45.245Z [ERROR] client.alloc_runner: prerun failed: alloc_id=1c80c047-4cd4-f4a9-1ee6-9f1be51e592c error="pre-run hook \"network\" failed: failed to configure networking for alloc: failed to configure network: plugin type=\"loopback\" failed (add): unknown FS magic on \"/run/user/1007/docker/netns/eeb2f2825056\": 1021994"
2023-08-23T16:30:45.245Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1c80c047-4cd4-f4a9-1ee6-9f1be51e592c task=redis type="Setup Failure" msg="failed to setup alloc: pre-run hook \"network\" failed: failed to configure networking for alloc: failed to configure network: plugin type=\"loopback\" failed (add): unknown FS magic on \"/run/user/1007/docker/netns/eeb2f2825056\": 1021994" failed=true
2023-08-23T16:30:45.246Z [WARN]  client.alloc_runner.runner_hook: failed to configure network: alloc_id=c934d4a0-a211-ff28-76cc-f444c71e4949 error="plugin type=\"loopback\" failed (add): unknown FS magic on \"/run/user/1007/docker/netns/0ff8589f461a\": 1021994" attempt=1
2023-08-23T16:30:45.249Z [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=1c80c047-4cd4-f4a9-1ee6-9f1be51e592c task=redis
2023-08-23T16:30:45.249Z [INFO]  client.gc: marking allocation for GC: alloc_id=1c80c047-4cd4-f4a9-1ee6-9f1be51e592c
2023-08-23T16:30:45.251Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=1c80c047-4cd4-f4a9-1ee6-9f1be51e592c task=redis type="Alloc Unhealthy" msg="Unhealthy because of failed task" failed=false
2023-08-23T16:30:45.340Z [DEBUG] worker: dequeued evaluation: worker_id=4eef4ffd-9bae-715f-5feb-20eddca88a26 eval_id=a73d35c6-735e-3ee6-d048-376907f54b74 type=service namespace=default job_id=example-default node_id="" triggered_by=alloc-failure
2023-08-23T16:30:45.341Z [DEBUG] client: updated allocations: index=753 total=24 pulled=13 filtered=11
2023-08-23T16:30:45.341Z [DEBUG] client: allocation updates: added=0 removed=0 updated=13 ignored=11
2023-08-23T16:30:45.361Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=13 ignored=11 errors=0
2023-08-23T16:30:45.402Z [DEBUG] http: request complete: method=GET path="/v1/deployment/387c9bab-5e3b-e757-b144-4c2583957a1a?index=746&stale=" duration=2.526063474s
2023-08-23T16:30:45.681Z [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=a73d35c6-735e-3ee6-d048-376907f54b74 job_id=example-default namespace=default worker_id=4eef4ffd-9bae-715f-5feb-20eddca88a26
  results=
  | Total changes: (place 0) (destructive 0) (inplace 0) (stop 0) (disconnect 0) (reconnect 0)
  | Desired Changes for "cache": (place 0) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 3) (canary 0)

Nomad Client logs (if appropriate)

-

Reproduction steps: network_mode = "slirp4netns" for docker's driver

./nomad init -short

set network_mode = "slirp4netns"

Expected Result

Nomad spins up a container

Actual Result

No containers are running

Job file (if appropriate)

job "example-default-2" {
  group "cache" {
    network {
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        network_mode   = "slirp4netns"
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      identity {
        env  = true
        file = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Nomad Server logs (if appropriate)

2023-08-23T17:00:08.207Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2ae6a55a-3eaf-35e3-0c33-bb911485b99a task=redis type="Driver Failure" msg="Failed to start container 8ed93827838e0611ee25ecadad97bea2401ebcf76f51b7a24dd0e0f5697b947b: API error (404): network slirp4netns not found" failed=false
2023-08-23T17:00:08.210Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=2ae6a55a-3eaf-35e3-0c33-bb911485b99a task=redis error="Failed to start container 8ed93827838e0611ee25ecadad97bea2401ebcf76f51b7a24dd0e0f5697b947b: API error (404): network slirp4netns not found"

Nomad Client logs (if appropriate)

-

Reproduction steps:

./nomad init -short

set network.mode = slirp4netns

Expected Result

Nomad spins up a container

Actual Result

No containers are running

Job file (if appropriate)

job "example-default-3" {

  group "cache" {
    count = 3
    network {
      mode = "slirp4netns"
      port "db" {
        to = 6379
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image          = "redis:7"
        ports          = ["db"]
        auth_soft_fail = true
      }

      identity {
        env  = true
        file = true
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Nomad Server logs (if appropriate)

-

Nomad Client logs (if appropriate)

==> 2023-08-23T17:04:48Z: Evaluation "40f77a5c" finished with status "complete" but failed to place all allocations:
    2023-08-23T17:04:48Z: Task Group "cache" (failed to place 3 allocations):
      * Class "RM": 3 nodes excluded by filter
      * Constraint "computed class ineligible": 3 nodes excluded by filter

lgfa29 commented 1 year ago

Hi @9maf4you 👋

We don't officially support a rootless Docker daemon. Docker has some quirks around network management that we need to work around in order to provide things like bridge network mode, and I'm not even sure where to start 🤔

Setting the network mode to slirp4netns will not work because, as far as I can tell, this is not actually a network mode, but more like a network driver.

For the CNI driver error, I don't remember seeing that unknown FS magic message before, and looking online there seems to be a multitude of reasons why it could happen.
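One way to narrow this down is to decode the number at the end of the message. The sketch below assumes the value is the filesystem type reported by statfs for the namespace path, printed in hexadecimal; that is an assumption about the CNI plugin's output format, not something confirmed in this thread. The constants come from `<linux/magic.h>`, and `decode_fs_magic` is a hypothetical helper name.

```python
import re

# Filesystem magic numbers from <linux/magic.h>.
FS_MAGIC = {
    0x01021994: "tmpfs",
    0x6E736673: "nsfs",   # what a real namespace handle reports
    0x00009FA0: "proc",   # e.g. /proc/<pid>/ns/net
}

# Matches the message with or without the backslash-escaped quotes
# that appear in the Nomad log output.
_MAGIC_RE = re.compile(
    r'unknown FS magic on \\?"(?P<path>[^"\\]+)\\?": (?P<magic>[0-9a-fA-F]+)'
)


def decode_fs_magic(log_line):
    """Return (path, filesystem name) for an "unknown FS magic" log line.

    Assumes the trailing number is hexadecimal. Returns None when the
    line does not contain the message.
    """
    match = _MAGIC_RE.search(log_line)
    if match is None:
        return None
    magic = int(match.group("magic"), 16)
    return match.group("path"), FS_MAGIC.get(magic, hex(magic))


line = (
    r'error="plugin type=\"loopback\" failed (add): unknown FS magic on '
    r'\"/run/user/1007/docker/netns/eeb2f2825056\": 1021994" attempt=3'
)
print(decode_fs_magic(line))
```

Under that assumption, 1021994 decodes to tmpfs rather than nsfs, which would mean the path the agent inspected looked like an ordinary file on a tmpfs mount instead of a network namespace handle. Why that happens with a rootless daemon is not established here.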

Is the Nomad agent running as root?

I will adjust the issue title to a feature request, as I don't expect rootless Docker to work.

shoenig commented 1 year ago

@9maf4you if it's an option for you, the podman task driver supports rootless mode.

Docker and Podman handle things (especially networking) very differently, and Podman's approach is much more flexible and easier for us to work with.

9maf4you commented 1 year ago

Hey! @shoenig @lgfa29 Thank you for the quick response. Podman is an option for me, so I'll take a shot with it.

@lgfa29

"Is the Nomad agent running as root?" Yes, it is.

9maf4you commented 1 year ago

Hello @shoenig, I've tried Podman as you suggested. Are you sure the combination of rootless Podman and Consul Connect works? It doesn't work for me.

My setup follows the docs:

grep ^runtime /usr/share/containers/containers.conf
runtime = "crun"

rpm -q slirp4netns
slirp4netns-1.2.0-2.module+el8.8.0+1265+fa25dd7a.x86_64

rpm -q fuse-overlayfs
fuse-overlayfs-1.11-1.module+el8.8.0+1265+fa25dd7a.x86_64

mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

ps uax | grep podman
rocky       1766  0.0  0.0  89920 10316 ?        S    12:27   0:00 /usr/bin/podman

The Nomad client runs as root and Podman runs as an unprivileged user. The allocation fails with this error:

Time                  Type             Description
2023-08-28T13:14:26Z  Alloc Unhealthy  Unhealthy because of failed task
2023-08-28T13:14:22Z  Killing          Sent interrupt. Waiting 5s before force killing
2023-08-28T13:14:22Z  Not Restarting   Error was unrecoverable
2023-08-28T13:14:22Z  Driver Failure   rpc error: code = Unknown desc = failed to start task, could not start container: cannot start container, status code: 500: {"cause":"OCI permission denied","message":"crun: cannot setns `/var/run/netns/1abe06e9-6953-59eb-1387-e4b95943a19e`: Operation not permitted: OCI permission denied","response":500}

job "c9" {

  group "api" {
    network {
     mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "podman"

      config {
        image          = "hashicorpdev/counter-api:v3"
        auth_soft_fail = true
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"
      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "podman"
      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image          = "hashicorpdev/counter-dashboard:v3"
        auth_soft_fail = true
      }
    }
  }
}

I believe the reason for this is https://github.com/hashicorp/nomad/issues/13669

shoenig commented 1 year ago

Ahh sorry @9maf4you, for some reason I thought we had tested the rootless Connect scenario, but we haven't. Thinking about it now, the error you are getting makes sense. Podman is trying to join the network namespace that the Nomad client creates for the tasks in the group to share, and doing so requires the CAP_SYS_ADMIN capability (or running as root). See https://www.man7.org/linux/man-pages/man2/setns.2.html
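To see whether a given process holds that capability, one option is to read the effective capability mask from `/proc/<pid>/status`. This is a minimal sketch: `has_capability` and `effective_caps` are hypothetical helper names, and the bit index 21 for CAP_SYS_ADMIN is taken from `<linux/capability.h>`.

```python
CAP_SYS_ADMIN = 21  # bit index from <linux/capability.h>


def has_capability(cap_eff_hex, cap_bit=CAP_SYS_ADMIN):
    """Return True if the hex capability mask has the given bit set."""
    return bool((int(cap_eff_hex, 16) >> cap_bit) & 1)


def effective_caps(status_path="/proc/self/status"):
    """Read the CapEff mask (a hex string) from a /proc status file."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError(f"no CapEff line in {status_path}")


if __name__ == "__main__":
    mask = effective_caps()
    print(f"CapEff={mask} CAP_SYS_ADMIN={has_capability(mask)}")
```

Pointing `status_path` at the status file of the rootless Podman process would show the mask for that process rather than for the current shell.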

9maf4you commented 1 year ago

Hey @shoenig. Should I open a new issue, or does one already exist?

9maf4you commented 1 year ago

Hey @shoenig, sorry for bothering you once again. Could you please reply to my previous question? And if you are going to fix the issue with capabilities, could you please share your plans for it? Thanks!

shoenig commented 1 year ago

Hi @9maf4you, I'm not sure what the fix would be; joining a network namespace in Linux is an operation that requires root and there isn't much Nomad can do about that. Ostensibly the solution is to have a parent process launched as root join the namespace and then fork/exec into the desired task. In fact, this is what Nomad does for the exec/raw_exec task drivers. However, Nomad is not the parent of a Docker container; Docker is, so you'd have to talk to Docker about implementing that feature.
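The join-then-exec pattern described above can be sketched roughly as follows. This is an illustration only, not Nomad's actual implementation: `exec_in_netns` is a hypothetical helper, exit code 126 is an arbitrary choice for "could not join or exec", and the `setns` call is expected to fail with "Operation not permitted" unless the caller has CAP_SYS_ADMIN (for example, when run as root).

```python
import ctypes
import os
import sys

CLONE_NEWNET = 0x40000000  # namespace type flag from <sched.h>


def exec_in_netns(ns_path, argv):
    """Fork; the child joins the network namespace at ns_path, then execs argv.

    Returns the child's exit code, or 126 if the child could not join
    the namespace or exec the command.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.fork()
    if pid == 0:
        # Child: join the namespace first, then replace this process image.
        try:
            fd = os.open(ns_path, os.O_RDONLY)
            if libc.setns(fd, CLONE_NEWNET) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err), ns_path)
            os.close(fd)
            os.execvp(argv[0], argv)  # does not return on success
        except Exception as exc:
            print(f"child failed: {exc}", file=sys.stderr, flush=True)
        finally:
            os._exit(126)
    # Parent: wait for the child and report how it exited.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    # Joining our own namespace is a harmless way to exercise the call.
    print(exec_in_netns("/proc/self/ns/net", ["true"]))
```

The point of the sketch is the ordering: the privileged step happens in a process the launcher controls, before the task's own code starts running.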

9maf4you commented 1 year ago

Hey @shoenig. Sorry, my message probably wasn't clear enough. In my last message I was talking about Podman, since you suggested trying it. The issue. Your reply. Thanks again!

tgross commented 11 months ago

I'm going to close this issue, because as noted above running as non-root is unsupported and building support isn't on the near-term roadmap. I'm going to link to this issue from https://github.com/hashicorp/nomad/issues/13669 for discussions on how we might work on this in the future. If you have more comments on this after having read through that issue in detail, I'd suggest you make comments over in #13669.
