Nomad Podman Driver

A Nomad task driver plugin for sandboxing workloads in Podman containers.

Documentation: https://developer.hashicorp.com/nomad/plugins/drivers/podman
License: Mozilla Public License 2.0

Many thanks to @towe75 and Pascom for contributing this plugin to Nomad!

Redis Example job

Here is a simple Redis "hello world" example:

job "redis" {
  datacenters = ["dc1"]
  type        = "service"

  group "redis" {
    network {
      port "redis" { to = 6379 }
    }

    task "redis" {
      driver = "podman"

      config {
        image = "docker://redis"
        ports = ["redis"]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
nomad run redis.nomad

==> Monitoring evaluation "9fc25b88"
    Evaluation triggered by job "redis"
    Allocation "60fdc69b" created: node "f6bccd6d", group "redis"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "9fc25b88" finished with status "complete"

podman ps

CONTAINER ID  IMAGE                           COMMAND               CREATED         STATUS             PORTS  NAMES
6d2d700cbce6  docker.io/library/redis:latest  docker-entrypoint...  16 seconds ago  Up 16 seconds ago         redis-60fdc69b-65cb-8ece-8554-df49321b3462

Building The Driver from source

This project uses Go modules, so you can clone it into any directory; no GOPATH setup is necessary. Ensure that you use Go 1.17 or newer.

git clone git@github.com:hashicorp/nomad-driver-podman
cd nomad-driver-podman
make dev

The compiled binary will be located at ./build/nomad-driver-podman.

Runtime dependencies

You need a Podman 3.0.x (or newer) binary and a systemd socket activation unit, see https://www.redhat.com/sysadmin/podmans-new-rest-api
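
On a systemd-based host, the rootful API socket can usually be enabled like this (rootless setups use the systemctl --user variant shown later in this README):

# enable and start the system-wide Podman API socket
sudo systemctl enable --now podman.socket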

The Nomad agent, nomad-driver-podman, and Podman reside on the same host, so you do not have to worry about the SSH aspects of the Podman API.

Ensure that Nomad can find the plugin; see the plugin_dir option.

Driver Configuration

plugin "nomad-driver-podman" {
  config {
    volumes {
      enabled      = true
      selinuxlabel = "z"
    }
  }
}
plugin "nomad-driver-podman" {
  config {
    gc {
      container = false
    }
  }
}
plugin "nomad-driver-podman" {
  config {
    recover_stopped = true
  }
}
plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
  }
}
plugin "nomad-driver-podman" {
  config {
    socket {
      name = "default"
      socket_path = "unix://run/user/1000/podman/podman.sock"
    }
    socket {
      name = "app1"
      socket_path = "unix://run/user/1337/podman/podman.sock"
    }
  }
}
plugin "nomad-driver-podman" {
  config {
    disable_log_collection = false
  }
}
The extra_labels option adds Nomad metadata as labels on each container. Supported values are:

job_name
job_id
task_group_name
task_name
namespace
node_name
node_id

plugin "nomad-driver-podman" {
  config {
    extra_labels = ["job_name", "job_id", "task_group_name", "task_name", "namespace", "node_name", "node_id"]
  }
}
plugin "nomad-driver-podman" {
  config {
    client_http_timeout = "60s"
  }
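
Putting these together, a minimal client agent configuration might look like this (a sketch; the plugin_dir path is illustrative and must match your setup):

plugin_dir = "/opt/nomad/plugins"

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"

    volumes {
      enabled = true
    }
  }
}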

Task Configuration

# Image to run; the docker:// transport prefix pulls from a registry.
config {
  image = "docker://redis"
}

# Credentials for pulling from a private registry.
config {
  image = "your.registry.tld/some/image"
  auth {
    username   = "someuser"
    password   = "sup3rs3creT"
    tls_verify = true
  }
}
# Override the image entrypoint.
config {
  entrypoint = [
    "/bin/bash",
    "-c"
  ]
}

# Command to run in the container.
config {
  command = "some-command"
}

# Arguments for the entrypoint/command.
config {
  args = [
    "arg1",
    "arg2",
  ]
}

# Working directory inside the container.
config {
  working_dir = "/data"
}
# Host paths to bind-mount into the container (host:container[:options]).
config {
  volumes = [
    "/some/host/data:/container/data:ro,noexec"
  ]
}

# Mount a tmpfs at the given container paths.
config {
  tmpfs = [
    "/var"
  ]
}

# Host devices to pass through to the container.
config {
  devices = [
    "/dev/net/tun"
  ]
}

# Run an init process (PID 1) inside the container.
config {
  init = true
}

# Path to the init binary to use.
config {
  init      = true
  init_path = "/usr/libexec/podman/catatonit"
}

# Task-level option: run the container process as this user (default: nobody).
user = "nobody"

config {
}

driver = "nomad" (default) Podman redirects its combined stdout/stderr logstream directly to a Nomad fifo. Benefits of this mode are: zero overhead, don't have to worry about log rotation at system or Podman level. Downside: you cannot easily ship the logstream to a log aggregator plus stdout/stderr is multiplexed into a single stream..

config {
  logging = {
    driver = "nomad"
  }
}

driver = "journald" The container log is forwarded from Podman to the journald on your host. Next, it's pulled by the Podman API back from the journal into the Nomad fifo (controllable by disable_log_collection) Benefits: all containers can log into the host journal, you can ship a structured stream incl. metadata to your log aggregator. No log rotation at Podman level. You can add additional tags to the journal. Drawbacks: a bit more overhead, depends on Journal (will not work on WSL2). You should configure some rotation policy for your Journal. Ensure you're running Podman 3.1.0 or higher because of bugs in older versions.

config {
  logging = {
    driver = "journald"
    options = {
      "tag" = "redis"
    }
  }
}

After setting a memory reservation, when the system detects memory contention or low memory, containers are forced to restrict their consumption to their reservation. Always set this value below the memory limit, otherwise the hard limit takes precedence. By default, the memory reservation is the same as the memory limit.

config {
  memory_reservation = "100m"
}

The unit can be b (bytes), k (kilobytes), m (megabytes), or g (gigabytes). If you don't specify a unit, b is used. Set the value to -1 to enable unlimited swap.

config {
  memory_swap = "180m"
}
config {
  memory_swappiness = 60
}

By default the task uses the network stack defined in the task group; see the network stanza. If the group's network behavior is also undefined, it falls back to bridge in rootful mode or slirp4netns for rootless containers.

# Network mode for the container.
config {
  network_mode = "bridge"
}

# Select a named socket from the plugin configuration.
config {
  socket = "app1"
}

# Linux capabilities to add to the container.
config {
  cap_add = [
    "SYS_TIME"
  ]
}

# Linux capabilities to drop from the container.
config {
  cap_drop = [
    "MKNOD"
  ]
}
# SELinux options for the container.
config {
  selinux_opts = [
    "type:my_container.process"
  ]
}

# Kernel parameters to set inside the container.
config {
  sysctl = {
    "net.core.somaxconn" = "16384"
  }
}

# Arbitrary labels to set on the container.
config {
  labels = {
    "nomad" = "job"
  }
}

# AppArmor profile to apply to the container.
config {
  apparmor_profile = "your-profile"
}
# Always pull the image, even if it is already present locally.
config {
  force_pull = true
}

# Mount the container's root filesystem read-only.
config {
  readonly_rootfs = true
}

# Ulimits for the container (soft limit, or soft:hard).
config {
  ulimit {
    nproc  = "4242"
    nofile = "2048:4096"
  }
}

# User namespace mode for the container.
config {
  userns = "keep-id:uid=200,gid=210"
}

# Maximum number of processes in the container.
config {
  pids_limit = 64
}

# How long to wait for an image pull before failing the task.
config {
  image_pull_timeout = "5m"
}
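
For reference, here is a sketch combining several of these options in a single task (the image and all values are illustrative):

task "app" {
  driver = "podman"

  config {
    image           = "docker://redis"
    volumes         = ["/srv/app/data:/data"]
    readonly_rootfs = true
    pids_limit      = 64

    labels = {
      "team" = "platform"
    }
  }

  resources {
    cpu    = 500
    memory = 256
  }
}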

Network Configuration

Nomad lifecycle hooks combined with the driver's network_mode allow very flexible network namespace definitions. This feature does not build upon the native Podman pod structure but simply reuses the networking namespace of one container for other tasks in the same group.

A typical example is a network server with a metric exporter or log shipping sidecar. The metric exporter needs access to, for example, a private monitoring port which should not be exposed to the network and is thus usually bound to localhost.

The repository includes three different example jobs for such a setup. All of them start a nats server and a prometheus-nats-exporter, using different approaches.

You can use curl to verify that the job is working correctly and that you can fetch Prometheus metrics:

curl http://your-machine:7777/metrics

2 Task setup, server defines the network

See examples/jobs/nats_simple_pod.nomad

Here, the server task is started as the main workload and the exporter runs as a poststart sidecar. Because of that, Nomad guarantees that the server is started first, so the exporter can easily join the server's network namespace via network_mode = "task:server".

Note that the server configuration file binds the http_port to localhost.

Be aware that ports must be defined in the parent network namespace, here server.
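
A minimal sketch of this layout (task names follow the example job; the images and port are illustrative):

group "nats" {
  network {
    port "metrics" { to = 7777 }
  }

  # Main workload: owns the network namespace, so ports are mapped here.
  task "server" {
    driver = "podman"

    config {
      image = "docker://nats"
      ports = ["metrics"]
    }
  }

  # Poststart sidecar: joins the server's network namespace.
  task "exporter" {
    driver = "podman"

    lifecycle {
      hook    = "poststart"
      sidecar = true
    }

    config {
      image        = "docker://natsio/prometheus-nats-exporter"
      network_mode = "task:server"
    }
  }
}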

3 Task setup, a pause container defines the network

See examples/jobs/nats_pod.nomad

A slightly different setup is demonstrated in this job. It more closely resembles the idea of a pod by starting a pause task, named pod, via a prestart/sidecar hook.

Next, the main workload, server, is started and joins the network namespace via the network_mode = "task:pod" stanza. Finally, Nomad starts the poststart/sidecar exporter, which also joins the network.

Note that all ports must be defined on the pod level.
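
A sketch of the pause task that anchors the shared namespace (the image name and port are illustrative):

task "pod" {
  driver = "podman"

  # Prestart sidecar: started before the other tasks and kept running.
  lifecycle {
    hook    = "prestart"
    sidecar = true
  }

  config {
    image = "docker://k8s.gcr.io/pause:3.1"
    # All ports for the group are mapped on this task.
    ports = ["metrics"]
  }
}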

2 Task setup, shared Nomad network namespace

See examples/jobs/nats_group.nomad

This example is very different: both server and exporter join a network namespace which is created and managed by Nomad itself. See the Nomad network stanza documentation to get started with this generic approach.
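
A sketch of the group-level approach; Nomad creates the namespace, so the tasks need no network_mode at all (images are illustrative):

group "nats" {
  # Nomad creates and manages the shared network namespace.
  network {
    mode = "bridge"
    port "metrics" { to = 7777 }
  }

  task "server" {
    driver = "podman"

    config {
      image = "docker://nats"
    }
  }

  task "exporter" {
    driver = "podman"

    config {
      image = "docker://natsio/prometheus-nats-exporter"
    }
  }
}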

Rootless on Ubuntu

edit /etc/default/grub to enable cgroups v2

GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1 systemd.unified_cgroup_hierarchy=1"

sudo update-grub

ensure that podman socket is running

$ systemctl --user status podman.socket
* podman.socket - Podman API Socket
     Loaded: loaded (/usr/lib/systemd/user/podman.socket; disabled; vendor preset: disabled)
     Active: active (listening) since Sat 2020-10-31 19:21:29 CET; 22h ago
   Triggers: * podman.service
       Docs: man:podman-system-service(1)
     Listen: /run/user/1000/podman/podman.sock (Stream)
     CGroup: /user.slice/user-1000.slice/user@1000.service/podman.socket

ensure that you have a recent version of crun

$ crun -V
crun version 0.13.227-d38b
commit: d38b8c28fc50a14978a27fa6afc69a55bfdd2c11
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL

nomad job run example.nomad

job "example" {
  datacenters = ["dc1"]
  type        = "service"

  group "cache" {
    count = 1
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }
    network {
      port "redis" { to = 6379 }
    }
    task "redis" {
      driver = "podman"

      config {
        image = "redis"
        ports = ["redis"]
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB
      }
    }
  }
}

verify podman ps

$ podman ps
CONTAINER ID  IMAGE                           COMMAND       CREATED        STATUS            PORTS                                                 NAMES
2423ae3efa21  docker.io/library/redis:latest  redis-server  7 seconds ago  Up 6 seconds ago  127.0.0.1:21510->6379/tcp, 127.0.0.1:21510->6379/udp  redis-b640480f-4b93-65fd-7bba-c15722886395

Local Development

Vagrant Environment Setup

# create the vm
vagrant up

# ssh into the vm
vagrant ssh

Running a Nomad dev agent with the Podman plugin:

# Build the task driver plugin
make dev

# Copy the built nomad-driver-podman executable to examples/plugins/
cp ./build/nomad-driver-podman examples/plugins/

# Start the Nomad server
nomad agent -config=examples/nomad/server.hcl > server.log 2>&1 &

# Run the client as sudo
sudo nomad agent -config=examples/nomad/client.hcl > client.log 2>&1 &

# Run a job
nomad job run examples/jobs/redis_ports.nomad

# Verify
nomad job status redis

sudo podman ps

Running the tests:

# Start the Podman server
systemctl --user start podman.socket

# Run the tests
CI=1 ./build/bin/gotestsum --junitfile ./build/test/result.xml -- -timeout=15m . ./api