hashicorp / nomad-driver-podman

A nomad task driver plugin for sandboxing workloads in podman containers
https://developer.hashicorp.com/nomad/plugins/drivers/podman
Mozilla Public License 2.0

nginx image fails to start with podman driver #22

Closed: kriestof closed this issue 2 years ago

kriestof commented 4 years ago

When I use the podman driver I get the error below.

rpc error: code = Unknown desc = failed to start task, could not start container: io.podman.ErrorOccurred(Reason: container_linux.go:349: starting container process caused "exec: \"nginx\": executable file not found in $PATH": OCI runtime command not found error)

The same job works with the docker driver. At the same time, starting the nginx image directly with podman works, and I'm also able to run nomad's redis example with the podman driver.

I'm using Arch Linux with pacman's podman package. I used the commands below to set up podman:

systemctl start podman
systemctl start io.podman
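
To confirm the varlink endpoint answers, something like the following should work (this assumes the default podman 1.x socket path and that the varlink CLI is installed):

# query the varlink API over its unix socket; a JSON version reply means it's up
varlink call unix:/run/podman/io.podman/io.podman.GetVersion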

Example nomad job:

job "nginx" {
  datacenters = ["dc1"]

  group "nginx" {
    reschedule {
      attempts  = 0
      unlimited = false
    }

    task "web" {
      driver = "podman"

      config {
        image = "nginx"

        port_map {
          http = 80
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB

        network {
          mbits = 10
          port  "http"  {}
        }
      }

      service {
        name = "nginx"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
onlyjob commented 4 years ago

Check your libpod.conf(5) (/etc/containers/libpod.conf) for the "Default OCI runtime" setting, which is either runc or crun. The config file should have a correct path to the OCI executable: https://salsa.debian.org/debian/libpod/-/blob/master/debian/etc/containers/libpod.conf#L120-144
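
For reference, the relevant keys look like this (shown with the common defaults for runc):

# Default OCI runtime
runtime = "runc"

# Paths to look for a valid OCI runtime
[runtimes]
runc = [
        "/usr/bin/runc",
]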

kriestof commented 4 years ago

Unfortunately that does not seem to help. I copied the default config from /usr/share/containers/libpod.conf, which looks like below. It has the runc runtime set and the proper path to its binary. Moreover, podman info returns the correct OCI runtime. As I understand it, the problem is that for some reason podman varlink cannot find the OCI runtime. It is even more weird because the redis image works with a similar nomad task.

[root@node1 ~]# podman info
host:
  arch: amd64
  buildahVersion: 1.14.8
  cgroupVersion: v1
  conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.16, commit: a97780984207f29652af90fa3dc4e8e7576548e7'
  cpus: 4
  distribution:
    distribution: arch
    version: unknown
  eventLogger: file
  hostname: node1
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.6.13-arch1-1
  memFree: 2341027840
  memTotal: 8297144320
  ociRuntime:
    name: runc
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  os: linux
  rootless: false
  [...]
[root@node1 ~]# cat /etc/containers/libpod.conf 
# libpod.conf is the default configuration file for all tools using libpod to
# manage containers

# Default transport method for pulling and pushing for images
image_default_transport = "docker://"

# Paths to look for the conmon container manager binary.
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
conmon_path = [
        "/usr/libexec/podman/conmon",
        "/usr/local/libexec/podman/conmon",
        "/usr/local/lib/podman/conmon",
        "/usr/bin/conmon",
        "/usr/sbin/conmon",
        "/usr/local/bin/conmon",
        "/usr/local/sbin/conmon",
        "/run/current-system/sw/bin/conmon",
]

# Environment variables to pass into conmon
conmon_env_vars = [
        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
]

# CGroup Manager - valid values are "systemd" and "cgroupfs"
cgroup_manager = "systemd"

# Container init binary
#init_path = "/usr/libexec/podman/catatonit"

# Directory for persistent libpod files (database, etc)
# By default, this will be configured relative to where containers/storage
# stores containers
# Uncomment to change location from this default
#static_dir = "/var/lib/containers/storage/libpod"

# Directory for temporary files. Must be tmpfs (wiped after reboot)
tmp_dir = "/var/run/libpod"

# Maximum size of log files (in bytes)
# -1 is unlimited
max_log_size = -1

# Whether to use chroot instead of pivot_root in the runtime
no_pivot_root = false

# Directory containing CNI plugin configuration files
cni_config_dir = "/etc/cni/net.d/"

# Directories where the CNI plugin binaries may be located
cni_plugin_dir = [
           "/usr/libexec/cni",
           "/usr/lib/cni",
           "/usr/local/lib/cni",
           "/opt/cni/bin"
]

# Default CNI network for libpod.
# If multiple CNI network configs are present, libpod will use the network with
# the name given here for containers unless explicitly overridden.
# The default here is set to the name we set in the
# 87-podman-bridge.conflist included in the repository.
# Not setting this, or setting it to the empty string, will use normal CNI
# precedence rules for selecting between multiple networks.
cni_default_network = "podman"

# Default libpod namespace
# If libpod is joined to a namespace, it will see only containers and pods
# that were created in the same namespace, and will create new containers and
# pods in that namespace.
# The default namespace is "", which corresponds to no namespace. When no
# namespace is set, all containers and pods are visible.
#namespace = ""

# Default infra (pause) image name for pod infra containers
infra_image = "k8s.gcr.io/pause:3.2"

# Default command to run the infra container
infra_command = "/pause"

# Determines whether libpod will reserve ports on the host when they are
# forwarded to containers. When enabled, when ports are forwarded to containers,
# they are held open by conmon as long as the container is running, ensuring that
# they cannot be reused by other programs on the host. However, this can cause
# significant memory usage if a container has many ports forwarded to it.
# Disabling this can save memory.
#enable_port_reservation = true

# Default libpod support for container labeling
# label=true

# The locking mechanism to use
lock_type = "shm"

# Number of locks available for containers and pods.
# If this is changed, a lock renumber must be performed (e.g. with the
# 'podman system renumber' command).
num_locks = 2048

# Directory for libpod named volumes.
# By default, this will be configured relative to where containers/storage
# stores containers.
# Uncomment to change location from this default.
#volume_path = "/var/lib/containers/storage/volumes"

# Selects which logging mechanism to use for Podman events.  Valid values
# are `journald` or `file`.
# events_logger = "journald"

# Specify the keys sequence used to detach a container.
# Format is a single character [a-Z] or a comma separated sequence of
# `ctrl-<value>`, where `<value>` is one of:
# `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_`
#
# detach_keys = "ctrl-p,ctrl-q"

# Default OCI runtime
runtime = "runc"

# List of the OCI runtimes that support --format=json.  When json is supported
# libpod will use it for reporting nicer errors.
runtime_supports_json = ["crun", "runc"]

# List of all the OCI runtimes that support --cgroup-manager=disable to disable
# creation of CGroups for containers.
runtime_supports_nocgroups = ["crun"]

# Paths to look for a valid OCI runtime (runc, runv, etc)
# If the paths are empty or no valid path was found, then the `$PATH`
# environment variable will be used as the fallback.
[runtimes]
runc = [
        "/usr/bin/runc",
        "/usr/sbin/runc",
        "/usr/local/bin/runc",
        "/usr/local/sbin/runc",
        "/sbin/runc",
        "/bin/runc",
        "/usr/lib/cri-o-runc/sbin/runc",
        "/run/current-system/sw/bin/runc",
]

crun = [
        "/usr/bin/crun",
        "/usr/sbin/crun",
        "/usr/local/bin/crun",
        "/usr/local/sbin/crun",
        "/sbin/crun",
        "/bin/crun",
        "/run/current-system/sw/bin/crun",
]

# Kata Containers is an OCI runtime, where containers are run inside lightweight
# Virtual Machines (VMs). Kata provides additional isolation towards the host,
# minimizing the host attack surface and mitigating the consequences of
# containers breakout.
# Please notes that Kata does not support rootless podman yet, but we can leave
# the paths below blank to let them be discovered by the $PATH environment
# variable.

# Kata Containers with the default configured VMM
kata-runtime = [
    "/usr/bin/kata-runtime",
]

# Kata Containers with the QEMU VMM
kata-qemu = [
    "/usr/bin/kata-qemu",
]

# Kata Containers with the Firecracker VMM
kata-fc = [
    "/usr/bin/kata-fc",
]

# The [runtimes] table MUST be the last thing in this file.
# (Unless another table is added)
# TOML does not provide a way to end a table other than a further table being
# defined, so every key hereafter will be part of [runtimes] and not the main
# config.
kriestof commented 4 years ago

I moved to the docker driver for the time being. If this seems to be only a config problem, please close.

towe75 commented 4 years ago

I cannot reproduce it on a Fedora host; it works out of the box. Presumably it's some configuration issue.

@kriestof did you tweak any settings before you tried it, or is this the result of the stock configuration? Can you test again by running nomad directly in your shell instead of as a (systemd) service?
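
For example, something like this (the plugin directory is just an example path, adjust it to your setup):

sudo nomad agent -dev -plugin-dir=/opt/nomad/plugins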

kriestof commented 4 years ago

@towe75, sorry for the late response.

Nope, no settings tweaking. Running nomad outside of the systemd service didn't help either.

One crucial difference I could find between Fedora and Arch is the package repositories: Arch always tries to ship the newest packages, while Fedora is probably more conservative. My podman version is 1.9.2-1.

Another thing is that podman's varlink setup instructions are quite complicated, and I limited my work there to starting io.podman.service (hoping that would be enough).

towe75 commented 4 years ago

Hey @kriestof , thank you for coming back on this.

Yes, 1.9.2 is indeed a very recent version; I'm still using 1.8.x here. I will try to reproduce it on an Arch setup.

weeezes commented 4 years ago

Hey @kriestof, I think I hit this same issue while working on #24; I'm also running a fresh Podman.

I think the $PATH inside the container isn't set correctly in this case. The way I fixed it in the PR is by setting the container's environment explicitly, using the environment configuration found within the image.
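
For reference, you can see the environment an image ships with (including its PATH) with something like:

podman image inspect --format '{{.Config.Env}}' docker.io/library/nginx:latest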

I got your example running (I had to modify the image path):

job "nginx" {
  datacenters = ["dc1"]

  group "nginx" {
    reschedule {
      attempts  = 0
      unlimited = false
    }

    task "web" {
      driver = "podman"

      config {
        image = "docker://docker.io/library/nginx:latest"

        port_map {
          http = 80
        }
      }

      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256MB

        network {
          mbits = 10
          port  "http"  {}
        }
      }

      service {
        name = "nginx"
        port = "http"

        check {
          name     = "alive"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}

And the allocation status:

$ nomad alloc status 6f232741
ID                  = 6f232741-9352-5e2c-c9dd-8d11a45b44c2
Eval ID             = cb7b4d46
Name                = nginx.nginx[0]
Node ID             = 38885a87
Node Name           = ...
Job ID              = nginx
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 4m30s ago
Modified            = 8s ago
Deployment ID       = d15432b9
Deployment Health   = unset

Task "web" is "running"
Task Resources
CPU        Memory          Disk     Addresses
0/500 MHz  11 MiB/256 MiB  300 MiB  http: 127.0.0.1:26968

Task Events:
Started At     = 2020-06-06T09:13:52Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2020-06-06T12:17:55+03:00  Killing     Sent interrupt. Waiting 5s before force killing
2020-06-06T12:13:52+03:00  Started     Task started by client
2020-06-06T12:13:39+03:00  Driver      Image downloaded: Storing signatures
2020-06-06T12:13:33+03:00  Driver      Downloading image
2020-06-06T12:13:33+03:00  Task Setup  Building Task Directory
2020-06-06T12:13:33+03:00  Received    Task received by client

The container is running and responding properly:

$ curl 127.0.0.1:26968

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Potentially this commit fixes the problem here: https://github.com/pascomnet/nomad-driver-podman/pull/24/commits/195364c16407b97d8f639af59794838992d60cd1
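
Until a fix like that is merged, a possible workaround could be to set PATH explicitly on the task. This is an untested sketch: it assumes the driver forwards Nomad task env vars to the container, and the PATH value would be copied from the image inspect output above.

job "nginx" {
  datacenters = ["dc1"]

  group "nginx" {
    task "web" {
      driver = "podman"

      config {
        image = "docker://docker.io/library/nginx:latest"
      }

      env {
        # assumption: PATH copied from the image's own config
        PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
      }
    }
  }
}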

I vaguely remember being confused about the error too for a long while, thinking that there's something wrong with the OCI runtime.

henryx commented 4 years ago

I have the same problem using a container from a private repository. The error is:

2020-08-20T02:28:53+02:00  Driver Failure   rpc error: code = Unknown desc = failed to start task, could not start container: io.podman.ErrorOccurred(Reason: container_linux.go:349: starting container process caused "exec: \"sh\": executable file not found in $PATH": OCI runtime command not found error)

My job is:

task "mycontainer" {
            driver = "podman"

            config {
                image = "docker://my-repository.jfrog.io/myimage:latest"
                port_map {
                    http_fvcservice = 8085
                }
            }
}

The distribution is CentOS 8 with varlink properly configured. I use version 0.1.0 of the plugin.

kriestof commented 2 years ago

I think this is no longer relevant. Right now on Arch you can use the nomad-driver-podman package, which works well.

If I'm wrong, please reopen.