hashicorp / nomad-driver-lxc

HashiCorp Nomad LXC driver plugin
Mozilla Public License 2.0

LXC driver failure: error setting network type configuration: setting config item for the container failed #16

Open yum-install-brains opened 4 years ago

yum-install-brains commented 4 years ago

Nomad version

Nomad v0.9.6 (1f8eddf2211d064b150f141c86e30d9fceabec89)

Operating system and Environment details

OS name and version:

Distributor ID: Debian
Description:    Debian GNU/Linux 9.9 (stretch)
Release:    9.9
Codename:   stretch

Kernel version:

4.19.0-0.bpo.2-amd64

LXC version

ii  liblxc1                          1:3.1.0+really3.0.4-2          amd64        Linux Containers userspace tools (library)
ii  libpam-cgfs                      1:3.1.0+really3.0.4-2          amd64        PAM module for managing cgroups for LXC
ii  lxc                              1:3.1.0+really3.0.4-2          amd64        Linux Containers userspace tools
ii  lxcfs                            3.0.4-2                        amd64        FUSE based filesystem for LXC

Issue

lxc-driver throws an error: Driver Failure rpc error: code = Unknown desc = error setting network type configuration: setting config item for the container failed

Reproduction steps

  1. Install nomad

    curl -O https://releases.hashicorp.com/nomad/0.9.6/nomad_0.9.6_linux_amd64.zip
    unzip nomad_0.9.6_linux_amd64.zip
    sudo mv nomad /usr/local/bin
    rm nomad_0.9.6_linux_amd64.zip
  2. Install lxc driver

    sudo apt install -y lxc
    sudo mkdir -p /opt/nomad/data/plugins
    curl -O https://releases.hashicorp.com/nomad-driver-lxc/0.1.0-rc2/nomad-driver-lxc_0.1.0-rc2_linux_amd64.zip
    unzip nomad-driver-lxc_0.1.0-rc2_linux_amd64.zip
    sudo mv nomad-driver-lxc /opt/nomad/data/plugins
    rm ./nomad-driver-lxc*.zip
  3. Define server.hcl and run sudo nomad agent -config server.hcl

    # Increase log verbosity
    log_level = "DEBUG"

    datacenter = "DC1"

    # Setup data dir
    data_dir = "/tmp/server1"

    # Enable the server
    server {
      enabled = true

      # Self-elect, should be 3 or 5 for production
      bootstrap_expect = 1
    }


  4. Define client.hcl and run sudo nomad agent -config client.hcl

    # Increase log verbosity
    log_level = "DEBUG"

    datacenter = "DC1"

    # Setup data dir
    data_dir = "/opt/nomad/data"

    # Give the agent a unique name. Defaults to hostname
    name = "nomad-agent"

    # Enable the client
    client {
      enabled = true

      # For demo assume we are talking to server1. For production,
      # this should be like "nomad.service.consul:4647" and a system
      # like Consul used for service discovery.
      servers = ["nomad-server:4647"]
    }

    # Modify our port to avoid a collision with server1
    ports {
      http = 5656
    }

    plugin "nomad-driver-lxc" {
      config {
        enabled  = true
        lxc_path = "/var/lib/lxc"
      }
    }


  5. Create job

    job "example-lxc" {
      datacenters = ["DC1"]
      type        = "service"

      group "lxc" {
        task "busybox" {
          driver = "lxc"

          config {
            log_level = "trace"
            verbosity = "verbose"
            template  = "/usr/share/lxc/templates/lxc-busybox"
          }

          resources {
            cpu    = 500
            memory = 256
          }
        }
      }
    }


  6. Run job

    nomad job run -address nomad-server:4646

And it will fail:

    ID                     = 3e12b0e1
    Eval ID                = a0f6d22d
    Name                   = example-lxc.lxc4[0]
    Node ID                = 28751cda
    Node Name              = nomad-agent
    Job ID                 = example-lxc
    Job Version            = 824638796448
    Client Status          = failed
    Client Description     = Failed tasks
    Desired Status         = run
    Desired Description    =
    Created                = 12s ago
    Modified               = 8s ago
    Reschedule Eligibility = 18s from now

    Task "busybox" is "dead"
    Task Resources
    CPU      Memory   Disk     Addresses
    500 MHz  256 MiB  300 MiB

    Task Events:
    Started At     = N/A
    Finished At    = 2019-10-14T15:30:59Z
    Total Restarts = 0
    Last Restart   = N/A

    Recent Events:
    Time                       Type            Description
    2019-10-14T18:30:59+03:00  Killing         Sent interrupt. Waiting 5s before force killing
    2019-10-14T18:30:59+03:00  Not Restarting  Error was unrecoverable
    2019-10-14T18:30:59+03:00  Driver Failure  rpc error: code = Unknown desc = error setting network type configuration: setting config item for the container failed
    2019-10-14T18:30:58+03:00  Task Setup      Building Task Directory
    2019-10-14T18:30:58+03:00  Received        Task received by client

yum-install-brains commented 4 years ago

We did some additional tests and downgraded lxc from 1:3.1.0+really3.0.4-2 to 1:2.1.0-0. With 1:2.1.0-0 everything works fine.

Do you have any plans to support LXCv3?

yum-install-brains commented 4 years ago

I changed this function from nomad-driver-lxc to always return "lxc.net.0.type" and somehow it works with lxc 3.0.4 now. I do not know why (maybe lxc.VersionAtLeast is not working properly with the 3rd version of LXC).
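For readers following along, here is a minimal sketch of the kind of version-gated key selection being discussed. LXC 2.1 introduced the `lxc.net.0.type` key and LXC 3.0 dropped the legacy `lxc.network.type` key, so a driver that picks the legacy key gets rejected by a 3.x library. The `version` type, `atLeast`, and `networkTypeKey` below are hypothetical stand-ins, not the driver's actual code, and the version is passed in explicitly so the example runs without liblxc.

```go
package main

import "fmt"

// version is a hypothetical stand-in for an LXC version triple.
type version struct{ major, minor, micro int }

// atLeast reports whether v is greater than or equal to major.minor.micro.
func (v version) atLeast(major, minor, micro int) bool {
	if v.major != major {
		return v.major > major
	}
	if v.minor != minor {
		return v.minor > minor
	}
	return v.micro >= micro
}

// networkTypeKey returns the config key used to set the type of the first
// network interface. The 2.1.0 threshold is where the new key naming appeared.
func networkTypeKey(v version) string {
	if v.atLeast(2, 1, 0) {
		return "lxc.net.0.type"
	}
	return "lxc.network.type"
}

func main() {
	for _, v := range []version{{2, 0, 7}, {2, 1, 0}, {3, 0, 4}} {
		fmt.Printf("%d.%d.%d -> %s\n", v.major, v.minor, v.micro, networkTypeKey(v))
	}
}
```

If the version check reports something older than 2.1.0 while the installed library is 3.x, the legacy key is chosen and the library refuses it, which would match the error in the report.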

notnoop commented 4 years ago

Hi @yum-install-brains ! Thanks for reaching out. It's definitely a bug that the current driver doesn't support LXC 3. In my brief testing, I verified that the published binary fails with the error you reported, but it succeeds when I recompile the current nomad-driver-lxc without any modification and use it with nomad. I wonder if there is an issue with the way we link against the lxc library.

Can you try using a recompiled version but without your modification to networkTypeConfigKey? We'll investigate on our end too. Thanks!

notnoop commented 4 years ago

@yum-install-brains Thank you so much for reporting this again, it's a tricky one! - I reported the underlying issue to the lxc library we use: https://github.com/lxc/go-lxc/issues/135 and will follow up there.

notnoop commented 4 years ago

At this point, I'd recommend that folks recompile their nomad-driver-lxc with the lxc version they are running against. We'd appreciate pull requests to work around this issue - or we may address it when the upstream issue is addressed.
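One possible shape for such a workaround, offered only as a sketch: choose the key from the version string the library reports at run time instead of from a check fixed when the binary was built. This assumes the mismatch comes from a build-time version check, which the thread suggests but does not confirm. The helper names `parseLXCVersion`, `leadingDigits`, and `networkTypeKeyForRuntime` are hypothetical, and the version string is passed in directly; in a real driver it would have to come from liblxc (go-lxc appears to expose it as `lxc.Version()`, but treat that as an assumption).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// leadingDigits returns the leading run of ASCII digits in s, so that
// components such as "4-2" or "0~rc1" still yield a number.
func leadingDigits(s string) string {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	return s[:end]
}

// parseLXCVersion extracts the major and minor numbers from a version
// string such as "3.0.4".
func parseLXCVersion(s string) (major, minor int, err error) {
	parts := strings.Split(strings.TrimSpace(s), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unrecognized LXC version %q", s)
	}
	nums := make([]int, 2)
	for i := range nums {
		digits := leadingDigits(parts[i])
		if digits == "" {
			return 0, 0, fmt.Errorf("unrecognized LXC version %q", s)
		}
		if nums[i], err = strconv.Atoi(digits); err != nil {
			return 0, 0, err
		}
	}
	return nums[0], nums[1], nil
}

// networkTypeKeyForRuntime picks the network type key for the given
// runtime version string: the new key from 2.1 onward, the legacy key before.
func networkTypeKeyForRuntime(runtimeVersion string) (string, error) {
	major, minor, err := parseLXCVersion(runtimeVersion)
	if err != nil {
		return "", err
	}
	if major > 2 || (major == 2 && minor >= 1) {
		return "lxc.net.0.type", nil
	}
	return "lxc.network.type", nil
}

func main() {
	for _, v := range []string{"2.0.7", "2.1.0", "3.0.4"} {
		key, err := networkTypeKeyForRuntime(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", v, key)
	}
}
```

Returning an error for an unparseable string, rather than silently falling back to one key, keeps a bad version report from turning into the same opaque "setting config item" failure.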

jinnatar commented 3 years ago

Would it be possible to provide pre-built binaries at least for the most common LXC versions out there? For example 3.0.3 that ships with Ubuntu 18.04.

I did try to compile the plugin myself but building Go is quite a painful experience for someone who isn't a Go developer. Your official instructions of make build are not nearly enough when all the dependencies are not pulled in and the usual go get method crashes on sizeofPtr redeclaration errors. I assume I don't have a recent enough version of Go, but that then goes into a whole rabbit hole of dealing with a non-system Go installation when I'm just attempting to build one wayward plugin. :-)

Edit: Was able to use a PPA to get golang-go 1.15.6: https://github.com/golang/go/wiki/Ubuntu, and then the build works ok. So you might want to add a minimum Go version notice in the README. I know Go abandons versions on the regular but it's a bit annoying for LTS users.

h0tw1r3 commented 2 years ago

Likely related to this bug in go-lxc https://github.com/lxc/go-lxc/issues/135