kwianeck opened 3 years ago
From what I can see, nomad-driver-lxc does not release its handle to the most recently created container in /sys/fs/cgroup/:
```
├─lxc.monitor.b1-b74fc4d5-5dc6-715f-f4ab-9024b01ba7a6
│ ├─25364 /opt/nomad/data/plugins/nomad-driver-lxc
│ └─28310 [lxc monitor] /var/lib/lxc b1-b74fc4d5-5dc6-715f-f4ab-9024b01ba7a6
└─lxc.monitor.b2-c0881597-2453-9836-a739-c362bb2dd990
  └─27746 [lxc monitor] /var/lib/lxc b2-c0881597-2453-9836-a739-c362bb2dd990
```
b1 was created after b2. As you can see, b2 is no longer held by the driver's process. To properly clean up the container (remove all its artifacts from the Nomad client) I need to either restart Nomad, which also restarts the driver's process, or create another container so that the driver releases its handle to the container that will be removed later. I am not a programmer, so I cannot see how this could be fixed in the driver's code.
I used Nomad 1.1.3 and 1.1.4. I recompiled the driver for both versions against two different gopkg.in/lxc/go-lxc.v2 versions (one from 2018, one from 2021) with no difference, so I guess the issue is in the driver's code and not in any dependency library.
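For reference, the relevant go-lxc cleanup calls look roughly like the sketch below. This is only a minimal illustration of what a full teardown could do, assuming the driver holds a *lxc.Container per task; destroyAndRelease is a made-up helper name, not the driver's actual code. The key detail is the explicit lxc.Release at the end: without it, the plugin process keeps the liblxc handle (and with it the monitor reference) alive until the Go garbage collector happens to finalize it, which would match the behaviour described above.

```go
package main

import (
	"fmt"
	"time"

	lxc "gopkg.in/lxc/go-lxc.v2"
)

// destroyAndRelease stops and destroys a container, then explicitly
// drops the liblxc handle instead of waiting for a GC finalizer.
func destroyAndRelease(name, lxcPath string) error {
	c, err := lxc.NewContainer(name, lxcPath)
	if err != nil {
		return fmt.Errorf("looking up container %q: %w", name, err)
	}

	if c.Running() {
		if err := c.Stop(); err != nil {
			return fmt.Errorf("stopping container: %w", err)
		}
		// Give the monitor a moment to observe the STOPPED state.
		c.Wait(lxc.STOPPED, 30*time.Second)
	}

	if err := c.Destroy(); err != nil {
		return fmt.Errorf("destroying container: %w", err)
	}

	// Without this, the handle lives until the finalizer runs.
	lxc.Release(c)
	return nil
}

func main() {
	// Container name taken from the listing above.
	if err := destroyAndRelease("b2-c0881597-2453-9836-a739-c362bb2dd990", "/var/lib/lxc"); err != nil {
		fmt.Println(err)
	}
}
```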
Hey,
is there anyone who has a similar issue and has found a solution?
@kwianeck are you using CentOS/RedHat/VzLinux? I had encountered recurring issues with LXC/LXD and cgroups on CentOS (using the LXD snap package), but they went away on Ubuntu 20.04.
We are using LXD/LXC 4.0.7 on Ubuntu 20.04, and during my testing I encountered issues that I couldn't resolve. The error complains about the network type configuration; I am guessing that it should use the default lxc profile by default? I've included my default profile and other details below. I hope this helps resolve issues with this plugin, as I'd love to start using Nomad for all my LXD/LXC containers!
```
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"

$ lxc --version
4.0.7

$ lxc profile show default
config: {}
description: Default LXD profile
devices:
  root:
    path: /
    pool: default
    type: disk
name: default
used_by: []
```
```
$ cat test.nomad
job "example-lxc" {
  datacenters = ["dc1"]
  type        = "service"

  group "example" {
    task "example" {
      driver = "lxc"

      config {
        log_level = "info"
        verbosity = "verbose"
        template  = "/usr/share/lxc/templates/lxc-busybox"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
```
```
# nomad agent -dev -bind 0.0.0.0 -log-level INFO -plugin-dir /opt/nomad/data/plugins
2021-10-14T17:47:59.345Z [INFO] client.driver_mgr.nomad-driver-lxc: starting lxc task: driver=lxc @module=lxc driver_cfg="{Template:/usr/share/lxc/templates/lxc-busybox Distro: Release: Arch: ImageVariant: ImageServer: GPGKeyID: GPGKeyServer: DisableGPGValidation:false FlushCache:false ForceCache:false TemplateArgs:[] LogLevel:info Verbosity:verbose Volumes:[]}" timestamp=2021-10-14T17:47:59.345Z
2021-10-14T17:47:59.691Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=073fb260-00f9-3ae7-704b-c9a3f7a777d9 task=example error="rpc error: code = Unknown desc = error setting network type configuration: setting config item for the container failed"
```
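In case it helps debugging: my unverified assumption is that the "setting config item" failure comes from the network config key itself, since the legacy lxc.network.* keys were deprecated in LXC 2.1 and removed in LXC 3.0 in favour of lxc.net.[i].*. Below is a minimal go-lxc sketch of a version-dependent key choice; it is not the driver's actual code, and the "probe" container name is made up.

```go
package main

import (
	"fmt"

	lxc "gopkg.in/lxc/go-lxc.v2"
)

// networkTypeKey picks the config key the installed liblxc accepts:
// lxc.net.0.type on LXC >= 2.1, the legacy lxc.network.type before that.
func networkTypeKey() string {
	if lxc.VersionAtLeast(2, 1, 0) {
		return "lxc.net.0.type"
	}
	return "lxc.network.type" // removed entirely in LXC 3.0
}

func main() {
	c, err := lxc.NewContainer("probe", lxc.DefaultConfigPath())
	if err != nil {
		panic(err)
	}
	defer lxc.Release(c)

	// On LXC 4.x this should succeed with lxc.net.0.type and fail
	// with the legacy key.
	if err := c.SetConfigItem(networkTypeKey(), "none"); err != nil {
		fmt.Println("setting network type failed:", err)
		return
	}
	fmt.Println("network type set via", networkTypeKey())
}
```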
Thank you.
It looks like go-lxc only supports cgroups v1 (not cgroup2). I found a few other incompatibilities and bugs while testing LXC 4 support. I will create a merge request and tag this issue "soon".
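For anyone who wants to check which cgroup layout their host is on, here is a minimal sketch using standard kernel interface paths (not driver or go-lxc code): the cgroup.controllers file only exists at the root of a cgroup2 mount, so its presence at /sys/fs/cgroup indicates the unified hierarchy.

```go
package main

import (
	"fmt"
	"os"
)

// cgroupLayout reports whether the host mounts the unified cgroup2
// hierarchy at /sys/fs/cgroup.
func cgroupLayout() string {
	if _, err := os.Stat("/sys/fs/cgroup/cgroup.controllers"); err == nil {
		return "cgroup2 (unified)"
	}
	return "cgroup v1 (legacy or hybrid)"
}

func main() {
	fmt.Println("host cgroup layout:", cgroupLayout())
}
```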
According to https://discuss.linuxcontainers.org/t/lxc-4-0-lts-has-been-released/7182, the cgroup specifications for a container and its monitor have been separated.
My observation is that the driver does not ask LXC to fully clean up the container. Instead, when a task is stopped, LXC removes the container's cgroup but leaves the lxc.monitor cgroups under /sys/fs/cgroup/. This way, over time, we can expect hundreds of leftover objects for the lxc.monitor cgroups of containers that were removed long ago.
For example, this is the output after removing alpine2 (nomad job stop alpine2):
As you can see, the lxc.payload directory is gone (the container-specific cgroups), but the lxc.monitor cgroups remain.
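As a stopgap, the leftover monitor cgroups can be pruned once the container is gone. Below is a minimal sketch, assuming the monitor cgroup sits at /sys/fs/cgroup/lxc.monitor.&lt;name&gt; as in the listings above; pruneMonitorCgroup is a made-up helper, not part of the driver. Since removing a cgroup directory maps to rmdir(2), it only succeeds once the cgroup has no processes and no children, so this cannot take down a still-running monitor.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// pruneMonitorCgroup removes an empty leftover lxc.monitor.<name>
// cgroup directory after the container has been destroyed.
func pruneMonitorCgroup(name string) error {
	dir := filepath.Join("/sys/fs/cgroup", "lxc.monitor."+name)
	if _, err := os.Stat(dir); os.IsNotExist(err) {
		return nil // nothing was left behind
	}
	// os.Remove maps to rmdir(2) for directories; it fails with
	// EBUSY/ENOTEMPTY if the cgroup is still in use.
	return os.Remove(dir)
}

func main() {
	// Container name taken from the thread's first example.
	if err := pruneMonitorCgroup("b2-c0881597-2453-9836-a739-c362bb2dd990"); err != nil {
		fmt.Fprintln(os.Stderr, "prune failed:", err)
	}
}
```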