Tetha opened this issue 4 years ago
Thanks for opening this @Tetha! Glad to hear you have a workaround, but there's definitely something we're missing if the host volume doesn't work but a Docker volume mount does. Just for my clarity: in the jobspec you've provided, the commented-out section is what's not working, which you've worked around with the `config.volumes` section, right?
Exactly.
My first attempt at writing the job followed the documentation on host volumes and looked like this:

```hcl
job "internal-service" {
  type        = "service"
  datacenters = ["dc1"]

  group "service" {
    count = "1"

    volume "service" {
      type   = "host"
      source = "internal-service"
    }

    task "service" {
      driver = "docker"
      leader = true

      volume_mount {
        volume      = "service"
        destination = "/service-data"
      }

      config {
        image = "..."
      }

      // ...
    }
  }
}
```
My current workaround is the job spec from the initial post:

- It keeps the `volume` stanza in order to force nomad to schedule the job onto the client with the host_volume. Given there is just one of those, it pins the job to a client.
- It uses the `docker.config.volumes` array to mount the directory of the host volume into the container, including the `selinuxlabel` from the docker plugin config.

To be entirely precise, I also had to enable the arbitrary volume mount on the docker driver as well:

```hcl
plugin "docker" {
  volumes {
    enabled      = true
    selinuxlabel = "z"
  }
}
```
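Concretely, the workaround's task config looks roughly like this (a sketch; `/var/lib/service` is the host volume's path on the client and `/service-data` is the mount point from the job above):

```hcl
task "service" {
  driver = "docker"

  config {
    image = "..."

    # Bind-mounts the host volume's directory via the docker driver's
    # volumes list; with selinuxlabel = "z" set in the plugin config,
    # the driver applies the label so SELinux allows container access.
    volumes = [
      "/var/lib/service:/service-data",
    ]
  }
}
```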
Hi @Tetha I got a chance to dig into this a bit and it looks like we're running into a Docker limitation, but one that appears to be intentional.

Any volume we mount with the `volume_mount` block (host volumes or CSI volumes) gets passed as part of the Docker driver `MountConfig`. This is the same as if you were using the `mounts` block in the Docker driver, as opposed to the `volumes` block like you're doing above.

The Docker container resulting from a job that has a `volume_mount`, a `volumes` block, and a `mounts` block looks like the following:
```console
$ docker inspect a224
[
  {
    ...
    "HostConfig": {
      "Binds": [
        "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/alloc:/alloc",
        "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/local:/local",
        "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/secrets:/secrets",
        "/srv/volumeSource0:/local/srv"
      ],
      ...
      "Mounts": [
        {
          "Type": "bind",
          "Source": "/srv/volumeSource1",
          "Target": "/local/vagrant",
          "ReadOnly": true,
          "BindOptions": {}
        },
        {
          "Type": "bind",
          "Source": "/srv/volumeSource2",
          "Target": "/test",
          "ReadOnly": true,
          "BindOptions": {
            "Propagation": "rprivate"
          }
        }
      ],
      ...
    },
    ...
    "Mounts": [
      {
        "Type": "bind",
        "Source": "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/local",
        "Destination": "/local",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
      },
      {
        "Type": "bind",
        "Source": "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/redis/secrets",
        "Destination": "/secrets",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
      },
      {
        "Type": "bind",
        "Source": "/srv/volumeSource0",
        "Destination": "/local/srv",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
      },
      {
        "Type": "bind",
        "Source": "/srv/volumeSource1",
        "Destination": "/local/vagrant",
        "Mode": "",
        "RW": false,
        "Propagation": "rprivate"
      },
      {
        "Type": "bind",
        "Source": "/srv/volumeSource2",
        "Destination": "/test",
        "Mode": "",
        "RW": false,
        "Propagation": "rprivate"
      },
      {
        "Type": "bind",
        "Source": "/var/nomad/data/allocs/d730cdde-d062-ddc3-d33e-a0240e4e8ebc/alloc",
        "Destination": "/alloc",
        "Mode": "",
        "RW": true,
        "Propagation": "rprivate"
      }
    ],
    ...
```
So the `mounts` block maps to the Docker command line's `--mount` flag, about which the Docker docs say:

> The `--mount` flag does not support `z` or `Z` options for modifying selinux labels.
It looks like their reasoning for this can be found in places like:

- https://github.com/moby/moby/issues/36282
- https://github.com/moby/moby/issues/30934
- https://github.com/docker/cli/pull/832/files
For Nomad, we define the relabelling in the client configuration, which is privileged, so the destructive possibilities here are lessened (although it still could be a nasty footgun for someone). I'm still trying to figure out the right way to handle this problem and what we can do about it in the Nomad driver. So I just wanted to check in and let you know it's been at least looked at, but it's probably not going to get fixed in Nomad 1.0.0.
I believe I'm running into this issue when trying to use the ceph csi. My nomad clients are on Fedora server systems with selinux enforcing enabled.
Despite enabling the `container_use_cephfs` selinux boolean, my containers cannot access the mounted volume:

```
# running a `ls /srv` in the container results in this selinux denial
type=AVC msg=audit(1638965583.753:493): avc: denied { read } for pid=2761 comm="ls" name="/" dev="rbd0" ino=2 scontext=system_u:system_r:container_t:s0:c583,c1011 tcontext=system_u:object_r:unlabeled_t:s0 tclass=dir permissive=0
```
`docker inspect` reveals:

```json
...
"HostConfig": {
  "Binds": [
    "/var/lib/nomad/alloc/32159918-57d9-1663-544b-fa5d415712c7/alloc:/alloc",
    "/var/lib/nomad/alloc/32159918-57d9-1663-544b-fa5d415712c7/mysql-server/local:/local",
    "/var/lib/nomad/alloc/32159918-57d9-1663-544b-fa5d415712c7/mysql-server/secrets:/secrets"
  ],
  ....
  "Mounts": [
    {
      "Type": "bind",
      "Source": "/var/lib/nomad/client/csi/node/ceph-csi/per-alloc/32159918-57d9-1663-544b-fa5d415712c7/ceph-mysql-test-tf2/rw-file-system-single-node-writer",
      "Target": "/srv",
      "BindOptions": {
        "Propagation": "rprivate"
      }
    }
  ],
  ...
```
Aside from disabling selinux, is there a workaround? Or is the use of nomad volumes with selinux just not supported yet?
Edit: one "workaround" I've identified is to disable selinux on a per-container basis by passing security_opt = ["label=disable"]
in the docker config of the job. This is better than disabling selinux entirely, but it is still not viable workaround for as it has to be applied to every stateful workload.
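In the job that looks like this (a sketch; the task name and image are placeholders):

```hcl
task "mysql-server" {
  driver = "docker"

  config {
    image = "..."

    # Turns off SELinux labeling for this one container so it can
    # read the CSI-mounted volume; has to be repeated per workload.
    security_opt = ["label=disable"]
  }
}
```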
Edit: one "workaround" I've identified is to disable selinux on a per-container basis by passing
security_opt = ["label=disable"]
in the docker config of the job. This is better than disabling selinux entirely, but it is still not viable workaround for as it has to be applied to every stateful workload.
Yeah, that's currently the only reasonable workaround. We need to review how to handle the relabelling question safely and we haven't had a chance to do so yet.
### Nomad version

### Operating system and Environment details
These are CentOS 7.8 hosts with SELinux enabled.
### Issue
We haven't given up running as many hosts as possible with SELinux enabled, including the nomad clients. As such, one change we had to implement was to configure the default selinux label for the docker driver on the clients:
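```hcl
plugin "docker" {
  volumes {
    selinuxlabel = "z"
  }
}
```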
As far as I understand labels here, this labels all docker volumes as "shared across all containers".
Now I came across another requirement, a service needed some local storage and as a quick workaround, I figured I'd use a host volume to give the task group some local storage and worry about outages later.
Nomad-Client Configuration:
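```hcl
client {
  host_volume "internal-service" {
    path = "/var/lib/service"
  }
}
```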
I also made sure to chown the directory to the uid inside the container, as well as changing the selinux label of `/var/lib/service` to `system_u:object_r:container_file_t`, like other container files. However, no matter what I do, SELinux denies the container access to the file.
As a workaround, I enabled arbitrary volume mounts on the client and mounted the volume using the docker volume configuration (with the selinux label in place) and it works properly.
### Job file (if appropriate)
### Nomad Client logs (if appropriate)
Sadly, I can't really find the logs anymore, since this situation was two weeks ago and the logs have been rotated. Let me know if I need to retry this with a test job to get the logs of docker / selinux / nomad.