Closed (acziryak closed this issue 1 year ago)
FWIW, I do see the directory being created on the worker node:
root@ind-test-nomad-worker13:/opt/nomad/data/alloc/da0eb8db-4ea4-bcb8-9b8e-92c161ac4652/ceph-csi-controller-ceph_csi-us-ind-test/local# ls -al
total 16
drwxrwxrwx 3 nobody nogroup 4096 Jan 25 14:20 .
drwxrwxrwx 5 nobody nogroup 4096 Jan 25 14:19 ..
-rw-r--r-- 1 root root 155 Jan 25 14:19 config.json
drwxr-xr-x 2 100000 100000 4096 Jan 25 14:20 csi
Hi @acziryak 👋
Just to double-check, are you running the Nomad agent as root?
AFAICT, yes
# ps -ef | grep nomad
root 204168 1 0 13:04 ? 00:02:21 /usr/bin/nomad agent -config /etc/nomad.d
FWIW, I also see that the /sys/fs/cgroup/pids.max file is owned by nobody:nobody as well when viewed from inside the container.
So I was able to get it to work with an additional parameter:
config {
  userns_mode = "host"
}
For some reason, this fixed the permissions. However, I don't see how it would be possible without that. I'm not sure if the documentation here should reflect that, or if there are alternative agent configurations where ACLs, permissions, or default user namespaces would be set up differently than mine, which would remove the need for this parameter.
For future reference this param is documented here: https://developer.hashicorp.com/nomad/docs/drivers/docker#userns_mode
In there, it does say:
Set to host to use the host's user namespace (effectively disabling user namespacing) when user namespace remapping is enabled on the docker daemon. This field has no effect if the docker daemon does not have user namespace remapping enabled.
Which I was able to verify with:
# grep 'userns' /etc/docker/daemon.json
"userns-remap": "default"
I would assume that there would be no harm in putting this param in the example, because it purportedly will not do anything if userns-remap is not enabled, and is apparently required if userns-remap is indeed enabled. But that's just my suggestion going forward.
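As a rough sketch of what that suggestion might look like in the example job, assuming a Docker task for the controller plugin (the task name is hypothetical, and unrelated settings are omitted):

```hcl
# Hypothetical excerpt of the controller plugin task; only the relevant
# settings are shown.
task "ceph-controller" {
  driver = "docker"

  config {
    image = "quay.io/cephcsi/cephcsi:v3.7.2"

    # Use the host's user namespace. Per the driver docs quoted above, this
    # has no effect unless userns-remap is enabled on the Docker daemon.
    userns_mode = "host"
  }
}
```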
@acziryak the docs for csi_plugin note the following:
Note: Plugins running as node or monolith require root privileges (or CAP_SYS_ADMIN on Linux) to mount volumes on the host. With the Docker task driver, you can use the privileged = true configuration, but no other default task drivers currently have this option.
Mounting volumes is a privileged operation in Linux and can't be done "rootlessly". The allow_caps and allow_privileged settings you have in the plugin config are fine, but you don't have privileged = true (or an equivalent combination of caps) set anywhere in the task configuration block:
"Config": {
  "image": "quay.io/cephcsi/cephcsi:v3.7.2",
  "volumes": [
    "./local/config.json:/etc/ceph-csi-config/config.json"
  ],
  "mounts": [
    {
      "type": "tmpfs",
      "target": "/tmp/csi/keys",
      "readonly": false,
      "tmpfs_options": {
        "size": 1000000
      }
    }
  ],
  "args": [
    "--type=rbd",
    "--controllerserver=true",
    "--drivername=rbd.csi.ceph.com",
    "--endpoint=unix://csi/csi.sock",
    "--nodeid=${node.unique.name}",
    "--instanceid=${node.unique.name}-controller",
    "--logtostderr=true",
    "--v=5",
    "--metricsport=${NOMAD_PORT_metrics}"
  ]
},
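For illustration, here is a minimal sketch of the same task in HCL with privileged = true added. The task name, the plugin id, and the trimmed argument list are assumptions made for the example, not values taken from the job above. Note that privileged = true only takes effect when the client's Docker plugin config allows privileged containers, which is the allow_privileged setting mentioned above.

```hcl
# Sketch only: the task name and plugin id are hypothetical.
task "ceph-controller" {
  driver = "docker"

  config {
    image = "quay.io/cephcsi/cephcsi:v3.7.2"

    # Run the container in privileged mode so the plugin can perform
    # privileged operations such as mounting volumes.
    privileged = true

    volumes = [
      "./local/config.json:/etc/ceph-csi-config/config.json"
    ]

    # Argument list shortened for the example.
    args = [
      "--type=rbd",
      "--controllerserver=true",
      "--drivername=rbd.csi.ceph.com",
      "--endpoint=unix://csi/csi.sock",
    ]
  }

  csi_plugin {
    id        = "ceph-csi" # assumed plugin id
    type      = "controller"
    mount_dir = "/csi"
  }
}
```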
Nomad version
Nomad v1.4.3 (f464aca721d222ae9c1f3df643b3c3aaa20e2da7)
Operating system and Environment details
Issue
Ceph CSI Controller does not start following instructions here: https://docs.ceph.com/en/latest/rbd/rbd-nomad/
Reproduction steps
Worker node config:
Expected Result
Ceph CSI plugin starts up correctly without permission denied on both /sys/fs/cgroup//pids.max and /csi.
Actual Result
While investigating the below, I found out that the /csi mount is mounted as nobody:nobody, which renders it inaccessible from the server. This results in a failed job. I'm not sure if /csi should be mounted as nobody:nobody, or if it should be populated somewhere in the filesystem, but the only csi directories I can find under that allocation are owned by user 10000, and have no sockets inside them.
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)