Closed 9maf4you closed 11 months ago
Hi @9maf4you 👋
We don't officially support a rootless Docker daemon. Docker has some quirks around network management that we need to work around in order to provide things like bridge network mode, and I'm not even sure where to start 🤔
Setting the network mode to slirp4netns will not work because, as far as I can tell, this is not actually a network mode, but more like a network driver.
For the CNI driver error, I don't remember seeing that unknown FS magic
message before, and looking online there seems to be a multitude of reasons why it could happen.
Is the Nomad agent running as root?
I will adjust the issue title to a feature request, as I don't expect rootless Docker to work.
@9maf4you if it's an option for you, the podman task driver supports rootless mode.
Docker and Podman handle things (especially networking) very differently, and working with Podman is much more flexible and easier for us.
Hey! @shoenig @lgfa29 Thank you for the quick response. Podman is an option for me, so I'll take a shot with it.
@lgfa29
Is the Nomad agent running as root? Yes, it is.
Hello @shoenig, I've tried podman as you suggested. Are you sure the combination of rootless podman + consul-connect works? It doesn't work for me.
I set it up following this doc:
grep ^runtime /usr/share/containers/containers.conf
runtime = "crun"
rpm -q slirp4netns
slirp4netns-1.2.0-2.module+el8.8.0+1265+fa25dd7a.x86_64
rpm -q fuse-overlayfs
fuse-overlayfs-1.11-1.module+el8.8.0+1265+fa25dd7a.x86_64
mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)
ps uax | grep podman
rocky 1766 0.0 0.0 89920 10316 ? S 12:27 0:00 /usr/bin/podman
The Nomad client runs as root and podman runs as an unprivileged user. The allocation fails with this error:
Time Type Description
2023-08-28T13:14:26Z Alloc Unhealthy Unhealthy because of failed task
2023-08-28T13:14:22Z Killing Sent interrupt. Waiting 5s before force killing
2023-08-28T13:14:22Z Not Restarting Error was unrecoverable
2023-08-28T13:14:22Z Driver Failure rpc error: code = Unknown desc = failed to start task, could not start container: cannot start container, status code: 500: {"cause":"OCI permission denied","message":"crun: cannot setns `/var/run/netns/1abe06e9-6953-59eb-1387-e4b95943a19e`: Operation not permitted: OCI permission denied","response":500}
job "c9" {
  group "api" {
    network {
      mode = "bridge"
    }
    service {
      name = "count-api"
      port = "9001"
      connect {
        sidecar_service {}
      }
    }
    task "web" {
      driver = "podman"
      config {
        image = "hashicorpdev/counter-api:v3"
        auth_soft_fail = true
      }
    }
  }
  group "dashboard" {
    network {
      mode = "bridge"
      port "http" {
        static = 9002
        to = 9002
      }
    }
    service {
      name = "count-dashboard"
      port = "9002"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port = 8080
            }
          }
        }
      }
    }
    task "dashboard" {
      driver = "podman"
      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }
      config {
        image = "hashicorpdev/counter-dashboard:v3"
        auth_soft_fail = true
      }
    }
  }
}
I believe the reason for this is https://github.com/hashicorp/nomad/issues/13669
Ahh sorry @9maf4you, for some reason I thought we had tested the rootless connect scenario, but no. Thinking about it now, the error you are getting makes sense. Podman is trying to join the network namespace that the Nomad client creates for the tasks in the group to share, but doing so requires the CAP_SYS_ADMIN capability (or running as root). See https://www.man7.org/linux/man-pages/man2/setns.2.html
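To make the capability requirement concrete, here is a minimal Python sketch (not part of Nomad or Podman) that checks whether a process's effective capability mask includes CAP_SYS_ADMIN. The helper names `has_capability` and `read_cap_eff` are made up for illustration. The capability number 21 comes from `<linux/capability.h>`, and `CapEff` is the hex mask Linux reports in `/proc/<pid>/status`.

```python
# Illustrative sketch only; helper names are hypothetical, not a Nomad or Podman API.

CAP_SYS_ADMIN = 21  # capability number defined in <linux/capability.h>


def has_capability(cap_eff_hex: str, cap: int) -> bool:
    """Return True if bit `cap` is set in a CapEff-style hex mask."""
    return bool(int(cap_eff_hex, 16) & (1 << cap))


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the effective capability mask (hex string) of a process on Linux."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff field not found in " + status_path)
```

On a Linux host, `has_capability(read_cap_eff(), CAP_SYS_ADMIN)` is typically true for a root process and false for an unprivileged user such as the one running rootless podman. The mask is relative to the process's own user namespace, so treat the result as a rough indicator rather than proof that `setns` on a given namespace will succeed.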
Hey @shoenig. Should I open a new issue, or does one already exist?
Hey @shoenig, sorry for bothering you once again. Could you please reply to my previous question? And if you are going to fix the issue with caps, could you please share your plans for it? Thanks!
Hi @9maf4you, I'm not sure what the fix would be; joining a network namespace in Linux is an operation that requires root and there isn't much Nomad can do about that. Ostensibly the solution is to have a parent process launched as root join the namespace and then fork/exec into the desired task. In fact, this is what Nomad does for the exec/raw_exec task drivers. However, Nomad is not the parent of a docker container (docker is), so you'd have to talk to docker about implementing that feature.
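The "privileged parent joins the namespace, then execs the task" pattern described above can be sketched roughly as follows. This is an illustrative Python sketch, not Nomad's actual implementation; the function name `exec_in_netns` is hypothetical. It calls `setns(2)` through libc via `ctypes`, so it assumes Linux, and the `setns` call fails with a permission error unless the caller has the required privilege.

```python
# Illustrative sketch only; not how Nomad's exec/raw_exec drivers are implemented.
import ctypes
import os

CLONE_NEWNET = 0x40000000  # namespace type flag for network namespaces (<sched.h>)


def exec_in_netns(netns_path, argv):
    """Join the network namespace at `netns_path`, then exec `argv` in place.

    A real supervisor would usually fork first so that only the child
    changes namespace; this sketch replaces the current process.
    """
    fd = os.open(netns_path, os.O_RDONLY)  # raises FileNotFoundError if missing
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        if libc.setns(fd, CLONE_NEWNET) != 0:
            err = ctypes.get_errno()
            # Without sufficient privilege this is EPERM (PermissionError).
            raise OSError(err, os.strerror(err))
    finally:
        os.close(fd)
    os.execvp(argv[0], argv)  # does not return on success
```

Run as root, a call such as `exec_in_netns("/var/run/netns/<id>", ["ip", "addr"])` would show the interfaces inside that namespace (the `<id>` placeholder stands for a namespace name like the one in the error above). Run as an unprivileged user, the `setns` step is where "Operation not permitted" would surface.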
Hey @shoenig. Sorry, my message probably wasn't clear enough. In my last message I was talking about podman, since you suggested I try it. For reference: the issue, your reply. Thanks again!
I'm going to close this issue, because as noted above running as non-root is unsupported and building support isn't on the near-term roadmap. I'm going to link to this issue from https://github.com/hashicorp/nomad/issues/13669 for discussions on how we might work on this in the future. If you have more comments on this after having read through that issue in detail, I'd suggest you make comments over in #13669.
Nomad version
Nomad v1.6.1 BuildDate 2023-07-21T13:49:42Z Revision 515895c7690cdc72278018dc5dc58aca41204ccc
Operating system and Environment details
Rocky Linux release 8.7 (Green Obsidian)
mynet.conflist was taken from the docs; only the name was changed.
Issue
I'm trying to launch our rootless containers (docker) with consul-connect. According to the docs, the bridge configuration is a prerequisite for Consul Connect. This raises the question of how to configure nomad/cni appropriately. Rootless docker uses slirp4netns to set up networking, which means a slirp4netns CNI plugin would need to exist for it.
Anyway, I've tried some crazy configurations in the hope that something just wasn't documented. It seems to me that this is not supported, but I may have missed something.
Reproduction steps: network.mode = cni/mynet
set
network.mode = "cni/mynet"
Expected Result
Nomad spins up a container
Actual Result
No containers are running
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
-
Reproduction steps: network_mode = "slirp4netns" for docker's driver
set
network_mode = "slirp4netns"
Expected Result
Nomad spins up a container
Actual Result
No containers are running
Job file (if appropriate)
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
-
Reproduction steps:
set
network.mode = slirp4netns
Expected Result
Nomad spins up a container
Actual Result
No containers are running
Job file (if appropriate)
Nomad Server logs (if appropriate)
-
Nomad Client logs (if appropriate)