hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

rootless client with `raw_exec` cpu and memory allocation showing as zero #19828

Open drpauldixon opened 8 months ago

drpauldixon commented 8 months ago

Nomad version

nomad --version
Nomad v1.7.3
BuildDate 2024-01-15T16:55:40Z
Revision 60ee328f97d19d2d2d9761251b895b06d82eb1a1

Operating system and Environment details

cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

mount|grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)

Issue

CPU and memory usage for allocations is showing as zero.

Reproduction steps

Launch a raw_exec job on a CentOS 7 host, with the Nomad client running as a non-root user, and examine the allocation, e.g.:

nomad alloc status -stats 6738ce16
Task Resources:
CPU        Memory       Disk     Addresses
0/200 MHz  0 B/400 MiB  300 MiB

Memory Stats
RSS  Swap
0 B  0 B

CPU Stats
Percent  System Mode  User Mode
0.00%    0.00%        0.00%

Expected Result

CPU and memory usage for the allocation is non-zero.

Actual Result

CPU and memory usage for the allocation is zero.

Additional info

I have been experimenting with Nomad and the raw_exec driver (client running as a non-root user). On version 1.0.4 (integrated with Consul) I could see CPU and memory stats for the allocations: for example, go to Topology, click on a job, and you see the aggregated memory and CPU graphs for that job.

I recently upgraded (OK, removed and reinstalled on the same host) to Nomad 1.7.3, and instead of Consul for service discovery I'm now using Nomad itself (i.e. not using Consul at all).

Again, the Nomad clients are running as non-root. Jobs are running and everything works as before, except that CPU and memory usage for the allocations shows as zero, with the same `nomad alloc status -stats` output as shown above.

The CPU and memory graphs for the job also show zero (inspecting the graph element and the JSON response for its metrics shows all zeros too).

1x server (running as root), 3x clients running as non-root.
[Screenshots: nomad-1.0.4, nomad-1.7.3]

lgfa29 commented 8 months ago

Hi @drpauldixon 👋

Would you be able to test with the Nomad client agent running as root? That's the supported deployment for Nomad.

Also, are you running Nomad in a virtualized environment, like cloud instances? If so, you may need to install dmidecode if it's not present (https://developer.hashicorp.com/nomad/docs/concepts/cpu#virtual-cpu-fingerprinting).

shoenig commented 8 months ago

@drpauldixon are these tasks that were started with Nomad 1.0.x, and now you're inspecting them with Nomad 1.7.x (without restarting the tasks or the client host)? If so, that's probably not going to work.

drpauldixon commented 8 months ago

Apologies, it's been a manic week. @shoenig No: for the upgrade, I stopped everything, removed Nomad, cleared out the old files (data directory), installed the new version and configs, and then started up the jobs.

drpauldixon commented 8 months ago

It works perfectly fine with the client running as root. However, my use case requires the raw_exec driver, so running the client as root means my jobs also run as root, which is a no-go. Unfortunately, the more secure exec driver adds so many complications (excessive disk space usage, slow service start, etc.) that it's not a viable option.

tgross commented 3 months ago

Although as noted above Nomad clients should be run as root, it seems like it should be possible for us to get metrics from the allocations even in the face of that. I'm going to accept this issue and mark it for roadmapping but I'm also going to be honest and note it's not likely to get immediate attention.
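As a rough illustration of why metrics should still be obtainable, per-process memory can be read from procfs without root privileges (`/proc/<pid>/status` is normally world-readable unless procfs is mounted with `hidepid`). The following Go sketch is an assumption-laden example, not Nomad's implementation: it parses the `VmRSS` line of `/proc/<pid>/status`, which the kernel reports in kB (1024-byte units). The names `parseVmRSS` and `rssForPID` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the resident set size, in bytes, from the contents of
// a Linux /proc/<pid>/status file. Hypothetical helper for illustration.
func parseVmRSS(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected shape: "VmRSS:   12345 kB" (kB here means 1024 bytes).
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	// Kernel threads, for example, have no VmRSS line.
	return 0, fmt.Errorf("VmRSS not found")
}

// rssForPID reads /proc/<pid>/status for one process and returns its RSS.
func rssForPID(pid int) (uint64, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	return parseVmRSS(string(data))
}

func main() {
	rss, err := rssForPID(os.Getpid())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("RSS of this process: %d bytes\n", rss)
}
```

A real collector would also need to sum over every process in the task's tree and sample CPU time over an interval to produce the Percent / System Mode / User Mode figures that `nomad alloc status -stats` shows.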

simon1990zcs commented 3 months ago

I believe I have debugged and located the root cause; please see the comments and code snippet in #20285. If you could fix this, that would be very much appreciated. FYI, I tried to submit a pull request for this, but it didn't allow me to.

tgross commented 3 months ago

Thanks @simon1990zcs. I've just closed that issue as a duplicate. The approach you've got there seems like it could have potential but the notion of "no cgroup" sounds weird to me. Isn't there always a cgroup in play, even if just the system slice? We'd be happy to discuss further in a PR review in any case. On GitHub if you're not a contributor for the repo you need to fork the repo and make a PR from the fork.
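One way to check which cgroup a process is actually in (for example, whether a rootless client's tasks land in a system or user slice) is to read `/proc/self/cgroup`, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`. This Go sketch parses that format; `cgroupPath` is a hypothetical helper, not part of Nomad, and passing an empty controller name looks up the cgroup v2 unified entry (`0::/path`).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupPath returns the cgroup path for the given controller from the
// contents of /proc/<pid>/cgroup. Pass controller "" to look up the cgroup
// v2 unified hierarchy entry. Hypothetical helper for illustration.
func cgroupPath(procCgroup, controller string) (string, bool) {
	for _, line := range strings.Split(strings.TrimSpace(procCgroup), "\n") {
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue // skip blank or malformed lines
		}
		if controller == "" {
			if parts[1] == "" {
				return parts[2], true
			}
			continue
		}
		// cgroup v1 lines may list several controllers, e.g. "cpuacct,cpu".
		for _, c := range strings.Split(parts[1], ",") {
			if c == controller {
				return parts[2], true
			}
		}
	}
	return "", false
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if p, ok := cgroupPath(string(data), "memory"); ok {
		fmt.Println("cgroup v1 memory controller path:", p)
	} else if p, ok := cgroupPath(string(data), ""); ok {
		fmt.Println("cgroup v2 unified path:", p)
	}
}
```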

simon1990zcs commented 3 months ago

Speaking of cgroups, you are right: in reality there should always be a cgroup. But I believe the current code ignores it, treating cgroups as OFF whenever the user is not root:

func detect() Mode {
    // any non-root effective UID short-circuits detection: cgroups are reported as OFF
    if os.Geteuid() > 0 {
        return OFF
    }
...

cgroups are not detected under a rootless agent in versions later than 1.7.0, which is also briefly mentioned in issue #20285.
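For discussion, here is a hypothetical Go sketch of a detection function that classifies the cgroup layout from mount information instead of returning OFF for any non-root effective UID. `Mode`, `OFF`, `CG1`, `CG2`, and `detectFromMounts` are illustrative names chosen to mirror the snippet above, not Nomad's actual code, and the sketch assumes `/proc/self/mounts` (fields: device, mount point, fs type, options, dump, pass) is readable by the agent's user.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Mode is a hypothetical stand-in for the mode type in the snippet above.
type Mode int

const (
	OFF Mode = iota // no cgroup hierarchy found
	CG1             // cgroup v1 (legacy or hybrid) layout
	CG2             // cgroup v2 unified layout
)

func (m Mode) String() string {
	switch m {
	case CG1:
		return "cgroup v1"
	case CG2:
		return "cgroup v2"
	default:
		return "off"
	}
}

// detectFromMounts classifies the cgroup layout from /proc/self/mounts
// content without consulting the effective UID.
func detectFromMounts(mounts string) Mode {
	mode := OFF
	for _, line := range strings.Split(mounts, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 {
			continue
		}
		mountPoint, fsType := f[1], f[2]
		switch {
		case fsType == "cgroup2" && mountPoint == "/sys/fs/cgroup":
			return CG2 // unified hierarchy mounted at the root
		case fsType == "cgroup" && strings.HasPrefix(mountPoint, "/sys/fs/cgroup/"):
			mode = CG1 // at least one v1 controller hierarchy
		}
	}
	return mode
}

func main() {
	data, err := os.ReadFile("/proc/self/mounts")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("euid=%d, detected mode: %s\n", os.Geteuid(), detectFromMounts(string(data)))
}
```

On the CentOS 7 host described in this issue, the mount layout shown above (v1 controllers under `/sys/fs/cgroup/...`) would be classified as cgroup v1. Whether a rootless agent could do more than read stats from those hierarchies (creating or writing cgroups normally requires ownership of the relevant directories) is a separate question this sketch does not address.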