Status: Open. drpauldixon opened this issue 8 months ago.
Hi @drpauldixon 👋
Would you be able to test with the Nomad client agent running as root? That's the supported deployment for Nomad.
Also, are you running Nomad in a virtualized environment, such as cloud instances? If so, you may need to install `dmidecode` if it's not already present (https://developer.hashicorp.com/nomad/docs/concepts/cpu#virtual-cpu-fingerprinting).
@drpauldixon are these tasks that were started with Nomad 1.0.x, and now you're inspecting them with Nomad 1.7.x (without restarting the tasks or the client host)? If so, that's probably not going to work.
Apologies, I've been swamped this past week. @shoenig No. For the upgrade, I stopped everything, removed Nomad, cleared out the old files (the data directory), installed the new version and configs, and then started the jobs.
It works perfectly fine with the client running as root. However, my use case requires the raw_exec driver, so running the client as root means my jobs also run as root, which is a no-go. Unfortunately, the more secure exec driver adds so many complications (excessive disk space usage, slow service starts, etc.) that it's not a viable option.
Although, as noted above, Nomad clients should be run as root, it seems like it should be possible for us to get allocation metrics even when they aren't. I'm going to accept this issue and mark it for roadmapping, but I'm also going to be honest and note that it's not likely to get immediate attention.
I believe I've debugged this and located the root cause; please see the comments and the code snippet in #20285. If you could fix this, it would be very much appreciated. FYI, I tried to submit a pull request for this, but GitHub wouldn't let me.
Thanks @simon1990zcs. I've just closed that issue as a duplicate. The approach you've got there seems like it could have potential, but the notion of "no cgroup" sounds weird to me. Isn't there always a cgroup in play, even if it's just the system slice? We'd be happy to discuss further in a PR review in any case. On GitHub, if you're not a contributor to the repo, you need to fork it and open a PR from your fork.
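The point that every process is always in some cgroup can be checked without root: `/proc/self/cgroup` is world-readable and lists the cgroup(s) the calling process belongs to, one `hierarchy-ID:controller-list:cgroup-path` entry per line. Below is a minimal Go sketch (not Nomad code; `cgroupEntry` and `parseProcCgroup` are hypothetical names) that parses that file. On a cgroup v2 host there is a single `0::/...` line; on a cgroup v1 host there is one line per mounted hierarchy.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupEntry is one line of /proc/<pid>/cgroup:
// "hierarchy-ID:controller-list:cgroup-path".
type cgroupEntry struct {
	HierarchyID string
	Controllers string // empty for the cgroup v2 unified hierarchy
	Path        string
}

// parseProcCgroup parses the contents of a /proc/<pid>/cgroup file,
// skipping blank or malformed lines.
func parseProcCgroup(content string) []cgroupEntry {
	var entries []cgroupEntry
	for _, line := range strings.Split(strings.TrimSpace(content), "\n") {
		// SplitN with n=3 keeps any ':' inside the path intact.
		parts := strings.SplitN(line, ":", 3)
		if len(parts) != 3 {
			continue
		}
		entries = append(entries, cgroupEntry{parts[0], parts[1], parts[2]})
	}
	return entries
}

func main() {
	data, err := os.ReadFile("/proc/self/cgroup")
	if err != nil {
		fmt.Println("cannot read /proc/self/cgroup:", err)
		return
	}
	for _, e := range parseProcCgroup(string(data)) {
		fmt.Printf("hierarchy=%s controllers=%q path=%s\n", e.HierarchyID, e.Controllers, e.Path)
	}
}
```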
Speaking of cgroups, you're right: in practice there should always be a cgroup. But I believe the current code ignores it and treats cgroups as OFF whenever the user is not root:
```go
func detect() Mode {
	if os.Geteuid() > 0 {
		return OFF
	}
	// ...
}
```
Cgroups are not detected in versions later than 1.7.0 when the agent runs rootless, which is also briefly mentioned in issue #20285.
Nomad version
```
$ nomad --version
Nomad v1.7.3
BuildDate 2024-01-15T16:55:40Z
Revision 60ee328f97d19d2d2d9761251b895b06d82eb1a1
```
Operating system and Environment details
Issue
CPU and memory allocations are showing as zero.
Reproduction steps
Launch a raw_exec job (client running as non-root) on a CentOS 7 host and examine the allocation.
Expected Result
CPU and memory allocations are non-zero.
Actual Result
CPU and memory allocations are zero.
Additional info
I have been experimenting with Nomad and the raw_exec driver (client running as a non-root user). On version 1.0.4 (integrated with Consul), I could see CPU and memory stats for the allocations: for example, go to Topology, click on a job, and you see the aggregated memory and CPU graphs for that job.
I recently upgraded (OK, removed and reinstalled on the same host) to Nomad 1.7.3, and instead of Consul for service discovery I'm now using Nomad itself (i.e. not using Consul at all).
Again, the Nomad clients are running as non-root. Jobs are running and everything works as before, except that the CPU and memory allocations show as zero.
The CPU and memory graphs for the job are also zero (inspecting the graph element and the JSON response for the metrics shows all zeros too).
Setup: 1 server (running as root) and 3 clients (running as non-root).
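To check the "all zeros" symptom programmatically rather than by inspecting the graph, here is a minimal Go sketch that decodes an allocation stats payload and reports whether its CPU and memory figures are all zero. It assumes the response shape of Nomad's allocation stats endpoint (`GET /v1/client/allocation/:alloc_id/stats`) as I understand it, with `ResourceUsage.MemoryStats` and `ResourceUsage.CpuStats` objects; verify the field names against the Nomad API docs for your version. The `allocStats` type and `allZero` helper are hypothetical, and the sample payload is invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allocStats models only the fields this check needs; encoding/json
// ignores everything else in the payload. The field names are assumed
// from Nomad's allocation stats response.
type allocStats struct {
	ResourceUsage struct {
		MemoryStats struct {
			RSS   uint64
			Usage uint64
		}
		CpuStats struct {
			Percent    float64
			TotalTicks float64
		}
	}
}

// allZero reports whether the decoded CPU and memory figures are all zero.
// Fields missing from the payload decode as zero and so count as zero.
func allZero(body string) (bool, error) {
	var s allocStats
	if err := json.Unmarshal([]byte(body), &s); err != nil {
		return false, err
	}
	m, c := s.ResourceUsage.MemoryStats, s.ResourceUsage.CpuStats
	return m.RSS == 0 && m.Usage == 0 && c.Percent == 0 && c.TotalTicks == 0, nil
}

func main() {
	// Invented sample resembling the zeroed response described above.
	sample := `{"ResourceUsage":{"MemoryStats":{"RSS":0,"Usage":0},"CpuStats":{"Percent":0,"TotalTicks":0}}}`
	zero, err := allZero(sample)
	fmt.Println(zero, err)
}
```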