Open rostow opened 9 months ago
Thanks for the suggestion @rostow.
Unfortunately we would still need to wait for the Docker API to (maybe?) support this approach, and I'm not sure there are any plans for that.
I will keep this open in case there are any updates to the Docker API, but until then there isn't much we can do about it.
The problem
For a given task (docker driver) run as part of a job allocation on a Linux client with cgroups-v2 enabled (e.g. the default in Ubuntu versions >= 21), the memory MaxUsage stat is always reported as 0. Example:
The Measured field indicates that only Cache, Swap and Usage are measured, but not MaxUsage. For the same type of job on Windows clients, however, the Measured field does contain MaxUsage, and indeed tasks running on Windows clients all have non-zero values for MaxUsage.
Looking at #12088, the discussion mentions that cgroups-v2 doesn't report certain values, such as memory.max_usage_in_bytes, which is currently used by Nomad to report the MaxUsage as far as I understand (correct me if I'm wrong here).
Proposal
The Linux maintainers "fixed" this in May 2022 by adding memory.peak to cgroups-v2 (see commit).
Would it be possible for Nomad to use memory.peak as a fallback value when memory.max_usage_in_bytes is not available?
Use-cases
Knowing the maximum memory used by tasks is a very useful piece of info to properly size workloads, debug issues, etc.
Attempted Solutions
A workaround is to disable cgroups-v2 and use cgroups-v1. However, it doesn't fully work. Although it returns non-zero values, the max memory used seems to be misreported. See #19854 for reference.