
Use `memory.peak` as fallback when `memory.max_usage_in_bytes` is not reported (Linux, cgroups-v2) #19855

Open rostow opened 9 months ago

rostow commented 9 months ago

The problem

For a given task (docker driver) run as part of a job allocation on a Linux client with cgroups-v2 enabled (e.g. the default in Ubuntu 21.10 and later), the memory MaxUsage stat is always reported as 0. Example:

"mytask": {
  "ResourceUsage": {
    "MemoryStats": {
        "RSS": 0,
        "Cache": 0,
        "Swap": 0,
        "MappedFile": 0,
        "Usage": 21369004032,
        "MaxUsage": 0,
        "KernelUsage": 0,
        "KernelMaxUsage": 0,
        "Measured": ["Cache", "Swap", "Usage"]
    }
  }
}

The Measured field indicates that only Cache, Swap, and Usage are measured, but not MaxUsage. For the same type of job on Windows clients, however, the Measured field does contain MaxUsage, and tasks running on Windows clients all report non-zero values for it.

The discussion in #12088 mentions that cgroups-v2 doesn't report certain values, such as `memory.max_usage_in_bytes`, which, as far as I understand, is what Nomad currently uses to report MaxUsage (correct me if I'm wrong here).

Proposal

The Linux kernel maintainers "fixed" this in May 2022 by adding `memory.peak` to cgroups-v2 (see commit).

Would it be possible for Nomad to use `memory.peak` as a fallback when `memory.max_usage_in_bytes` is not available?
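For illustration, here is a minimal sketch of the proposed fallback, assuming direct reads from the cgroup filesystem. The directory path and function names are hypothetical, not Nomad's actual stats code:

```go
// Hypothetical sketch: read a task's peak memory usage from the cgroup
// filesystem, preferring the cgroups-v1 file and falling back to the
// cgroups-v2 memory.peak file added in May 2022.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// readUint parses the single integer value stored in a cgroup file.
func readUint(path string) (uint64, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

// maxMemoryUsage returns the peak memory usage for the cgroup rooted at dir.
// On a v1 (or hybrid) hierarchy the memory controller exposes
// memory.max_usage_in_bytes; on a v2 (unified) hierarchy only memory.peak
// exists, so it is used as the fallback.
func maxMemoryUsage(dir string) (uint64, error) {
	if v, err := readUint(filepath.Join(dir, "memory.max_usage_in_bytes")); err == nil {
		return v, nil // cgroups-v1
	}
	return readUint(filepath.Join(dir, "memory.peak")) // cgroups-v2
}

func main() {
	// Illustrative path; Nomad tracks the real cgroup directory per task.
	v, err := maxMemoryUsage("/sys/fs/cgroup/nomad.slice")
	if err != nil {
		fmt.Println("MaxUsage not available:", err)
		return
	}
	fmt.Println("MaxUsage:", v)
}
```

Trying the v1 file first would keep the existing behaviour on cgroups-v1 clients and only consult `memory.peak` when the v1 file is absent.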

Use-cases

Knowing the maximum memory used by a task is very useful for sizing workloads correctly, debugging memory issues, and so on.
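As a concrete example of that workflow, the sketch below polls a task's stats through Nomad's Go API client and prints the memory figures; the allocation ID is a placeholder. Under cgroups-v2 today, MaxUsage comes back as 0, as shown above:

```go
// Sketch: read per-task memory stats for an allocation via Nomad's API.
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder allocation ID.
	alloc, _, err := client.Allocations().Info("11111111-2222-3333-4444-555555555555", nil)
	if err != nil {
		log.Fatal(err)
	}
	stats, err := client.Allocations().Stats(alloc, nil)
	if err != nil {
		log.Fatal(err)
	}
	for name, task := range stats.Tasks {
		mem := task.ResourceUsage.MemoryStats
		fmt.Printf("%s: Usage=%d MaxUsage=%d Measured=%v\n",
			name, mem.Usage, mem.MaxUsage, mem.Measured)
	}
}
```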

Attempted Solutions

A workaround is to disable cgroups-v2 and use cgroups-v1 instead (one way to do this is shown below). However, it doesn't fully work: although it returns non-zero values, the maximum memory used appears to be misreported. See #19854 for reference.
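For reference, on a systemd-based distribution the usual way to boot back into the legacy (v1) hierarchy is a kernel command-line parameter; this is general Linux configuration, not Nomad-specific:

```sh
# /etc/default/grub: tell systemd to mount the legacy (v1) cgroup hierarchy
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"
# apply with: sudo update-grub && sudo reboot
```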

lgfa29 commented 9 months ago

Thanks for the suggestion @rostow.

Unfortunately, we would still need to wait for the Docker API to (maybe?) support this approach, and I'm not sure there are any plans for that.

I will keep this open in case there are any updates to the Docker API, but until then there isn't much we can do about it.
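For context on that dependency: with the docker driver the memory numbers ultimately come from Docker's stats endpoint, so Nomad can only report what that API returns. A minimal sketch with Docker's Go SDK (the container ID is a placeholder; identifiers are from the SDK's `types` package):

```go
// Sketch: one-shot container stats read, illustrating where max_usage
// comes from. Under cgroups-v2 the Docker daemon has no
// memory.max_usage_in_bytes to read, so this field stays 0.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder container ID; stream=false returns a single sample.
	resp, err := cli.ContainerStats(context.Background(), "mycontainer", false)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var s types.StatsJSON
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		log.Fatal(err)
	}
	fmt.Println("max_usage:", s.MemoryStats.MaxUsage)
}
```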