hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

[Question] Does Nomad 0.9 add significant memory overhead? #6543

Closed notnoop closed 4 years ago

notnoop commented 4 years ago

Nomad 0.9 introduced new auxiliary processes (e.g. logmon, docker_logger) per task. Nomad 0.8 only had an executor process for raw_exec/exec/java driver tasks. On Linux, each of these processes consumes around 30MB of RSS, though Nomad 0.9.6 reduced that to 10-25MB in https://github.com/hashicorp/nomad/pull/6341 .

Is there cause for concern here? How does memory usage scale with the number of processes? Would 100 running raw_exec tasks cause 3-5GB of overhead?

notnoop commented 4 years ago

While Nomad 0.9 does add some overhead, it is not of the order of magnitude the question assumes. Summing per-process figures and extrapolating from them is misleading here, for two main reasons.

Nomad binary and RSS counting

When running tens or hundreds of tasks, plain RSS is a bad value to sum and extrapolate from. Besides process-specific memory (e.g. heap/stack), RSS also includes the loaded portions of the binary and shared libraries. The kernel caches these file-backed pages and shares them among multiple instances of the same binary.

For example, the kernel may load glibc once and share it among the many processes linked against it. Though loaded once, the library's memory gets reported in the RSS of each of these processes.

The large nomad binary, currently around 84MB, contributes to the large reported RSS here, and is cached effectively when running at scale. In some of our tests, ~26MB out of 30MB RSS was due to the nomad binary and some shared libraries (e.g. libc, libpthread, ld), though values differ by test and by the exact executor.
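To make the difference concrete, here is a rough back-of-the-envelope sketch using the figures above. It assumes, purely for illustration, 100 auxiliary processes, 30MB RSS each, and that the ~26MB attributed to the binary and libraries is fully shared among them; real values will vary, and the function names are hypothetical.

```go
package main

import "fmt"

// naiveMB sums per-process RSS, which counts shared pages once per process.
func naiveMB(procs, rssMB int) int {
	return procs * rssMB
}

// sharedAwareMB counts the shared pages once and only the private remainder per process.
func sharedAwareMB(procs, rssMB, sharedMB int) int {
	return sharedMB + procs*(rssMB-sharedMB)
}

func main() {
	// Illustrative inputs taken from the discussion; not measurements.
	const procs, rssMB, sharedMB = 100, 30, 26

	fmt.Println("naive sum of RSS (MB):", naiveMB(procs, rssMB))
	fmt.Println("shared pages counted once (MB):", sharedAwareMB(procs, rssMB, sharedMB))
}
```

Under these assumptions the naive sum is 3000MB, while counting the shared pages once gives 426MB.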

Golang memory usage and Garbage Collection Behavior

Detailing Go memory management and garbage collection (GC) is beyond the scope here, and there is a wealth of resources [1][2][3]. The salient point is that each auxiliary process manages its own heap: each may allocate more memory than it immediately needs and may be slow to release freed memory back to the system; the system may also reclaim unused memory lazily [4].

When running many tasks, nomad auxiliary processes may claim more memory than warranted, which complicates basic analysis, though that memory gets freed to the system under memory pressure.

Takeaway

At this point, we don't believe that externalizing processes causes a substantial increase in memory usage. We recognize that there is room for improvement (e.g. tweaking RPC buffers) to reduce memory usage overall.

I may follow up with more detailed reports of the findings, as well as follow-up GitHub issues from the research.

[1] https://blog.golang.org/ismmkeynote
[2] https://medium.com/samsara-engineering/running-go-on-low-memory-devices-536e1ca2fe8f
[3] https://povilasv.me/go-memory-management/
[4] https://golang.org/doc/go1.13#runtime
