Closed notnoop closed 4 years ago
While Nomad 0.9 does add some overhead, it is not of the order of magnitude the question assumes. The basic analysis is misleading, and extrapolating from it leads us astray for two main reasons.
First, when running tens or hundreds of tasks, summing plain RSS values is a poor way to extrapolate. Besides process-specific memory (e.g. heap and stack), RSS also includes the loaded portions of the binary and of shared libraries. The kernel caches these pages and shares them across multiple instances of the same binary.
For example, the kernel loads glibc once and shares it among the many processes linked against it. Though loaded only once, the library's pages are reported in the RSS of each of those processes.
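One way to see this on Linux is to compare a process's Rss against its Pss from /proc/self/smaps_rollup: Pss charges each shared page 1/n-th to each of the n processes mapping it, so Pss is the better value to sum across processes. A minimal sketch (the `rssPss` helper is my own name, and /proc/self/smaps_rollup requires Linux 4.14+):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rssPss reads Rss and Pss (in kB) from /proc/self/smaps_rollup.
// Pss divides each shared page among the processes mapping it, so
// summing Pss across processes does not over-count shared libraries
// the way summing Rss does.
func rssPss() (rssKB, pssKB int, err error) {
	f, err := os.Open("/proc/self/smaps_rollup") // Linux 4.14+
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text()) // e.g. "Rss:  12345 kB"
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "Rss:":
			rssKB, _ = strconv.Atoi(fields[1])
		case "Pss:":
			pssKB, _ = strconv.Atoi(fields[1])
		}
	}
	return rssKB, pssKB, sc.Err()
}

func main() {
	rss, pss, err := rssPss()
	if err != nil {
		fmt.Fprintln(os.Stderr, "smaps_rollup unavailable:", err)
		return
	}
	fmt.Printf("Rss=%dkB Pss=%dkB (the gap is mostly shared pages)\n", rss, pss)
}
```

For a single idle process the gap is small, but across many instances of the same binary the summed Rss grows much faster than the summed Pss.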
The large nomad binary, currently around 84MB, contributes to the large reported RSS here, and is cached effectively when running at scale. In some of our tests, ~26MB out of a 30MB RSS was due to the nomad binary and shared libraries (e.g. libc, libpthread, ld), though the values differ by test and by the exact executor.
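As a back-of-envelope illustration of the over-counting, using the ~30MB RSS / ~26MB shared figures above and the 100 tasks from the question (the helper names here are mine, not Nomad's):

```go
package main

import "fmt"

// naiveTotalMB sums per-process RSS, counting the shared pages
// once per process -- the extrapolation in the original question.
func naiveTotalMB(rssPerProcMB, procs int) int {
	return rssPerProcMB * procs
}

// sharedAwareTotalMB counts the shared binary/library pages only
// once, since the kernel shares them across all instances.
func sharedAwareTotalMB(rssPerProcMB, sharedMB, procs int) int {
	return sharedMB + (rssPerProcMB-sharedMB)*procs
}

func main() {
	// ~30MB RSS per auxiliary process, ~26MB of it shared
	// (nomad binary, libc, ...), across 100 tasks.
	fmt.Println(naiveTotalMB(30, 100))           // 3000 (MB)
	fmt.Println(sharedAwareTotalMB(30, 26, 100)) // 426 (MB)
}
```

So the same per-process measurements that suggest ~3GB of overhead imply well under half a gigabyte once shared pages are counted once.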
Second, detailing Go memory management and garbage collection (GC) is beyond the scope of this issue, and there is a wealth of resources on the topic [1][2][3]. The salient point is that each auxiliary process manages its own heap: each may allocate more memory than it immediately needs, may be slow to release freed memory back to the system, and the OS may lazily reclaim unused memory [4].
When running this many tasks, the nomad auxiliary processes may therefore appear to claim more memory than warranted and complicate naive analysis, even though that memory is returned to the system under memory pressure.
At this point, we don't believe that externalizing these processes causes a substantial increase in memory usage. We do recognize there is room for improvement (e.g. tweaking RPC buffers) to reduce memory usage overall.
I may follow up with more detailed reports of our findings, as well as follow-up GitHub issues from this research.
[1] https://blog.golang.org/ismmkeynote
[2] https://medium.com/samsara-engineering/running-go-on-low-memory-devices-536e1ca2fe8f
[3] https://povilasv.me/go-memory-management/
[4] https://golang.org/doc/go1.13#runtime
Nomad 0.9 introduced new auxiliary processes (e.g. logmon, docker_logger) per task; Nomad 0.8 only had an executor for raw_exec/exec/java driver tasks. On Linux, each of these processes consumes around 30MB, though Nomad 0.9.6 reduced the RSS metric to 10-25MB via https://github.com/hashicorp/nomad/pull/6341 .
Is there cause for concern here? How does memory usage scale with the number of processes? Would 100 running raw_exec tasks cause 3-5GB of overhead?