hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

mac mini arm64 incorrectly reports ~98% memory usage, cannot run jobs #13096

Open dionjwa opened 2 years ago

dionjwa commented 2 years ago

Nomad version

Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)

Operating system and Environment details

macOS 12.4 Mac Mini 2020 16GB Ram

Docker:

Client:
 Cloud integration: v1.0.24
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:49:20 2022
 OS/Arch:           darwin/arm64
 Context:           default
 Experimental:      true

Server: Docker Desktop 4.8.1 (78998)
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:45:44 2022
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Issue

Almost no user processes are running and consuming memory: image

But notice the qemu-system-aarch64 process; according to Google, that's Docker for Mac.

Docker desktop:

8GB of memory reserved for containers: image

But Nomad shows almost complete memory consumption:

image

There are zero docker containers running.

Reproduction steps

Run nomad and consul as described in the docs here:

https://www.nomadproject.io/docs/faq#q-how-to-connect-to-my-host-network-when-using-docker-desktop-windows-and-macos=

The equivalent setup on a Mac with 64GB (and amd64 instead of arm64) shows the same "unavailable" memory, but the excess capacity still allows running Docker jobs via Nomad.

Expected Result

Nomad sees the memory reserved for Docker as available, so many Docker job containers can be assigned.

Actual Result

The memory reserved for Docker is seen as consumed. The smallest jobs can be run, but full-stack multi-container applications cannot, as they exceed the (incorrect) memory availability.

Job file (if appropriate)

The most basic job shown in tutorials

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

lhaig commented 2 years ago

@dionjwa can you please check which version of Nomad you are running? The downloads page for 1.2.6 does not currently have arm binaries: https://releases.hashicorp.com/nomad/1.2.6 However, 1.2.7 does: https://releases.hashicorp.com/nomad/1.2.7

dionjwa commented 2 years ago

@lhaig I installed nomad via brew as per instructions here: Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)

I was running an old version, but installed via brew.

 nomad version
Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)

I upgraded; it's a single-node cluster, both client and server:

nomad --version
Nomad v1.3.1 (2b054e38e91af964d1235faa98c286ca3f527e56)

But the problem persists: with no allocations, no other containers running, and as many host programs closed as I can manage, memory is still limited:

image

dionjwa commented 2 years ago

Another data point, which may point to a UI or memory-measurement issue: the memory consumption/total is shown correctly in the "Topology" panel:

image

It just doesn't match what I see when I go to the individual client (Mac mini):

image

dionjwa commented 2 years ago

Also, the mini CAN run jobs, but I don't know whether that's because there's a bit of (incorrectly displayed) memory capacity left, or because it is using the figures from the first image above (the Topology panel). I don't know if I'll quickly exceed the wrong memory count.

lhaig commented 2 years ago

@dionjwa Thank you for this information. I was able to confirm this behaviour on my laptop with 32GB RAM, using both the brew version and the downloaded darwin arm64 binary.

tgross commented 2 years ago

Hi @dionjwa! I was looking thru various macOS issues and I wanted to circle back to this. I don't see anything unexpected here based on the information in the initial post. I think there's maybe a little misunderstanding about what Nomad is fingerprinting and how Docker Desktop interacts with that.

As you can see, Docker Desktop is running a QEMU virtual machine. You've configured it with 8GB of RAM and 2GB of swap (which QEMU is going to put onto tmpfs, so in-memory). That's 10GB out of your 16GB reserved for the VM. Nomad is not running inside the VM, and can't see inside it. So when it fingerprints the host's OS, it sees that 10GB of the RAM is taken up by some process (and another 208MB for mds_stores, 185MB for spotlight, etc).
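The arithmetic in the paragraph above can be written out as a rough sketch. The three figures come from this thread; treating the VM's swap as held in memory follows the claim above, and the function name is only illustrative.

```python
# Figures quoted in this thread; the helper name is hypothetical.
HOST_RAM_GB = 16   # Mac Mini 2020
VM_RAM_GB = 8      # Docker Desktop memory setting
VM_SWAP_GB = 2     # Docker Desktop swap setting (assumed to be held in memory)


def host_view(host_ram_gb, vm_ram_gb, vm_swap_gb):
    """Return (GB attributed to the VM, GB left for everything else on the host)."""
    vm_total = vm_ram_gb + vm_swap_gb
    return vm_total, host_ram_gb - vm_total


vm_total, remaining = host_view(HOST_RAM_GB, VM_RAM_GB, VM_SWAP_GB)
print(f"VM reservation: {vm_total} GB, left for the host: {remaining} GB")
```

The remaining 6 GB is then shared by macOS itself and every other host process, which is roughly why the host looks nearly full even with no containers running.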

The Docker Desktop VM is running dockerd. The dockerd daemon only runs on Linux, so it's running inside the VM. Meanwhile, Nomad is running outside the Docker Desktop VM.

So how does Nomad talk to Docker at all? The VM exposes dockerd's unix domain socket (via virtio) to the macOS host. So when Nomad fingerprints dockerd by talking to its socket, it has no way of knowing that dockerd is actually going to launch container processes inside a VM somewhere.
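To illustrate the point about the socket, here is a minimal sketch of asking dockerd for its own view of memory from the macOS host. It is not Nomad's fingerprinting code. It assumes the socket is exposed at the default `/var/run/docker.sock` path and relies on the Docker Engine API's `GET /info` response, which includes a `MemTotal` field in bytes; the class and function names are hypothetical.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a unix domain socket (illustrative helper, not a Nomad or Docker API)."""

    def __init__(self, socket_path, timeout=5):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with a unix-socket connect.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_mem_total_bytes(info):
    """Extract the memory dockerd reports, i.e. the VM's memory, not the host's."""
    return int(info["MemTotal"])


def fetch_docker_info(socket_path="/var/run/docker.sock"):
    """Call GET /info on dockerd; the socket path is an assumed default."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/info")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


if __name__ == "__main__":
    try:
        info = fetch_docker_info()
    except (OSError, http.client.HTTPException, ValueError) as exc:
        print(f"could not query dockerd: {exc}")
    else:
        gib = docker_mem_total_bytes(info) / 1024**3
        print(f"dockerd reports {gib:.1f} GiB total memory")
```

On a Docker Desktop setup like the one in this issue, the number printed should be close to the VM's configured memory rather than the host's 16GB, which is the mismatch described above: the client on the host and dockerd inside the VM are measuring two different machines.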

Unfortunately there's no great way around this problem today outside of running Nomad in a container in the VM. We don't really support this, because it's frankly a really hairy thing to do correctly and you'll need to bind-mount resources from the VM that may not actually exist, depending on how the VM is built (I don't think Docker, Inc. documents much about the VM's internals). This definitely could be an interesting documentation project, but there's not much we can do here from a technical standpoint.