hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Memory Usage of Allocation is always 0 bytes #9120

Closed — sbrl closed this issue 3 years ago

sbrl commented 4 years ago

Nomad version

Nomad v0.12.5 (514b0d667b57068badb43795103fb7dd3a9fbea7)

Operating system and Environment details

$ uname -a
Linux DEVICE_NAME 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 10 (buster)
Release:    10
Codename:   buster

4 x Raspberry Pi 4 w/ 4 GiB RAM (worker nodes), 1 x Raspberry Pi 4 w/ 2 GiB RAM (server node, does not run tasks)

Issue

All allocated tasks (at least with the Docker driver) report a memory usage of 0 bytes. It wasn't always this way: I checked one day and every allocation showed 0 bytes, and it hasn't recovered since. Example screenshot from the web interface:

[screenshot: allocation resource usage graphs showing 0 bytes of memory]

Reproduction steps

  1. Run any Nomad job
  2. Open the Nomad web interface
  3. Navigate to the allocation and view the resource usage graphs
  4. See error
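
If the zeros are coming from the client's stats API rather than from the web UI itself, the same readings should be visible from the CLI or the HTTP API. A quick way to double-check (a rough sketch; assumes the default agent address on the client, and ALLOC_ID is a placeholder for any affected allocation):

# Show resource usage for the allocation via the CLI
nomad alloc status -stats ALLOC_ID

# Or hit the client stats HTTP endpoint directly
curl -s "http://127.0.0.1:4646/v1/client/allocation/ALLOC_ID/stats" | jq '.ResourceUsage.MemoryStats'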

Job file (if appropriate)

Any job file will do, but here's one of mine:

Click to expand

job "etherpad" {
    datacenters = ["dc1"]
    priority = 35

    task "etherpad" {
        driver = "docker"

        config {
            image = "registry.internal.example.com:5000/etherpad"
            labels { group = "services" }

            volumes = [
                # /srv/etherpad/var         Main settings directory
                "/mnt/shared/services/etherpad/var:/srv/etherpad/var",
                # /srv/etherpad/APIKEY.txt  Persistent API key
                "/mnt/shared/services/etherpad/APIKEY.txt:/srv/etherpad/APIKEY.txt",
                # /srv/etherpad/SESSIONKEY.txt  Persistent session key
                "/mnt/shared/services/etherpad/SESSIONKEY.txt:/srv/etherpad/SESSIONKEY.txt"
            ]

            port_map {
                main = 9001
            }
        }

        resources {
            memory = 200 # MiB
            network {
                port "main" {}
            }
        }

        service {
            name = "${TASK}"
            tags = [
                "service", "internal",
                "urlprefix-etherpad.example.com/",
                "auth=admin"
            ]
            address_mode = "host"
            port = "main"

            check {
                type = "http"
                port = "main"
                interval = "60s"
                timeout = "5s"
                path = "/"
            }
        }
    }
}

Nomad Client logs (if appropriate)

Logs available upon request, but the logging feature (at least for allocations) isn't working either

Nomad Server logs (if appropriate)

futuralogic commented 3 years ago

I am having the same problem - no memory stats for allocations - but only on one of my four Nomad instances.

As a point of clarification: the allocation doesn't show any RAM usage, but the node itself does.

Node:

[screenshot: node view showing memory allocated]

Allocation:

[screenshot: allocation view showing no memory usage]

The common variable between our setups is running on ARM (Raspberry Pi). I'll add the relevant details, since my OS and architecture differ from the original report; hopefully the specifics of my builds are useful for further research.

I'm running Nomad/Consul across four Raspberry Pis. The memory display is only working on three of them.

WORKING:

2x RPI 3 running Hypriot OS 1.11.1 - armhf kernel.

Linux DEVICE 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l GNU/Linux
Docker: 18.06.3-ce
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

1x RPI 4 running Hypriot OS 1.12.3 - arm64 kernel (32-bit userland, i.e. docker)

Linux DEVICE 5.4.83-v8+ #1379 SMP PREEMPT Mon Dec 14 13:15:14 GMT 2020 aarch64 GNU/Linux
Docker: 20.10.2
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)

Hypriot is a modified Raspbian build optimized for docker workloads. I've made no modifications to the original Hypriot image other than adding nomad/consul and jobs.

NOT WORKING:

1x RPI 4 running Ubuntu 20.04 Focal ARM64. I manually installed Docker and docker-compose per docker.com and set up nomad/consul. The image was created with the latest Raspberry Pi imaging tool, flashing the card with the Ubuntu 20.04 LTS ARM64 lite edition (i.e. I did not build the OS myself).

Linux DEVICE 5.4.0-1026-raspi #29-Ubuntu SMP PREEMPT Mon Dec 14 17:01:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
Docker: 20.10.2
Nomad v1.0.2 (4c1d4fc6a5823ebc8c3e748daec7b4fda3f11037)
sbrl commented 3 years ago

Hrm, it's very interesting that Ubuntu 20.04 doesn't appear to work but Hypriot OS does. I wonder if Hypriot does something differently by default? It does say it's tuned for Docker out of the box.

futuralogic commented 3 years ago

I thought it was weird as well. Based on InspectContainerWithOptions() in container.go - I presume this is how Nomad inspects a container - it seems to call the Docker Engine API to do the inspection. (Sorry, I'm not familiar enough with Go or the codebase to state this authoritatively.)

If that is the case, docker inspect doesn't show any memory used.

docker stats also displays no memory in use:

CONTAINER ID   NAME                                                         CPU %     MEM USAGE / LIMIT   MEM %     NET I/O          BLOCK I/O    PIDS
8adc893e5af2   mc-paper-server-arm64-e389b2dd-945b-6577-a461-df45cb42b7d0   42.06%    0B / 0B             0.00%     48.8MB / 352kB   0B / 303kB   48

It seems docker may be the culprit here?

Output of docker inspect:


ubuntu@host:/mnt/shared/nomad/jobs$ docker inspect 8adc
[
    {
        "Id": "8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101",
        "Created": "2021-01-18T02:10:30.703496654Z",
        "Path": "/runner/entrypoint",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 55003,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2021-01-18T02:10:32.16911133Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:1078eb7ec68613029733adaa5c89cb35868f3605fd787f466d8125c41a01c2c0",
        "ResolvConfPath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/hostname",
        "HostsPath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/hosts",
        "LogPath": "/var/lib/docker/containers/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101/8adc893e5af2782a6497af05ae239fa1c025aed77bda02d726d12328ec204101-json.log",
        "Name": "/mc-paper-server-arm64-e389b2dd-945b-6577-a461-df45cb42b7d0",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/alloc:/alloc",
                "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/local:/local",
                "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/secrets:/secrets",
                "/mnt/blah:/data"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {
                    "max-file": "2",
                    "max-size": "2m"
                }
            },
            "NetworkMode": "default",
            "PortBindings": {
                "19132/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19132/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19133/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "19133/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "25565/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ],
                "25565/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ]
            },
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "CgroupnsMode": "host",
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "private",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 2000,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DeviceCgroupRules": null,
            "DeviceRequests": null,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": -1,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": null,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76-init/diff:/var/lib/docker/overlay2/6243e0bd492f6042f43edf4ad483bd7eb5045b9c15317fb471c2cf7df325cbd2/diff:/var/lib/docker/overlay2/f35c410ed29e3a07e4c449362895fadbee1cd84afdec9e0db2ee56f35e2493e1/diff:/var/lib/docker/overlay2/8e863c2778cf15b5c08f8aa1b5a3aa60a2dd5dbf182626ac2150994f79f94109/diff:/var/lib/docker/overlay2/7f97a7d374d260bbaedc954c56c5e449643650464297913063ef29f55bf5aaa6/diff:/var/lib/docker/overlay2/d3f17bbd14f3a136a07f518bc6b085c1e62193f579c2a0f143471642fdedfd4d/diff:/var/lib/docker/overlay2/fd40510034d439f01f86e400cd45947f81941a8fa192a1c807264ebfda34559c/diff",
                "MergedDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76/merged",
                "UpperDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76/diff",
                "WorkDir": "/var/lib/docker/overlay2/456353e1bfa4f5855fefc27986ee339220459d490ba9d3d2c904b8c0fcbebe76/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/alloc",
                "Destination": "/alloc",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/local",
                "Destination": "/local",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/mnt/storage/nomad/data/alloc/e389b2dd-945b-6577-a461-df45cb42b7d0/mc-paper-server-arm64/secrets",
                "Destination": "/secrets",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/mnt/blahblahblah",
                "Destination": "/data",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "8adc893e5af2",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "19132/tcp": {},
                "19132/udp": {},
                "19133/tcp": {},
                "19133/udp": {},
                "25565/tcp": {},
                "25565/udp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "JVM_OPTS=-Xms2048M -Xmx5120M",
                "NOMAD_ADDR_mc=192.168.1.191:25806",
                "NOMAD_ADDR_mc_udp_1=192.168.1.191:31938",
                "NOMAD_ADDR_mc_udp_2=192.168.1.191:25484",
                "NOMAD_ALLOC_DIR=/alloc",
                "NOMAD_ALLOC_ID=e389b2dd-945b-6577-a461-df45cb42b7d0",
                "NOMAD_ALLOC_INDEX=0",
                "NOMAD_ALLOC_NAME=minecraft-arm64.mc-server[0]",
                "NOMAD_ALLOC_PORT_mc-udp-1=19132",
                "NOMAD_ALLOC_PORT_mc-udp-2=19133",
                "NOMAD_ALLOC_PORT_mc=25565",
                "NOMAD_CPU_LIMIT=2000",
                "NOMAD_DC=futura",
                "NOMAD_GROUP_NAME=mc-server",
                "NOMAD_HOST_ADDR_mc-udp-1=192.168.1.191:31938",
                "NOMAD_HOST_ADDR_mc-udp-2=192.168.1.191:25484",
                "NOMAD_HOST_ADDR_mc=192.168.1.191:25806",
                "NOMAD_HOST_IP_mc-udp-1=192.168.1.191",
                "NOMAD_HOST_IP_mc-udp-2=192.168.1.191",
                "NOMAD_HOST_IP_mc=192.168.1.191",
                "NOMAD_HOST_PORT_mc=25806",
                "NOMAD_HOST_PORT_mc_udp_1=31938",
                "NOMAD_HOST_PORT_mc_udp_2=25484",
                "NOMAD_IP_mc=192.168.1.191",
                "NOMAD_IP_mc_udp_1=192.168.1.191",
                "NOMAD_IP_mc_udp_2=192.168.1.191",
                "NOMAD_JOB_ID=minecraft-arm64",
                "NOMAD_JOB_NAME=minecraft-arm64",
                "NOMAD_MEMORY_LIMIT=5700",
                "NOMAD_NAMESPACE=default",
                "NOMAD_PORT_mc=25565",
                "NOMAD_PORT_mc_udp_1=19132",
                "NOMAD_PORT_mc_udp_2=19133",
                "NOMAD_REGION=global",
                "NOMAD_SECRETS_DIR=/secrets",
                "NOMAD_TASK_DIR=/local",
                "NOMAD_TASK_NAME=mc-paper-server-arm64",
                "TZ=America/Chicago",
                "PATH=/opt/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "PRODUCT=paper"
            ],
            "Cmd": null,
            "Image": "jcxldn/minecraft-runner:paper-alpine",
            "Volumes": null,
            "WorkingDir": "/data",
            "Entrypoint": [
                "/runner/entrypoint"
            ],
            "OnBuild": null,
            "Labels": {
                "com.hashicorp.nomad.alloc_id": "e389b2dd-945b-6577-a461-df45cb42b7d0"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "8b70f263abc724d678b09d7f4f68acb25cf5f405e48564476dd0a7409d1c2945",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "19132/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19132/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "31938"
                    }
                ],
                "19133/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "19133/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25484"
                    }
                ],
                "25565/tcp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ],
                "25565/udp": [
                    {
                        "HostIp": "192.168.1.191",
                        "HostPort": "25806"
                    }
                ]
            },
            "SandboxKey": "/var/run/docker/netns/8b70f263abc7",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "9e463eae52177976954612f37310ad25026069f1e52cc054e18170c79cf6732c",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "eca62952ca3127291bf24f174e70b776c9e5e58199b2c8ec16a55b7fa7ea86fa",
                    "EndpointID": "9e463eae52177976954612f37310ad25026069f1e52cc054e18170c79cf6732c",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "",
                    "DriverOpts": null
                }
            }
        }
    }
]
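
One way to confirm whether Docker itself is returning empty stats (rather than Nomad misreading them) is to ask the Engine API for a stats sample directly over the Unix socket. A rough check, assuming the default socket path and that curl/jq are available:

# Request a single (non-streaming) stats sample for the container
curl -s --unix-socket /var/run/docker.sock \
    "http://localhost/containers/8adc893e5af2/stats?stream=false" | jq '.memory_stats'

# With the memory cgroup controller unavailable, memory_stats comes back
# essentially empty, matching the 0B / 0B shown by docker stats.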
tgross commented 3 years ago

Based on InspectContainerWithOptions() in container.go - I presume this is how Nomad inspects a container - it seems to call the Docker Engine API to do the inspection. (Sorry, I'm not familiar enough with Go or the codebase to state this authoritatively.)

Yup, that's exactly how it's done for Docker containers. Do y'all see this same behavior with the exec or raw_exec driver? We use a different method there, based on the gopsutil library, and I know there's some operating-system-dependent code in that path. If that works, it might be worth seeing whether we could add some sort of fallback behavior to the docker driver, but it'll be tricky to get all the handles we'd need given that Docker owns the process. We can definitely open an issue with Docker upstream as well to see if they can fix the problem on their end.
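
For a quick comparison, a throwaway raw_exec job is enough to exercise the gopsutil stats path instead of the Docker one. A minimal sketch (assumes the raw_exec driver is enabled on the client; the job name and sleep command are arbitrary):

# Run a do-nothing task under raw_exec so stats are collected via gopsutil
nomad job run - <<'EOF'
job "stats-test" {
  datacenters = ["dc1"]
  group "stats-test" {
    task "sleep" {
      driver = "raw_exec"
      config {
        command = "/bin/sleep"
        args    = ["600"]
      }
    }
  }
}
EOF

# Then look up the allocation and check whether it reports non-zero memory
nomad job status stats-test
nomad alloc status -stats <alloc-id from the output above>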

sbrl commented 3 years ago

@tgross: I've tried with exec using this test Nomad jobspec, but I get this error and all allocations fail:

failed to launch command with executor: rpc error: code = Unknown desc = container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: process_linux.go:422: setting cgroup config for procHooks process caused: cannot set memory limit: container could not join or create cgroup

I would imagine that any method used for exec / raw_exec should work.

Opening an issue upstream with Docker sounds like a good idea too, but I'm unsure whether I'd be able to find the right place to report it. I'm happy to test things though.

tgross commented 3 years ago

Oh right. If I recall correctly from a previous issue, the Raspberry Pi distros don't have the cgroups you need enabled by default. See the exec driver docs section on Resource Isolation and add a cgroup_enable flag for the missing isolation to your boot command line, and that should work fine.

sbrl commented 3 years ago

@tgross thanks for the reply. I see the problem now, but the link you've provided unfortunately doesn't contain any useful information about how to resolve it. If I understand correctly, it tells me how to check that cgroups are enabled, but not how to change the settings.

futuralogic commented 3 years ago

@tgross Thanks for the tip - that was the problem. Memory status is now updating. Thanks!

@sbrl I wasn't sure how to enable cgroups either, but the Resource Isolation documentation page linked above does mention that "Some Linux distributions do not boot with all required cgroups enabled by default." I Googled cgroups on Ubuntu 20.04 (my distro in question), and it turns out you modify the cmdline.txt file used at boot.

It should be somewhere under /boot. I think on a typical Raspberry Pi Debian distro it's directly in the /boot dir, but YMMV.

For Ubuntu 20.04 it's at /boot/firmware/cmdline.txt.

Here's the output of the command from the Nomad docs that lists the enabled cgroups:

awk '{print $1 " " $4}' /proc/cgroups
#subsys_name enabled
cpuset 1
cpu 1
cpuacct 1
blkio 1
memory 0
devices 1
freezer 1
net_cls 1
perf_event 1
net_prio 1
pids 1
rdma 1

As you can see "memory" was disabled.

I simply added cgroup_enable=memory to cmdline.txt and rebooted.

Full cmdline.txt for reference: net.ifnames=0 dwc_otg.lpm_enable=0 console=serial0,115200 console=tty1 root=LABEL=writable rootfstype=ext4 elevator=deadline rootwait fixrtc cgroup_enable=memory
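
For anyone scripting the change, appending to the single-line file can be done in place; a small sketch, assuming the Ubuntu path above:

# Append the flag to the end of the one and only line, then reboot to apply
sudo sed -i '1 s/$/ cgroup_enable=memory/' /boot/firmware/cmdline.txt
sudo reboot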

I'd suggest this issue can be closed. Thanks for everyone's help and input.

tgross commented 3 years ago

Glad to hear that's working and thanks for the assist there @futuralogic!

3nprob commented 3 years ago

I'm actually experiencing this now, since recently (I can't tell exactly when, but some weeks ago?). It was always working fine before. I'm not sure whether it's related to an OS upgrade or a Nomad upgrade. I see it across all clients, though, all on Debian 11.

This is with the docker driver.

Oddly enough, the cgroups seem to be in order:

$ awk '{print $1 " " $4}' /proc/cgroups
#subsys_name enabled
cpuset 1
cpu 1
cpuacct 1
blkio 1
memory 1
devices 1
freezer 1
net_cls 1
perf_event 1
net_prio 1
hugetlb 1
pids 1
rdma 1

EDIT: Hold off - it seems cgroup_enable=memory is missing from my cmdline. I'm not sure why it would suddenly have started requiring that, but either way, I'll see if adding it fixes things. The cgroup is already reported as enabled though, as can be seen above.

# docker inspect $CONTAINER | grep -i 'cgroup'
            "CgroupnsMode": "private",
            "Cgroup": "",
            "CgroupParent": "",
            "DeviceCgroupRules": null,
sbrl commented 3 years ago

It's very delayed (sorry about that!) but I've finally managed to find some time to try out the fix suggested above. On my Raspberry Pis, editing /boot/cmdline.txt and appending the following:

cgroup_enable=memory cgroup_memory=1

...fixed the issue. Note that it has to go on the first (and only) line - no line breaks (\n) are allowed.

I implemented the following bash function that applies the fix automatically:

check_cgroups_memory() {
    echo ">>> Checking memory cgroups";

    cgroups_enabled="$(awk '/memory/ { print $4 }' < /proc/cgroups)"; # column 4 of /proc/cgroups is the "enabled" flag

    if [[ "${cgroups_enabled}" -ne 0 ]]; then
        echo ">>> memory cgroups already enabled";
        return 0;
    fi

    filepath_cmdline="/boot/cmdline.txt";
    if [[ ! -e "${filepath_cmdline}" ]]; then
        filepath_cmdline="/boot/firmware/cmdline.txt";
    fi
    if [[ ! -e "${filepath_cmdline}" ]]; then
        echo ">>> Failed to find cmdline.txt; can't check for cgroups";
        return 1;
    fi

    if grep -q cgroup_enable=memory "${filepath_cmdline}"; then
        echo ">>> memory cgroups already present in cmdline.txt, a reboot is required to apply the update";
        return 0;
    fi

    echo ">>> memory cgroups not present in cmdline.txt, enabling....";
    (tr -d '\n' <"${filepath_cmdline}" && echo " cgroup_enable=memory cgroup_memory=1") | sudo tee "${filepath_cmdline}.new";

    sudo mv "${filepath_cmdline}" "${filepath_cmdline}.old-$(date +"%Y-%m-%d")";
    sudo mv "${filepath_cmdline}.new" "${filepath_cmdline}";

    echo ">>> New contents of cmdline.txt:";
    cat "${filepath_cmdline}";
    echo ">>> A reboot is required to apply the changes.";
}
AlekseyMelikov commented 2 years ago

I am having this issue now. I don't remember when it first appeared, but I do remember that the problem was already present with these versions:

Nomad v1.2.1 (719c53ac0ebee95d902faafe59a30422a091bc31)
Consul v1.10.4 Revision 7bbad6fe Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Docker version 20.10.11, build dea9396

I have now updated to

Nomad v1.2.6 (a6c6b475db5073e33885377b4a5c733e1161020c)
Consul v1.11.3 Revision e319d7ed Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Docker version 20.10.12, build e91ed57

Linux 5c24b868 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux

No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:    11
Codename:   bullseye

but the problem persists.

Description of the problem - Memory Usage of any Allocation is always 0 bytes:

Host Resource Utilization is showing correctly.

docker stats

CONTAINER ID   NAME            CPU %     MEM USAGE / LIMIT   MEM %     NET I/O   BLOCK I/O         PIDS
406f2[EDITED]   [EDITED]         0.03%     30.83MiB / 100MiB   30.83%    0B / 0B   4.1kB / 13.5MB    8
c667b[EDITED]   [EDITED]         0.00%     212KiB / 3.745GiB   0.01%     0B / 0B   0B / 0B           1
e46fe[EDITED]   [EDITED]         0.02%     40.43MiB / 200MiB   20.21%    0B / 0B   365kB / 369kB     8
47ae4[EDITED]   [EDITED]         0.00%     216KiB / 3.745GiB   0.01%     0B / 0B   0B / 0B           1
c8258[EDITED]   [EDITED]         0.03%     48.54MiB / 200MiB   24.27%    0B / 0B   0B / 8.19kB       11
22961[EDITED]   [EDITED]     0.01%     7.527MiB / 50MiB    15.05%    0B / 0B   750kB / 0B        2
21e1b[EDITED]   [EDITED]         0.28%     95.11MiB / 400MiB   23.78%    0B / 0B   58.5MB / 47MB     19
0f64b[EDITED]   [EDITED]         0.05%     58.89MiB / 100MiB   58.89%    0B / 0B   51.7MB / 2.09MB   18
caa34[EDITED]   [EDITED]         0.14%     42.91MiB / 100MiB   42.91%    0B / 0B   34.8MB / 0B       10
d13ea[EDITED]   [EDITED]     0.01%     10.52MiB / 50MiB    21.03%    0B / 0B   30.4MB / 0B       2
d3689[EDITED]   [EDITED]         1.87%     246.3MiB / 400MiB   61.58%    0B / 0B   33.7MB / 1.5MB    8
db532[EDITED]   [EDITED]         0.20%     129.3MiB / 600MiB   21.54%    0B / 0B   59MB / 57.1MB     31
60f28[EDITED]   [EDITED]         2.15%     12.92MiB / 100MiB   12.92%    0B / 0B   12.6MB / 6MB      5
e4914[EDITED]   [EDITED]         0.01%     16.39MiB / 50MiB    32.78%    0B / 0B   2.2MB / 69.6kB    7
1as2c[EDITED]   [EDITED]         0.38%     80.51MiB / 400MiB   20.13%    0B / 0B   54.7MB / 373kB    7
dd8bb[EDITED]   [EDITED]         0.12%     39.33MiB / 100MiB   39.33%    0B / 0B   29MB / 0B         12

cat /proc/cgroups

#subsys_name    hierarchy   num_cgroups enabled
cpuset  0   102 1
cpu 0   102 1
cpuacct 0   102 1
blkio   0   102 1
memory  0   102 1
devices 0   102 1
freezer 0   102 1
net_cls 0   102 1
perf_event  0   102 1
net_prio    0   102 1
hugetlb 0   102 1
pids    0   102 1
rdma    0   102 1
Nomad logs ``` Feb 13 08:26:58 host-name systemd[1]: Started Nomad. Feb 13 08:26:59 host-name nomad[697]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation. Feb 13 08:26:59 host-name nomad[697]: ==> Loaded configuration from /etc/nomad.d/client.hcl, /etc/nomad.d/nomad.hcl, /etc/nomad.d/server.hcl Feb 13 08:26:59 host-name nomad[697]: ==> Starting Nomad agent... Feb 13 08:27:00 host-name nomad[697]: ==> Nomad agent configuration: Feb 13 08:27:00 host-name nomad[697]: Advertise Addrs: HTTP: 172.16.0.2:4646; RPC: 172.16.0.2:4647; Serf: 172.16.0.2:4648 Feb 13 08:27:00 host-name nomad[697]: Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648 Feb 13 08:27:00 host-name nomad[697]: Client: true Feb 13 08:27:00 host-name nomad[697]: Log Level: INFO Feb 13 08:27:00 host-name nomad[697]: Region: global (DC: dc-name) Feb 13 08:27:00 host-name nomad[697]: Server: true Feb 13 08:27:00 host-name nomad[697]: Version: 1.2.6 Feb 13 08:27:00 host-name nomad[697]: ==> Nomad agent started! Log data will stream in below: Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.273Z [INFO] agent: detected plugin: name=qemu type=driver plugin_version=0.1.0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.273Z [INFO] agent: detected plugin: name=java type=driver plugin_version=0.1.0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.273Z [INFO] agent: detected plugin: name=docker type=driver plugin_version=0.1.0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.273Z [INFO] agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.273Z [INFO] agent: detected plugin: name=exec type=driver plugin_version=0.1.0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.331Z [INFO] nomad.raft: restored from snapshot: id=61-278615-1644597830311 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.443Z [INFO] nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:172.16.0.2:4647 Address:172.16.0.2:4647}]" Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.443Z [INFO] nomad.raft: entering follower state: follower="Node at 172.16.0.2:4647 [Follower]" leader= Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.445Z [INFO] nomad: serf: EventMemberJoin: host-name.global 172.16.0.2 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.445Z [INFO] nomad: starting scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"] Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.446Z [INFO] nomad: serf: Attempting re-join to previously known node: dc-name-host-name.global: 172.16.0.2:4648 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.446Z [INFO] nomad: started scheduling worker(s): num_workers=2 schedulers=["service", "batch", "system", "sysbatch", "_core"] Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.447Z [INFO] nomad: serf: Re-joined to previously known node: dc-name-host-name.global: 172.16.0.2:4648 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.447Z [INFO] client: using state directory: state_dir=/opt/nomad/data/client Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.448Z [INFO] nomad: adding server: server="host-name.global (Addr: 172.16.0.2:4647) (DC: dc-name)" Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.449Z [INFO] client: using alloc directory: alloc_dir=/opt/nomad/data/alloc Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.449Z [INFO] client: 
using dynamic ports: min=20000 max=32000 reserved="" Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.449Z [WARN] client: could not initialize cpuset cgroup subsystem, cpuset management disabled: error="not implemented for cgroup v2 unified hierarchy" Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.642Z [INFO] client.fingerprint_mgr.cgroup: cgroups are available Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.646Z [WARN] client.fingerprint_mgr.cpu: failed to detect set of reservable cores: error="not implemented for cgroup v2 unified hierarchy" Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.693Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.695Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.699Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.705Z [WARN] client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens10 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.720Z [INFO] client.plugin: starting plugin manager: plugin-type=csi Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.720Z [INFO] client.plugin: starting plugin manager: plugin-type=driver Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:26:59.721Z [INFO] client.plugin: starting plugin manager: plugin-type=device Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.116Z [ERROR] client.driver_mgr.exec: failed to reattach to executor: driver=exec error="error creating rpc client for executor plugin: Reattachment process not found" task_id=2cdc7213-925b-7b29-8aa1-28f4ad0e03d2/[EDITED] Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.136Z [INFO] client: started client: node_id=f03bd130-5e77-6809-81f2-5470f161b8d5 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.138Z [INFO] client.gc: marking allocation for GC: alloc_id=e946c362-18e7-e330-f335-9cbe04ccc5ad Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=32054f2c-da70-2e80-fe9b-6d0ef865fd80 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=78ee19e2-302e-c65e-7dd7-221f911fc9fc Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=ced42f55-4d57-d0de-df46-278440049f0a Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=26d044c4-7388-f49e-4c93-0367b0783bf2 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=5ccc1846-fa40-52d6-4f6f-02d9baa0523b Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=9d3b36fa-7012-9569-d939-b2b102490570 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=a52e1bf7-2c51-a4f4-6bba-43668b8ad84a Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: alloc_id=bb17689f-cc33-e4c0-ef21-abe80f0dd0ac Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.141Z [INFO] client.gc: marking allocation for GC: 
alloc_id=0ddd1372-12e8-4c80-260f-b64c712bdee6 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.142Z [INFO] client.gc: marking allocation for GC: alloc_id=2cdc7213-925b-7b29-8aa1-28f4ad0e03d2 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.142Z [INFO] client.gc: marking allocation for GC: alloc_id=482d787f-af8a-17fe-74d7-2953d798768e Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.142Z [INFO] client.gc: marking allocation for GC: alloc_id=27e811a4-9990-8c47-76df-f3806f11bbaa Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.142Z [INFO] client.gc: marking allocation for GC: alloc_id=cd0083e7-adc0-cb28-f4b4-ad11fde6a550 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.142Z [INFO] client.gc: marking allocation for GC: alloc_id=9a57aff0-3441-8082-ece8-2ebc4b1ef382 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.987Z [WARN] nomad.raft: heartbeat timeout reached, starting election: last-leader= Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.987Z [INFO] nomad.raft: entering candidate state: node="Node at 172.16.0.2:4647 [Candidate]" term=62 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.992Z [INFO] nomad.raft: election won: tally=1 Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.992Z [INFO] nomad.raft: entering leader state: leader="Node at 172.16.0.2:4647 [Leader]" Feb 13 08:27:00 host-name nomad[697]: 2022-02-13T08:27:00.993Z [INFO] nomad: cluster leadership acquired Feb 13 08:27:01 host-name nomad[697]: 2022-02-13T08:27:01.079Z [INFO] client: node registration complete Feb 13 08:27:09 host-name nomad[697]: 2022-02-13T08:27:09.960Z [INFO] client: node registration complete Feb 13 08:27:14 host-name nomad[697]: 2022-02-13T08:27:14.655Z [INFO] client.fingerprint_mgr.consul: consul agent is available Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.224Z [INFO] agent: (runner) creating new runner (dry: false, once: false) Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.225Z [INFO] agent: (runner) creating watcher Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.228Z [INFO] agent: (runner) starting Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.231Z [INFO] agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/ea8246f8-ad62-8218-7424-ef2d2f765293[EDITED] Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.232Z [INFO] agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/ea8246f8-ad62-8218-7424-ef2d2f765293[EDITED] Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.233Z [INFO] agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/ea8246f8-ad62-8218-7424-ef2d2f765293[EDITED] Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.299Z [INFO] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=92de7609-8869-9feb-5613-1d8f1d818e59 task=oathkeeper @module=logmon path=/opt/nomad/data/alloc/92de7609-8869-9feb-5613-1d8f1d818e59/alloc/logs/[EDITED] Feb 13 08:27:18 host-name nomad[697]: 2022-02-13T08:27:18.304Z [INFO] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=92de7609-8869-9feb-5613-1d8f1d818e59 task=oathkeeper @module=logmon path=/opt/nomad/data/alloc/92de7609-8869-9feb-5613-1d8f1d818e59/alloc/logs/[EDITED] Feb 13 08:30:54 host-name nomad[697]: 2022-02-13T08:30:54.615Z [ERROR] http: request failed: method=GET path=/v1/client/allocation/undefined/stats error="alloc lookup failed: index error: UUID must be 36 characters" code=500 ```
Consul logs ``` Feb 13 08:26:58 host-name systemd[1]: Started "HashiCorp Consul - A service mesh solution". Feb 13 08:26:59 host-name consul[689]: ==> Starting Consul agent... Feb 13 08:26:59 host-name consul[689]: Version: '1.11.3' Feb 13 08:26:59 host-name consul[689]: Node ID: '64ad536f-4aca-61cf-a324-f98f0ed1677e' Feb 13 08:26:59 host-name consul[689]: Node name: 'host-name' Feb 13 08:26:59 host-name consul[689]: Datacenter: 'dc-name' (Segment: '') Feb 13 08:26:59 host-name consul[689]: Server: true (Bootstrap: true) Feb 13 08:26:59 host-name consul[689]: Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600) Feb 13 08:26:59 host-name consul[689]: Cluster Addr: 172.16.0.2 (LAN: 8301, WAN: 8302) Feb 13 08:26:59 host-name consul[689]: Encrypt: Gossip: true, TLS-Outgoing: true, TLS-Incoming: true, Auto-Encrypt-TLS: false Feb 13 08:26:59 host-name consul[689]: ==> Log data will now stream in as it occurs: Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.366Z [WARN] agent: BootstrapExpect is set to 1; this is the same as Bootstrap mode. Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.366Z [WARN] agent: bootstrap = true: do not enable unless necessary Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.417Z [WARN] agent.auto_config: BootstrapExpect is set to 1; this is the same as Bootstrap mode. Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.417Z [WARN] agent.auto_config: bootstrap = true: do not enable unless necessary Feb 13 08:26:59 host-name consul[689]: 2022-02-13T08:26:59.442Z [INFO] agent.server.raft: restored from snapshot: id=48-1409193-1644602007773 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.011Z [INFO] agent.server.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:64ad536f-4aca-61cf-a324-f98f0ed1677e Address:172.16.0.2:8300}]" Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.011Z [INFO] agent.server.raft: entering follower state: follower="Node at 172.16.0.2:8300 [Follower]" leader= Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.013Z [INFO] agent.server.serf.wan: serf: EventMemberJoin: host-name.dc-name 172.16.0.2 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.014Z [INFO] agent.server.serf.wan: serf: Attempting re-join to previously known node: dc-name-host-name.dc-name: 172.16.0.2:8302 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.015Z [INFO] agent.server.serf.wan: serf: Re-joined to previously known node: dc-name-host-name.dc-name: 172.16.0.2:8302 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.018Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: host-name 172.16.0.2 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.018Z [INFO] agent.router: Initializing LAN area manager Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.018Z [INFO] agent.server.serf.lan: serf: Attempting re-join to previously known node: dc-name-host-name: 172.16.0.2:8301 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.019Z [INFO] agent.server.serf.lan: serf: Re-joined to previously known node: dc-name-host-name: 172.16.0.2:8301 Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.020Z [INFO] agent.server: Adding LAN server: server="host-name (Addr: tcp/172.16.0.2:8300) (DC: dc-name)" Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.020Z [INFO] agent.server: Handled event for server in area: event=member-join server=host-name.dc-name area=wan Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.021Z [WARN] agent: 
grpc: addrConn.createTransport failed to connect to {dc-name-172.16.0.2:8300 0 host-name }. Err :connection error: desc = "transport: Error while dialing dial tcp ->172.16.0.2:8300: operation was canceled". Reconnecting... Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.035Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=tcp Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.036Z [INFO] agent: Started DNS server: address=0.0.0.0:8600 network=udp Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.036Z [INFO] agent: Starting server: address=[::]:8500 network=tcp protocol=http Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.043Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them. Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.044Z [INFO] agent: Started gRPC server: address=[::]:8502 network=tcp Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.045Z [INFO] agent: started state syncer Feb 13 08:27:00 host-name consul[689]: 2022-02-13T08:27:00.045Z [INFO] agent: Consul agent running! Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.232Z [WARN] agent.server.raft: heartbeat timeout reached, starting election: last-leader= Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.232Z [INFO] agent.server.raft: entering candidate state: node="Node at 172.16.0.2:8300 [Candidate]" term=50 Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.238Z [INFO] agent.server.raft: election won: tally=1 Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.238Z [INFO] agent.server.raft: entering leader state: leader="Node at 172.16.0.2:8300 [Leader]" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.238Z [INFO] agent.server: cluster leadership acquired Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.239Z [INFO] agent.server: New leader elected: payload=host-name Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.722Z [INFO] agent: Synced node info Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.723Z [INFO] agent.server: initializing acls Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.723Z [INFO] agent.leader: started routine: routine="legacy ACL token upgrade" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.723Z [INFO] agent.leader: started routine: routine="acl token reaping" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.726Z [INFO] agent.leader: started routine: routine="federation state anti-entropy" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.726Z [INFO] agent.leader: started routine: routine="federation state pruning" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO] connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO] agent.leader: started routine: routine="intermediate cert renew watch" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO] agent.leader: started routine: routine="CA root pruning" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO] agent.leader: started routine: routine="CA root expiration metric" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO] agent.leader: started routine: routine="CA signing expiration metric" Feb 13 08:27:01 host-name consul[689]: 
2022-02-13T08:27:01.727Z [INFO] agent.leader: started routine: routine="virtual IP version check" Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.727Z [INFO] agent.server: deregistering member: member=c807ea31 partition=default reason=reaped Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.739Z [INFO] agent: Deregistered service: service=_nomad-task-32054f2c-da70-2e80-fe9b-6d0ef865fd80-group-prometheus-prometheus-prometheus Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.740Z [INFO] agent: Deregistered service: service=_nomad-task-5ccc1846-fa40-52d6-4f6f-02d9baa0523b-group-envoy-envoy- Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.741Z [INFO] agent: Synced check: check=_nomad-check-ed7b6ce5bc6c5af7ca61be80e1c836df45b44455 Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.743Z [INFO] agent: Synced check: check=_nomad-check-689ac211032c7cf7fd8003fb2c299ee11fd17a58 Feb 13 08:27:01 host-name consul[689]: 2022-02-13T08:27:01.745Z [INFO] agent: Synced check: check=_nomad-check-b305d921ca9bcc3f684e0294bcc862256703ff60 Feb 13 08:27:05 host-name consul[689]: 2022-02-13T08:27:05.017Z [INFO] agent: Synced check: check=_nomad-check-689ac211032c7cf7fd8003fb2c299ee11fd17a58 Feb 13 08:27:05 host-name consul[689]: 2022-02-13T08:27:05.397Z [INFO] agent: Synced check: check=_nomad-check-b305d921ca9bcc3f684e0294bcc862256703ff60 Feb 13 08:27:05 host-name consul[689]: 2022-02-13T08:27:05.399Z [INFO] agent: Synced check: check=_nomad-check-ed7b6ce5bc6c5af7ca61be80e1c836df45b44455 Feb 13 08:27:28 host-name consul[689]: 2022-02-13T08:27:28.952Z [INFO] agent.server.serf.lan: serf: EventMemberJoin: c807ea31 172.16.0.3 Feb 13 08:27:28 host-name consul[689]: 2022-02-13T08:27:28.952Z [INFO] agent.server: member joined, marking health alive: member=c807ea31 partition=default Feb 13 08:27:34 host-name consul[689]: 2022-02-13T08:27:34.281Z [ERROR] agent.dns: recurse failed: error="read udp 116.203.25.69:36315->1.1.1.1:53: i/o timeout" Feb 13 08:27:34 host-name consul[689]: 2022-02-13T08:27:34.291Z [ERROR] agent.dns: recurse failed: error="read udp 116.203.25.69:37048->1.1.1.1:53: i/o timeout" ```
sbrl commented 2 years ago

@AlekseyMelikov best to open a new issue. This issue is specifically to do with Raspberry Pis / ARM devices. You're using x86_64 there, so while the symptoms are the same the solution is likely to be very different.

github-actions[bot] commented 2 years ago

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.