hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/
Other
14.81k stars 1.94k forks source link

raw_exec not work on openwrt - function not implemented #20328

Closed Great-Stone closed 4 months ago

Great-Stone commented 5 months ago

Nomad version

Nomad v1.7.7-dev BuildDate 2024-04-02T19:24:51Z Revision cf25cf5cd5dbade28a38fc41398fc8b584f3b643

Operating system and Environment details

Alpine Linux for 5G router

Issue

The Nomad runs fine, but the raw_exec runs fail. Since there are no binaries for Alpine Linux, I built this and tested its execution. The 5G router equipment is running a lighter version of Linux, albeit based on Alpine Linux. The machine has very little storage (512MB), so building on it was not possible. I installed Alpine Linux arm64 with UTM on my MacBook (M2) and proceeded to build from there. Here is the procedure for the build (I referenced the build from the nomad-enterprise git repo).

Reproduction steps

The command we built looks like this

sed -i '/^#http/s/^#//' /etc/apk/repositories

apk update && apk add --no-cache bash build-base curl git linux-headers g++ make musl-dev nodejs npm openssh zip binutils binutils-gold

npm install -g yarn

mkdir -p /build/gopath
cd /build

wget https://go.dev/dl/go1.22.1.linux-arm64.tar.gz

tar zxvf go1.22.1.linux-arm64.tar.gz

export GOROOT=/build/go
export PATH=$PATH:/build/go/bin:/build/gopath/bin

export GOPATH=/build/gopath
mkdir -p $GOPATH/src/github.com/hashicorp && cd $_

git clone https://github.com/hashicorp/nomad.git

cd nomad

make bootstrap

make dev

Expected Result

On Alpine Linux on MacOS M2, where we ran the build, both Nomad execution and raw_exec Job execution are fine. However, on Linux on the 5G router device, running raw_exec causes an error. I would like to know if I am missing something essentially required for that Linux.

When I browse the logs for related errors, I see a lot of cgroups, so I update the related list first.

cgroups dir

/sys/fs/cgroup# ls
blkio.reset_stats                   cpuacct.usage                       cpuset.memory_spread_slab           memory.kmem.tcp.failcnt             memory.swappiness
cgroup.clone_children               cpuacct.usage_percpu                cpuset.mems                         memory.kmem.tcp.limit_in_bytes      memory.usage_in_bytes
cgroup.event_control                cpuset.cpu_exclusive                cpuset.sched_load_balance           memory.kmem.tcp.max_usage_in_bytes  memory.use_hierarchy
cgroup.procs                        cpuset.cpus                         cpuset.sched_relax_domain_level     memory.kmem.tcp.usage_in_bytes      net_cls.classid
cgroup.sane_behavior                cpuset.effective_cpus               devices.allow                       memory.kmem.usage_in_bytes          nomad.slice
cpu.cfs_period_us                   cpuset.effective_mems               devices.deny                        memory.limit_in_bytes               notify_on_release
cpu.cfs_quota_us                    cpuset.mem_exclusive                devices.list                        memory.max_usage_in_bytes           release_agent
cpu.rt_period_us                    cpuset.mem_hardwall                 memory.failcnt                      memory.move_charge_at_immigrate     tasks
cpu.rt_runtime_us                   cpuset.memory_migrate               memory.force_empty                  memory.oom_control
cpu.shares                          cpuset.memory_pressure              memory.kmem.failcnt                 memory.pressure_level
cpu.stat                            cpuset.memory_pressure_enabled      memory.kmem.limit_in_bytes          memory.soft_limit_in_bytes
cpuacct.stat                        cpuset.memory_spread_page           memory.kmem.max_usage_in_bytes      memory.stat

Actual Result

image

Job file (if appropriate)

job "periodic" {
  datacenters = ["dc1"]

  type        = "batch"
  periodic {
    crons            = [
      "*/5 * * * * * *"
    ]
    prohibit_overlap = true
  }
  group "batch" {
    count = 1
    task "batch" {
      driver = "raw_exec"
      template {
        data = <<EOF
#!/bin/ash
echo $(date) >> /tmp/periodic.txt
EOF
        destination = "run.sh"
      }
      config {
        command = "run.sh"
      }
      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

Running Log

./nomad agent -server -dev -bind=0.0.0.0

==> No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 192.168.1.1:4646; RPC: 192.168.1.1:4647; Serf: 192.168.1.1:4648
            Bind Addrs: HTTP: [0.0.0.0:4646]; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: true
             Log Level: DEBUG
               Node Id: 5636c803-920d-3124-7335-285fecad93b7
                Region: global (DC: dc1)
                Server: true
               Version: 1.7.7-dev

==> Nomad agent started! Log data will stream in below:

    2024-04-04T07:19:04.955Z [DEBUG] nomad: issuer not set; OIDC Discovery endpoint for workload identities disabled
    2024-04-04T07:19:04.965Z [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:fbec688e-3bc5-4dab-dee7-f7563c9c79f3 Address:192.168.1.1:4647}]"
    2024-04-04T07:19:04.965Z [INFO]  nomad.raft: entering follower state: follower="Node at 192.168.1.1:4647 [Follower]" leader-address= leader-id=
    2024-04-04T07:19:04.967Z [INFO]  nomad: serf: EventMemberJoin: 5Gax.global 192.168.1.1
    2024-04-04T07:19:04.967Z [INFO]  nomad: starting scheduling worker(s): num_workers=4 schedulers=["service", "batch", "system", "sysbatch", "_core"]
    2024-04-04T07:19:04.967Z [DEBUG] nomad: started scheduling worker: id=2dfea732-5257-8386-0ba2-e86dfcc840f2 index=1 of=4
    2024-04-04T07:19:04.967Z [DEBUG] nomad: started scheduling worker: id=f30581dd-4e09-8789-b6fb-625b3ef36cd1 index=2 of=4
    2024-04-04T07:19:04.967Z [DEBUG] worker: running: worker_id=2dfea732-5257-8386-0ba2-e86dfcc840f2
    2024-04-04T07:19:04.967Z [DEBUG] nomad: started scheduling worker: id=72351a53-e382-e832-021c-196a38ad5198 index=3 of=4
    2024-04-04T07:19:04.967Z [DEBUG] nomad: started scheduling worker: id=ae53a6b2-8803-a847-3b68-23c78113073a index=4 of=4
    2024-04-04T07:19:04.967Z [INFO]  nomad: started scheduling worker(s): num_workers=4 schedulers=["service", "batch", "system", "sysbatch", "_core"]
    2024-04-04T07:19:04.968Z [DEBUG] worker: running: worker_id=f30581dd-4e09-8789-b6fb-625b3ef36cd1
    2024-04-04T07:19:04.968Z [DEBUG] worker: running: worker_id=72351a53-e382-e832-021c-196a38ad5198
    2024-04-04T07:19:04.968Z [DEBUG] worker: running: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a
    2024-04-04T07:19:04.968Z [INFO]  nomad: adding server: server="5Gax.global (Addr: 192.168.1.1:4647) (DC: dc1)"
    2024-04-04T07:19:04.969Z [DEBUG] nomad.keyring.replicator: starting encryption key replication
    2024-04-04T07:19:04.973Z [DEBUG] agent.plugin_loader.docker: using client connection initialized from environment: plugin_dir=""
    2024-04-04T07:19:04.977Z [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
    2024-04-04T07:19:04.977Z [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
    2024-04-04T07:19:04.978Z [INFO]  agent: detected plugin: name=mock_driver type=driver plugin_version=0.1.0
    2024-04-04T07:19:04.978Z [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
    2024-04-04T07:19:04.978Z [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
    2024-04-04T07:19:04.978Z [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
    2024-04-04T07:19:04.978Z [INFO]  client: using state directory: state_dir=/tmp/NomadClient55285145
    2024-04-04T07:19:04.978Z [INFO]  client: using alloc directory: alloc_dir=/tmp/NomadClient954728857
    2024-04-04T07:19:04.978Z [INFO]  client: using dynamic ports: min=20000 max=32000 reserved=""
    2024-04-04T07:19:04.983Z [DEBUG] client.fingerprint_mgr: built-in fingerprints: fingerprinters=["arch", "bridge", "cgroup", "cni", "consul", "cpu", "host", "landlock", "memory", "network", "nomad", "plugins_cni", "signal", "storage", "vault", "env_aws", "env_gce", "env_azure", "env_digitalocean"]
    2024-04-04T07:19:04.987Z [WARN]  client.fingerprint_mgr: failed to detect bridge kernel module, bridge network mode disabled:
  error=
  | 4 errors occurred:
  | \t* failed to find /sys/module/bridge: stat /sys/module/bridge: no such file or directory
  | \t* module bridge not in /proc/modules
  | \t* failed to open /lib/modules/4.4.60/modules.builtin: open /lib/modules/4.4.60/modules.builtin: no such file or directory
  | \t* failed to open /lib/modules/4.4.60/modules.dep: open /lib/modules/4.4.60/modules.dep: no such file or directory
  |

    2024-04-04T07:19:04.988Z [DEBUG] client.fingerprint_mgr: CNI config dir is not set or does not exist, skipping: cni_config_dir=/opt/cni/config
    2024-04-04T07:19:04.992Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=consul initial_period=15s
    2024-04-04T07:19:04.994Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU frequency: mhz=2208
    2024-04-04T07:19:04.994Z [DEBUG] client.fingerprint_mgr.cpu: detected CPU core count: cores=4
    2024-04-04T07:19:05.125Z [WARN]  client.fingerprint_mgr.landlock: failed to fingerprint kernel landlock feature: error="function not implemented"
    2024-04-04T07:19:05.258Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2024-04-04T07:19:05.258Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/lo/speed device=lo
    2024-04-04T07:19:05.258Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected and no speed specified by user, falling back to default speed: interface=lo mbits=1000
    2024-04-04T07:19:05.259Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo IP=127.0.0.1
    2024-04-04T07:19:05.259Z [DEBUG] client.fingerprint_mgr.network: detected interface IP: interface=lo IP=::1
    2024-04-04T07:19:05.402Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
    2024-04-04T07:19:05.402Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/lo/speed device=lo
    2024-04-04T07:19:05.402Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=lo mbits=1000
    2024-04-04T07:19:05.533Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=miireg
    2024-04-04T07:19:05.533Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/miireg/speed device=miireg
    2024-04-04T07:19:05.533Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=miireg mbits=1000
    2024-04-04T07:19:05.803Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ip6tnl0
    2024-04-04T07:19:05.804Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ip6tnl0/speed device=ip6tnl0
    2024-04-04T07:19:05.804Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ip6tnl0 mbits=1000
    2024-04-04T07:19:05.805Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=sit0
    2024-04-04T07:19:05.805Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/sit0/speed device=sit0
    2024-04-04T07:19:05.805Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=sit0 mbits=1000
    2024-04-04T07:19:05.807Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ifb0
    2024-04-04T07:19:05.807Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ifb0/speed device=ifb0
    2024-04-04T07:19:05.807Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ifb0 mbits=1000
    2024-04-04T07:19:05.809Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ifb1
    2024-04-04T07:19:05.809Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ifb1/speed device=ifb1
    2024-04-04T07:19:05.809Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ifb1 mbits=1000
    2024-04-04T07:19:05.811Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=gre0
    2024-04-04T07:19:05.811Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/gre0/speed device=gre0
    2024-04-04T07:19:05.811Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=gre0 mbits=1000
    2024-04-04T07:19:05.813Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=gretap0
    2024-04-04T07:19:05.813Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/gretap0/speed device=gretap0
    2024-04-04T07:19:05.813Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=gretap0 mbits=1000
    2024-04-04T07:19:05.815Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ip6gre0
    2024-04-04T07:19:05.815Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ip6gre0/speed device=ip6gre0
    2024-04-04T07:19:05.815Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ip6gre0 mbits=1000
    2024-04-04T07:19:05.817Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=bond0
    2024-04-04T07:19:05.817Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/bond0/speed device=bond0
    2024-04-04T07:19:05.817Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=bond0 mbits=1000
    2024-04-04T07:19:05.819Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ipsecdummy
    2024-04-04T07:19:05.819Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ipsecdummy/speed device=ipsecdummy
    2024-04-04T07:19:05.819Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ipsecdummy mbits=1000
    2024-04-04T07:19:05.821Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=teql0
    2024-04-04T07:19:05.822Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/teql0/speed device=teql0
    2024-04-04T07:19:05.822Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=teql0 mbits=1000
    2024-04-04T07:19:05.823Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wifi0
    2024-04-04T07:19:05.824Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/wifi0/speed device=wifi0
    2024-04-04T07:19:05.824Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=wifi0 mbits=1000
    2024-04-04T07:19:05.826Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=wifi1
    2024-04-04T07:19:05.826Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/wifi1/speed device=wifi1
    2024-04-04T07:19:05.826Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=wifi1 mbits=1000
    2024-04-04T07:19:05.828Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=soc0
    2024-04-04T07:19:05.828Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/soc0/speed device=soc0
    2024-04-04T07:19:05.828Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=soc0 mbits=1000
    2024-04-04T07:19:05.830Z [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=br-lan
    2024-04-04T07:19:05.830Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/br-lan/speed device=br-lan
    2024-04-04T07:19:05.830Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=br-lan mbits=1000
    2024-04-04T07:19:05.832Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ath0
    2024-04-04T07:19:05.832Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ath0/speed device=ath0
    2024-04-04T07:19:05.832Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ath0 mbits=1000
    2024-04-04T07:19:05.833Z [WARN]  client.fingerprint_mgr.network: error calling ethtool: error="exit status 75" path=/usr/sbin/ethtool device=ath1
    2024-04-04T07:19:05.834Z [DEBUG] client.fingerprint_mgr.network: unable to read link speed: path=/sys/class/net/ath1/speed device=ath1
    2024-04-04T07:19:05.834Z [DEBUG] client.fingerprint_mgr.network: link speed could not be detected, falling back to default speed: interface=ath1 mbits=1000
    2024-04-04T07:19:05.834Z [WARN]  client.fingerprint_mgr.cni_plugins: failed to read CNI plugins directory: cni_path=/opt/cni/bin error="open /opt/cni/bin: no such file or directory"
    2024-04-04T07:19:05.837Z [DEBUG] client.fingerprint_mgr: fingerprinting periodically: fingerprinter=vault initial_period=15s
    2024-04-04T07:19:05.838Z [DEBUG] client.fingerprint_mgr.env_azure: could not read value for attribute: attribute=compute/azEnvironment error="Get \"http://169.254.169.254/metadata/instance/compute/azEnvironment?api-version=2019-06-04&format=text\": dial tcp 169.254.169.254:80: connect: network is unreachable"
    2024-04-04T07:19:05.838Z [DEBUG] client.fingerprint_mgr.env_digitalocean: failed to request metadata: attribute=region error="Get \"http://169.254.169.254/metadata/v1/region\": dial tcp 169.254.169.254:80: connect: network is unreachable"
    2024-04-04T07:19:05.839Z [DEBUG] client.fingerprint_mgr.env_gce: could not read value for attribute: attribute=machine-type error="Get \"http://169.254.169.254/computeMetadata/v1/instance/machine-type\": dial tcp 169.254.169.254:80: connect: network is unreachable"
    2024-04-04T07:19:05.839Z [DEBUG] client.fingerprint_mgr.env_gce: error querying GCE Metadata URL, skipping
    2024-04-04T07:19:05.839Z [DEBUG] client.fingerprint_mgr: detected fingerprints: node_attrs=["arch", "cpu", "host", "network", "nomad", "signal", "storage"]
    2024-04-04T07:19:05.840Z [INFO]  client.proclib.cg2: initializing nomad cgroups: cores=0-3
    2024-04-04T07:19:05.840Z [INFO]  client.plugin: starting plugin manager: plugin-type=csi
    2024-04-04T07:19:05.840Z [INFO]  client.plugin: starting plugin manager: plugin-type=driver
    2024-04-04T07:19:05.840Z [INFO]  client.plugin: starting plugin manager: plugin-type=device
    2024-04-04T07:19:05.840Z [DEBUG] client.device_mgr: exiting since there are no device plugins
    2024-04-04T07:19:05.840Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=java health=unhealthy description="Failed to discover cgroup mount point"
    2024-04-04T07:19:05.840Z [DEBUG] client.driver_mgr.docker: using client connection initialized from environment: driver=docker
    2024-04-04T07:19:05.841Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=driver
    2024-04-04T07:19:05.841Z [DEBUG] client.plugin: waiting on plugin manager initial fingerprint: plugin-type=device
    2024-04-04T07:19:05.841Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=device
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=mock_driver health=healthy description=Healthy
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=qemu health=undetected description=""
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=exec health=unhealthy description="Failed to discover cgroup mount point"
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=raw_exec health=healthy description=Healthy
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr.docker: could not connect to docker daemon: driver=docker endpoint=unix:///var/run/docker.sock error="Get \"http://unix.sock/version\": dial unix /var/run/docker.sock: connect: no such file or directory"
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr: initial driver fingerprint: driver=docker health=undetected description="Failed to connect to docker daemon"
    2024-04-04T07:19:05.841Z [DEBUG] client.driver_mgr: detected drivers: drivers="map[healthy:[mock_driver raw_exec] undetected:[qemu docker] unhealthy:[java exec]]"
    2024-04-04T07:19:05.842Z [DEBUG] client.plugin: finished plugin manager initial fingerprint: plugin-type=driver
    2024-04-04T07:19:05.842Z [ERROR] client.driver_mgr.docker: failed to list pause containers for recovery: driver=docker error="Get \"http://unix.sock/containers/json?filters=%7B%22label%22%3A%5B%22com.hashicorp.nomad.alloc_id%22%5D%7D\": dial unix /var/run/docker.sock: connect: no such file or directory"
    2024-04-04T07:19:05.848Z [DEBUG] client.server_mgr: new server list: new_servers=[0.0.0.0:4647, 192.168.1.1:4647] old_servers=[]
    2024-04-04T07:19:05.848Z [INFO]  client: started client: node_id=b60b6b3f-9643-ca6c-f9c2-e375f1afaede
    2024-04-04T07:19:05.849Z [DEBUG] http: UI is enabled
    2024-04-04T07:19:05.849Z [DEBUG] http: UI is enabled
    2024-04-04T07:19:06.136Z [DEBUG] http: request failed: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b error="Unknown node 1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b" code=404
    2024-04-04T07:19:06.136Z [DEBUG] http: request complete: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b duration=1.099584ms
    2024-04-04T07:19:06.819Z [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
    2024-04-04T07:19:06.819Z [INFO]  nomad.raft: entering candidate state: node="Node at 192.168.1.1:4647 [Candidate]" term=2
    2024-04-04T07:19:06.819Z [DEBUG] nomad.raft: voting for self: term=2 id=fbec688e-3bc5-4dab-dee7-f7563c9c79f3
    2024-04-04T07:19:06.820Z [DEBUG] nomad.raft: calculated votes needed: needed=1 term=2
    2024-04-04T07:19:06.820Z [DEBUG] nomad.raft: vote granted: from=fbec688e-3bc5-4dab-dee7-f7563c9c79f3 term=2 tally=1
    2024-04-04T07:19:06.820Z [INFO]  nomad.raft: election won: term=2 tally=1
    2024-04-04T07:19:06.820Z [INFO]  nomad.raft: entering leader state: leader="Node at 192.168.1.1:4647 [Leader]"
    2024-04-04T07:19:06.820Z [INFO]  nomad: cluster leadership acquired
    2024-04-04T07:19:06.824Z [DEBUG] nomad.autopilot: autopilot is now running
    2024-04-04T07:19:06.825Z [DEBUG] nomad.autopilot: state update routine is now running
    2024-04-04T07:19:06.826Z [INFO]  nomad.core: established cluster id: cluster_id=3c21289c-fe5c-7fa0-354f-70e17c21ccd0 create_time=1712215146825712254
    2024-04-04T07:19:06.826Z [INFO]  nomad: eval broker status modified: paused=false
    2024-04-04T07:19:06.826Z [INFO]  nomad: blocked evals status modified: paused=false
    2024-04-04T07:19:06.921Z [DEBUG] client.server_mgr: new server list: new_servers=[192.168.1.1:4647] old_servers=[192.168.1.1:4647, 0.0.0.0:4647]
    2024-04-04T07:19:06.921Z [INFO]  client: node registration complete
    2024-04-04T07:19:06.924Z [DEBUG] client: state updated: node_status=ready
    2024-04-04T07:19:06.924Z [DEBUG] client: updated allocations: index=1 total=0 pulled=0 filtered=0
    2024-04-04T07:19:06.924Z [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=0
    2024-04-04T07:19:06.924Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=0 errors=0
    2024-04-04T07:19:07.605Z [INFO]  nomad.keyring: initialized keyring: id=e33189fb-0538-ad41-02cb-fc55ca63a110
    2024-04-04T07:19:07.922Z [DEBUG] client: state changed, updating node and re-registering
    2024-04-04T07:19:07.925Z [INFO]  client: node registration complete
    2024-04-04T07:19:10.134Z [DEBUG] http: request failed: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b error="Unknown node 1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b" code=404
    2024-04-04T07:19:10.135Z [DEBUG] http: request complete: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b duration="703.854µs"
    2024-04-04T07:19:14.133Z [DEBUG] http: request failed: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b error="Unknown node 1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b" code=404
    2024-04-04T07:19:14.133Z [DEBUG] http: request complete: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b duration=1.224948ms
==> Failed to check for updates: Get "https://checkpoint-api.hashicorp.com/v1/check/nomad?arch=arm64&os=linux&signature=&version=1.7.7-dev": dial tcp: lookup checkpoint-api.hashicorp.com on 127.0.0.1:53: server misbehaving
    2024-04-04T07:19:18.135Z [DEBUG] http: request failed: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b error="Unknown node 1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b" code=404
    2024-04-04T07:19:18.135Z [DEBUG] http: request complete: method=GET path=/v1/client/stats?node_id=1bff44ec-d3c5-4aaa-5ac4-be40ccae4f6b duration=1.210625ms

Error Log

    2024-04-04T07:20:30.044Z [DEBUG] worker: created evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval="<Eval \"46610e40-eeb5-f067-04e2-7ff92b9a7413\" JobID: \"periodic/periodic-1712215229\" Namespace: \"default\">" waitUntil="\"2024-04-04 07:20:34.957339202 +0000 UTC\""
    2024-04-04T07:20:30.045Z [DEBUG] worker.batch_sched: found reschedulable allocs, followup eval created: eval_id=52d88b6a-3554-ae35-b8a9-92daa84f080b job_id=periodic/periodic-1712215229 namespace=default worker_id=ae53a6b2-8803-a847-3b68-23c78113073a followup_eval_id=46610e40-eeb5-f067-04e2-7ff92b9a7413
    2024-04-04T07:20:30.049Z [DEBUG] worker: submitted plan for evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval_id=52d88b6a-3554-ae35-b8a9-92daa84f080b
    2024-04-04T07:20:30.049Z [DEBUG] worker.batch_sched: setting eval status: eval_id=52d88b6a-3554-ae35-b8a9-92daa84f080b job_id=periodic/periodic-1712215229 namespace=default worker_id=ae53a6b2-8803-a847-3b68-23c78113073a status=complete
    2024-04-04T07:20:30.050Z [DEBUG] worker: updated evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval="<Eval \"52d88b6a-3554-ae35-b8a9-92daa84f080b\" JobID: \"periodic/periodic-1712215229\" Namespace: \"default\">"
    2024-04-04T07:20:30.051Z [DEBUG] worker: ack evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval_id=52d88b6a-3554-ae35-b8a9-92daa84f080b type=batch namespace=default job_id=periodic/periodic-1712215229 node_id="" triggered_by=alloc-failure
    2024-04-04T07:20:30.053Z [DEBUG] client: updated allocations: index=19 total=1 pulled=1 filtered=0
    2024-04-04T07:20:30.054Z [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=0
    2024-04-04T07:20:30.054Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=0 errors=0
    2024-04-04T07:20:30.101Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/summary?index=12 duration="751.563µs"
    2024-04-04T07:20:30.103Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/allocations?index=14 duration=2.884687ms
    2024-04-04T07:20:30.103Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/evaluations?index=13 duration=2.843906ms
    2024-04-04T07:20:31.000Z [DEBUG] nomad.periodic: skipping launch of periodic job because job prohibits overlap: job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:31.000Z [DEBUG] nomad.periodic: scheduled periodic job launch: launch_delay=999.185955ms job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:32.000Z [DEBUG] nomad.periodic: skipping launch of periodic job because job prohibits overlap: job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:32.001Z [DEBUG] nomad.periodic: scheduled periodic job launch: launch_delay=998.994966ms job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:33.001Z [DEBUG] nomad.periodic: skipping launch of periodic job because job prohibits overlap: job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:33.001Z [DEBUG] nomad.periodic: scheduled periodic job launch: launch_delay=998.496372ms job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:34.001Z [DEBUG] nomad.periodic: skipping launch of periodic job because job prohibits overlap: job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:34.001Z [DEBUG] nomad.periodic: scheduled periodic job launch: launch_delay=998.617675ms job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:34.958Z [DEBUG] worker: dequeued evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval_id=46610e40-eeb5-f067-04e2-7ff92b9a7413 type=batch namespace=default job_id=periodic/periodic-1712215229 node_id="" triggered_by=alloc-failure
    2024-04-04T07:20:34.958Z [DEBUG] worker.batch_sched: reconciled current state with desired state: eval_id=46610e40-eeb5-f067-04e2-7ff92b9a7413 job_id=periodic/periodic-1712215229 namespace=default worker_id=ae53a6b2-8803-a847-3b68-23c78113073a
  results=
  | Total changes: (place 1) (destructive 0) (inplace 0) (stop 1) (disconnect 0) (reconnect 0)
  | Desired Changes for "batch": (place 1) (inplace 0) (destructive 0) (stop 1) (migrate 0) (ignore 0) (canary 0)

    2024-04-04T07:20:34.988Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/summary?index=17 duration=2.886800103s
    2024-04-04T07:20:34.989Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/evaluations?index=20 duration=2.887341561s
    2024-04-04T07:20:34.989Z [DEBUG] worker: submitted plan for evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval_id=46610e40-eeb5-f067-04e2-7ff92b9a7413
    2024-04-04T07:20:34.989Z [DEBUG] worker.batch_sched: setting eval status: eval_id=46610e40-eeb5-f067-04e2-7ff92b9a7413 job_id=periodic/periodic-1712215229 namespace=default worker_id=ae53a6b2-8803-a847-3b68-23c78113073a status=complete
    2024-04-04T07:20:34.991Z [DEBUG] worker: updated evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval="<Eval \"46610e40-eeb5-f067-04e2-7ff92b9a7413\" JobID: \"periodic/periodic-1712215229\" Namespace: \"default\">"
    2024-04-04T07:20:34.988Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/allocations?index=19 duration=2.88671052s
    2024-04-04T07:20:34.991Z [DEBUG] worker: ack evaluation: worker_id=ae53a6b2-8803-a847-3b68-23c78113073a eval_id=46610e40-eeb5-f067-04e2-7ff92b9a7413 type=batch namespace=default job_id=periodic/periodic-1712215229 node_id="" triggered_by=alloc-failure
    2024-04-04T07:20:34.992Z [DEBUG] client: updated allocations: index=21 total=2 pulled=2 filtered=0
    2024-04-04T07:20:34.992Z [DEBUG] client: allocation updates: added=1 removed=0 updated=1 ignored=0
    2024-04-04T07:20:34.994Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch type=Received msg="Task received by client" failed=false
    2024-04-04T07:20:34.994Z [DEBUG] client: allocation updates applied: added=1 removed=0 updated=1 ignored=0 errors=0
    2024-04-04T07:20:34.995Z [DEBUG] client.alloc_runner.widmgr: retrieving and renewing workload identities: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 num_identities=0
    2024-04-04T07:20:34.995Z [DEBUG] client.alloc_migrator: waiting for previous alloc to terminate: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 previous_alloc=ca01625a-b831-646a-308f-fde4490bae9c
    2024-04-04T07:20:34.996Z [DEBUG] client.alloc_migrator: waiting for previous alloc to terminate: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 previous_alloc=ca01625a-b831-646a-308f-fde4490bae9c
    2024-04-04T07:20:34.996Z [DEBUG] client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch
    2024-04-04T07:20:34.997Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch type="Task Setup" msg="Building Task Directory" failed=false
    2024-04-04T07:20:34.998Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch path=/tmp/nomad args=["/tmp/nomad", "logmon"]
    2024-04-04T07:20:34.999Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch path=/tmp/nomad pid=25146
    2024-04-04T07:20:35.000Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch plugin=/tmp/nomad
    2024-04-04T07:20:35.000Z [DEBUG] nomad.periodic: skipping launch of periodic job because job prohibits overlap: job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:35.000Z [DEBUG] nomad.periodic: scheduled periodic job launch: launch_delay=999.044394ms job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:35.002Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/evaluations?index=21 duration="855.833µs"
    2024-04-04T07:20:35.115Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic/allocations?index=21 duration=113.494531ms
    2024-04-04T07:20:35.119Z [DEBUG] client: updated allocations: index=23 total=2 pulled=1 filtered=1
    2024-04-04T07:20:35.120Z [DEBUG] client: allocation updates: added=0 removed=0 updated=1 ignored=1
    2024-04-04T07:20:35.120Z [DEBUG] client: allocation updates applied: added=0 removed=0 updated=1 ignored=1 errors=0
    2024-04-04T07:20:35.279Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch address=/tmp/plugin1058375108 network=unix @module=logmon timestamp=2024-04-04T07:20:35.279Z
    2024-04-04T07:20:35.279Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch version=2
    2024-04-04T07:20:35.287Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch @module=logmon path=/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/alloc/logs/.batch.stdout.fifo timestamp=2024-04-04T07:20:35.287Z
    2024-04-04T07:20:35.288Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch path=/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/alloc/logs/.batch.stderr.fifo @module=logmon timestamp=2024-04-04T07:20:35.287Z
    2024-04-04T07:20:35.292Z [INFO]  agent: (runner) creating new runner (dry: false, once: false)
    2024-04-04T07:20:35.293Z [DEBUG] agent: (runner) final config: {"Consul":{"Address":"127.0.0.1:8500","Namespace":"","Auth":{"Enabled":false,"Username":""},"Retry":{"Attempts":0,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"Token":"","TokenFile":"","Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000}},"Dedup":{"Enabled":false,"MaxStale":2000000000,"Prefix":"consul-template/dedup/","TTL":15000000000,"BlockQueryWaitTime":60000000000},"DefaultDelims":{"Left":null,"Right":null},"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":0},"KillSignal":2,"LogLevel":"WARN","FileLog":{"LogFilePath":"","LogRotateBytes":0,"LogRotateDuration":86400000000000,"LogRotateMaxFiles":0},"MaxStale":315360000000000000,"PidFile":"","ReloadSignal":1,"Syslog":{"Enabled":false,"Facility":"LOCAL0","Name":"consul-template"},"Templates":[{"Backup":false,"Command":[],"CommandTimeout":30000000000,"Contents":"#!/bin/ash\necho $(date) \u003e\u003e /tmp/periodic.txt\n","CreateDestDirs":true,"Destination":"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh","ErrMissingKey":false,"ErrFatal":true,"Exec":{"Command":[],"Enabled":false,"Env":{"Denylist":[],"Custom":[],"Pristine":false,"Allowlist":[]},"KillSignal":2,"KillTimeout":30000000000,"ReloadSignal":null,"Splay":0,"Timeout":30000000000},"Perms":420,"User":null,"Uid":null,"Group":null,"Gid":null,"Source":"","Wait":{"Enabled":false,"Min":0,"Max":0},"LeftDelim":"{{","RightDelim":"}}","FunctionDenylist":["plugin","writeToFile"],"SandboxPath":"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch","MapToEnvironmentVariable":""}],"TemplateErrFatal":null,"Vault":{"Address":"","Enabled":false,"Namespace":"","RenewToken":false,"Retry":{"Attempts":12,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true},"SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":true,"Key":"","ServerName":"","Verify":true},"Transport":{"CustomDialer":null,"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000},"UnwrapToken":false,"ClientUserAgent":null,"DefaultLeaseDuration":300000000000,"LeaseRenewalThreshold":0.9,"K8SAuthRoleName":"","K8SServiceAccountTokenPath":"/run/secrets/kubernetes.io/serviceaccount/token","K8SServiceAccountToken":"","K8SServiceMountPath":"kubernetes"},"Nomad":{"Address":"","Enabled":true,"Namespace":"default","SSL":{"CaCert":"","CaCertBytes":"","CaPath":"","Cert":"","Enabled":false,"Key":"","ServerName":"","Verify":true},"AuthUsername":"","AuthPassword":"","Transport":{"CustomDialer":{},"DialKeepAlive":30000000000,"DialTimeout":30000000000,"DisableKeepAlives":false,"IdleConnTimeout":5000000000,"MaxIdleConns":0,"MaxIdleConnsPerHost":100,"MaxConnsPerHost":0,"TLSHandshakeTimeout":10000000000},"Retry":{"Attempts":0,"Backoff":250000000,"MaxBackoff":60000000000,"Enabled":true}},"Wait":{"Enabled":true,"Min":5000000000,"Max":240000000000},"Once":false,"ParseOnly":false,"BlockQueryWaitTime":300000000000,"ErrOnFailedLookup":false}
    2024-04-04T07:20:35.293Z [INFO]  agent: (runner) creating watcher
    2024-04-04T07:20:35.294Z [INFO]  agent: (runner) starting
    2024-04-04T07:20:35.294Z [DEBUG] agent: (runner) running initial templates
    2024-04-04T07:20:35.294Z [DEBUG] agent: (runner) initiating run
    2024-04-04T07:20:35.294Z [DEBUG] agent: (runner) checking template 658eec0c0dc8fc19e3072fc89a256aeb
    2024-04-04T07:20:35.296Z [DEBUG] agent: (runner) rendering "(dynamic)" => "/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh"
    2024-04-04T07:20:35.477Z [INFO]  agent: (runner) rendered "(dynamic)" => "/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh"
    2024-04-04T07:20:35.477Z [DEBUG] agent: (runner) diffing and updating dependencies
    2024-04-04T07:20:35.477Z [DEBUG] agent: (runner) watching 0 dependencies
    2024-04-04T07:20:35.477Z [DEBUG] agent: (runner) all templates rendered
    2024-04-04T07:20:35.477Z [DEBUG] agent: (runner) enabling global quiescence for "658eec0c0dc8fc19e3072fc89a256aeb"
    2024-04-04T07:20:35.479Z [INFO]  client.driver_mgr.raw_exec: starting task: driver=raw_exec driver_cfg="{Command:run.sh Args:[]}"
    2024-04-04T07:20:35.480Z [DEBUG] client.driver_mgr.raw_exec.executor: starting plugin: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch path=/tmp/nomad args=["/tmp/nomad", "executor", "{\"LogFile\":\"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/executor.out\",\"LogLevel\":\"debug\",\"FSIsolation\":false,\"Compute\":{\"tc\":8832,\"nc\":4}}"]
    2024-04-04T07:20:35.481Z [DEBUG] client.driver_mgr.raw_exec.executor: plugin started: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch path=/tmp/nomad pid=25164
    2024-04-04T07:20:35.481Z [DEBUG] client.driver_mgr.raw_exec.executor: waiting for RPC address: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch plugin=/tmp/nomad
    2024-04-04T07:20:35.761Z [DEBUG] client.driver_mgr.raw_exec.executor: using plugin: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch version=2
    2024-04-04T07:20:35.788Z [DEBUG] nomad.periodic: deregistered periodic job: job="<ns: \"default\", id: \"periodic\">"
    2024-04-04T07:20:35.790Z [DEBUG] http: request complete: method=DELETE path=/v1/job/periodic duration=2.824323ms
    2024-04-04T07:20:35.791Z [DEBUG] http: request complete: method=GET path=/v1/job/periodic?index=12 duration=7.685957445s
    2024-04-04T07:20:35.901Z [DEBUG] client.driver_mgr.raw_exec.executor.stdio: received EOF, stopping recv loop: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch err="rpc error: code = Unavailable desc = error reading from server: EOF"
    2024-04-04T07:20:35.909Z [INFO]  client.driver_mgr.raw_exec.executor: plugin process exited: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch plugin=/tmp/nomad id=25164
    2024-04-04T07:20:35.909Z [DEBUG] client.driver_mgr.raw_exec.executor: plugin exited: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 driver=raw_exec task_name=batch
    2024-04-04T07:20:35.909Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch type="Driver Failure" msg="failed to launch command with executor: rpc error: code = Unknown desc = failed to start command path=\"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh\" --- args=[\"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh\"]: fork/exec /tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh: function not implemented" failed=false
    2024-04-04T07:20:35.909Z [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch error="failed to launch command with executor: rpc error: code = Unknown desc = failed to start command path=\"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh\" --- args=[\"/tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh\"]: fork/exec /tmp/NomadClient954728857/f59c52af-85e2-44e1-c256-3866e91c2c29/batch/run.sh: function not implemented"
    2024-04-04T07:20:35.909Z [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch reason="Error was unrecoverable"
    2024-04-04T07:20:35.909Z [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch type="Not Restarting" msg="Error was unrecoverable" failed=true
    2024-04-04T07:20:35.913Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.stdio: received EOF, stopping recv loop: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch err="rpc error: code = Unavailable desc = error reading from server: EOF"
    2024-04-04T07:20:35.920Z [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch plugin=/tmp/nomad id=25146
    2024-04-04T07:20:35.920Z [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin exited: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch
    2024-04-04T07:20:35.921Z [INFO]  agent: (runner) stopping
    2024-04-04T07:20:35.921Z [DEBUG] agent: (runner) stopping watcher
    2024-04-04T07:20:35.921Z [DEBUG] agent: (watcher) stopping all views
    2024-04-04T07:20:35.921Z [DEBUG] client.alloc_runner.task_runner: task run loop exiting: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29 task=batch
    2024-04-04T07:20:35.921Z [INFO]  agent: (runner) received finish
    2024-04-04T07:20:35.921Z [INFO]  client.gc: marking allocation for GC: alloc_id=f59c52af-85e2-44e1-c256-3866e91c2c29
tgross commented 5 months ago

Hi @Great-Stone! I'm pretty sure the error "function not implemented" is coming from the kernel. You could verify this by running under strace (or similar) and checking to see if you get that error from one of the syscalls. When you built on Alpine under UTM, you may have a different kernel build than is available on the target. Given that it's happening from fork/exec, I'd hypothesize that it's a missing cgroup-related kernel configuration option.

If you're building the kernel for the 5G router, you may want to diff the configuration between the router and the UTM VM to see if you can spot the missing component. If you're not building the kernel for the 5G router and/or can't update it for whatever reason, you unfortunately may be out of luck here without creating your own task driver that avoids anything cgroup related (or whatever the missing piece is).

Great-Stone commented 5 months ago

Hi @tgross, Thank you for letting us know about the problem. Could you please confirm the required condition related to cgroups? (e.g. specific package elements or settings) I would like to inform this to the person responsible for building the OS.

tgross commented 5 months ago

@Great-Stone as I noted above, a missing cgroups configuration value is only a hypothesis. You need to run Nomad under strace (or similar) to determine what the failing syscall is in order to further diagnose the problem.

Great-Stone commented 5 months ago

Leave updates.

The specific specs of the machine were OpenWrt 15.05.1, so I tried again, but it didn't work. Presumably this is a Linux environment thing, as it doesn't work on older versions of RHEL.

mxrroot@5Gax:/# cat /etc/openwrt_version
15.05.1

mxrroot@5Gax:/# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='Chaos Calmer'
DISTRIB_REVISION='ca0ec2bfb0ef+r49254'
DISTRIB_CODENAME='chaos_calmer'
DISTRIB_TARGET='ipq/ipq807x_64'
DISTRIB_DESCRIPTION='OpenWrt Chaos Calmer 15.05.1'
DISTRIB_TAINTS='no-all busybox override'

Since OpenWrt 18.06.0, it works fine.

tgross commented 5 months ago

Ok, so this isn't Alpine Linux, it's OpenWrt. They both use musl libc but that doesn't make them the same distribution. OpenWrt uses buildroot. So whomever is configuring the kconfig for buildroot should be able to verify whether there's a missing kernel configuration value (possibly but not necessarily for cgroups), and the operator should be able to verify specifically what's happening with strace. This isn't a well-supported configuration, and not present in our official release builds, so without the user doing more diagnostics as I've described here on their end, there's very little we can do to help.