hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

docker job terminated by Exit Code: 0 without valid logs, and can't check the log stream from the executor either #17954

Closed TimonPeng closed 1 year ago

TimonPeng commented 1 year ago

Nomad version

Nomad v1.5.6 BuildDate 2023-05-19T18:26:13Z Revision 8af70885c02ab921dedbdf6bc406a1e886866f80

Operating system and Environment details

System

Tested on macOS 12.5 and Ubuntu 20.04.6 LTS; it failed on both.

Docker version

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:52:22 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:22 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Issue

The docker job is terminated with Exit Code: 0, without any valid logs in alloc/logs, and there are no executor logs either.

Reproduction steps

nomad agent -dev -data-dir=/Users/timon/devops -log-level=trace
nomad job run job/pyth-agent.hcl

Expected Result

The Docker container should start and keep running, as it does when run directly with Docker. Alternatively, is there somewhere I can look at the entire Docker execution flow to identify the problem?

docker run -it public.ecr.aws/pyth-network/agent:v1.4.0
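As a way to inspect part of the execution flow outside of Nomad, the container Nomad creates can be examined directly with the Docker CLI. This is a sketch; `<container_id>` is a placeholder, and the `name` filter assumes the `<task>-<alloc_id>` container naming visible in the client logs below:

```shell
# List all containers, including exited ones, to find the one Nomad created
docker ps -a --filter "name=pyth-agent"

# Show the exit code and the entrypoint/command Docker actually ran
docker inspect --format '{{.State.ExitCode}} {{.Path}} {{json .Config.Cmd}}' <container_id>

# Fetch whatever the container wrote to stdout/stderr before exiting
docker logs <container_id>
```

If `docker logs` is also empty, the container likely exited before writing anything, which would point at the image or its entrypoint rather than at Nomad's log plumbing.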

Actual Result

terminated by Exit Code: 0

[screenshot: job events]

Job file (if appropriate)

job "pyth-job" {
  task "pyth-agent" {
    driver = "docker"
    config {
      image = "public.ecr.aws/pyth-network/agent:v1.4.0"
    }
  }
}
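For reference, Nomad job specifications normally require tasks to be nested inside a `group` block, so a fuller form of the spec above would look like the following (a sketch; the `datacenters` value and group name are assumptions):

```hcl
job "pyth-job" {
  datacenters = ["dc1"]

  group "pyth-agent" {
    task "pyth-agent" {
      driver = "docker"
      config {
        image = "public.ecr.aws/pyth-network/agent:v1.4.0"
      }
    }
  }
}
```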

Nomad Server logs (if appropriate)

Nomad server and client are on the same host

2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job mutate results: mutator=canonicalize warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job mutate results: mutator=connect warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job mutate results: mutator=expose-check warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job mutate results: mutator=constraints warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job validate results: validator=connect warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job validate results: validator=expose-check warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job validate results: validator=vault warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job validate results: validator=namespace-constraint-check warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job validate results: validator=validate warnings=[] error=<nil>
2023-07-16T15:01:41.152+0800 [TRACE] nomad.job: job validate results: validator=memory_oversubscription warnings=[] error=<nil>
2023-07-16T15:01:41.153+0800 [DEBUG] worker: dequeued evaluation: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d eval_id=7dee4aa1-c72d-b0f5-965f-cadf7b0d630b type=service namespace=default job_id=pyth-job node_id="" triggered_by=job-register
2023-07-16T15:01:41.153+0800 [TRACE] worker: changed workload status: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d from=WaitingToDequeue to=WaitingForRaft
2023-07-16T15:01:41.153+0800 [TRACE] worker: changed workload status: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d from=WaitingForRaft to=Scheduling
2023-07-16T15:01:41.153+0800 [TRACE] worker.service_sched.binpack: NewBinPackIterator created: eval_id=7dee4aa1-c72d-b0f5-965f-cadf7b0d630b job_id=pyth-job namespace=default worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d algorithm=binpack
2023-07-16T15:01:41.153+0800 [DEBUG] http: request complete: method=PUT path=/v1/jobs duration=2.15103ms
2023-07-16T15:01:41.153+0800 [DEBUG] worker.service_sched: reconciled current state with desired state: eval_id=7dee4aa1-c72d-b0f5-965f-cadf7b0d630b job_id=pyth-job namespace=default worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d
  results=
  | Total changes: (place 1) (destructive 0) (inplace 0) (stop 0) (disconnect 0) (reconnect 0)
  | Created Deployment: "d306a6f4-a181-8a9e-710c-f6efa14f6ba8"
  | Desired Changes for "pyth-agent": (place 1) (inplace 0) (destructive 0) (stop 0) (migrate 0) (ignore 0) (canary 0)

2023-07-16T15:01:41.153+0800 [TRACE] nomad: evaluating plan: plan="(eval 7dee4aa1, job pyth-job, deploy d306a6f4, NodeAllocations: (node[9af6d161] (2b15a7da pyth-job.pyth-agent[0] run)))"
2023-07-16T15:01:41.154+0800 [DEBUG] http: request complete: method=GET path=/v1/evaluation/7dee4aa1-c72d-b0f5-965f-cadf7b0d630b duration="229.759µs"
2023-07-16T15:01:41.154+0800 [DEBUG] worker: submitted plan for evaluation: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d eval_id=7dee4aa1-c72d-b0f5-965f-cadf7b0d630b
2023-07-16T15:01:41.154+0800 [DEBUG] worker.service_sched: setting eval status: eval_id=7dee4aa1-c72d-b0f5-965f-cadf7b0d630b job_id=pyth-job namespace=default worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d status=complete
2023-07-16T15:01:41.155+0800 [TRACE] client: next heartbeat: period=16.088875953s
2023-07-16T15:01:41.155+0800 [DEBUG] worker: updated evaluation: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d eval="<Eval \"7dee4aa1-c72d-b0f5-965f-cadf7b0d630b\" JobID: \"pyth-job\" Namespace: \"default\">"
2023-07-16T15:01:41.155+0800 [DEBUG] client: updated allocations: index=12 total=1 pulled=1 filtered=0
2023-07-16T15:01:41.155+0800 [DEBUG] worker: ack evaluation: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d eval_id=7dee4aa1-c72d-b0f5-965f-cadf7b0d630b type=service namespace=default job_id=pyth-job node_id="" triggered_by=job-register
2023-07-16T15:01:41.155+0800 [DEBUG] client: allocation updates: added=1 removed=0 updated=0 ignored=0
2023-07-16T15:01:41.155+0800 [TRACE] worker: changed workload status: worker_id=2a472550-ab14-1a47-9a3a-51a3c6858d9d from=Scheduling to=WaitingToDequeue
2023-07-16T15:01:41.155+0800 [DEBUG] http: request complete: method=GET path=/v1/evaluation/7dee4aa1-c72d-b0f5-965f-cadf7b0d630b/allocations duration="293.381µs"
2023-07-16T15:01:41.155+0800 [TRACE] client.alloc_runner.task_coordinator: state transition: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f from=init to=init
2023-07-16T15:01:41.155+0800 [TRACE] client: next heartbeat: period=17.495473085s
2023-07-16T15:01:41.156+0800 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent type=Received msg="Task received by client" failed=false
2023-07-16T15:01:41.156+0800 [DEBUG] client: allocation updates applied: added=1 removed=0 updated=0 ignored=0 errors=0
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f start="2023-07-16 15:01:41.156078 +0800 +08 m=+34.711624094"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=alloc_dir start="2023-07-16 15:01:41.156128 +0800 +08 m=+34.711674219"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=alloc_dir end="2023-07-16 15:01:41.156598 +0800 +08 m=+34.712144598" duration="470.379µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=cgroup start="2023-07-16 15:01:41.156619 +0800 +08 m=+34.712164793"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=cgroup end="2023-07-16 15:01:41.156627 +0800 +08 m=+34.712172711" duration="7.918µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=await_previous_allocations start="2023-07-16 15:01:41.156634 +0800 +08 m=+34.712179955"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=await_previous_allocations end="2023-07-16 15:01:41.156641 +0800 +08 m=+34.712187653" duration="7.698µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=migrate_disk start="2023-07-16 15:01:41.156711 +0800 +08 m=+34.712256784"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=migrate_disk end="2023-07-16 15:01:41.156718 +0800 +08 m=+34.712264225" duration="7.441µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=alloc_health_watcher start="2023-07-16 15:01:41.156725 +0800 +08 m=+34.712270801"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner.runner_hook.alloc_health_watcher: watching: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f deadline="2023-07-16 15:06:41.156732 +0800 +08 m=+334.712278336" checks=true min_healthy_time=10s
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=alloc_health_watcher end="2023-07-16 15:01:41.15679 +0800 +08 m=+34.712335744" duration="64.943µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=network start="2023-07-16 15:01:41.1568 +0800 +08 m=+34.712345941"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=network end="2023-07-16 15:01:41.156806 +0800 +08 m=+34.712351922" duration="5.981µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=group_services start="2023-07-16 15:01:41.156815 +0800 +08 m=+34.712360975"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=group_services end="2023-07-16 15:01:41.15682 +0800 +08 m=+34.712366559" duration="5.584µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=consul_grpc_socket start="2023-07-16 15:01:41.156857 +0800 +08 m=+34.712403251"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=consul_grpc_socket end="2023-07-16 15:01:41.156869 +0800 +08 m=+34.712415569" duration="12.318µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=consul_http_socket start="2023-07-16 15:01:41.156878 +0800 +08 m=+34.712423821"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=consul_http_socket end="2023-07-16 15:01:41.156883 +0800 +08 m=+34.712429610" duration="5.789µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=csi_hook start="2023-07-16 15:01:41.15689 +0800 +08 m=+34.712436268"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=csi_hook end="2023-07-16 15:01:41.156972 +0800 +08 m=+34.712518716" duration="82.448µs"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: running pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=checks_hook start="2023-07-16 15:01:41.156988 +0800 +08 m=+34.712533963"
2023-07-16T15:01:41.156+0800 [TRACE] client.alloc_runner: finished pre-run hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f name=checks_hook end="2023-07-16 15:01:41.156995 +0800 +08 m=+34.712541244" duration="7.281µs"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner: finished pre-run hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f end="2023-07-16 15:01:41.157007 +0800 +08 m=+34.712552955" duration="928.861µs"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner: handling task state update: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f done=false
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_coordinator: state transition: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f from=init to=prestart
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_coordinator: state transition: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f from=prestart to=main
2023-07-16T15:01:41.157+0800 [DEBUG] client.alloc_runner.task_runner: lifecycle start condition has been met, proceeding: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: running prestart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent start="2023-07-16 15:01:41.157088 +0800 +08 m=+34.712634666"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner: sending updated alloc: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f client_status=pending desired_status=""
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=validate start="2023-07-16 15:01:41.157137 +0800 +08 m=+34.712683432"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=validate end="2023-07-16 15:01:41.157153 +0800 +08 m=+34.712698886" duration="15.454µs"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=task_dir start="2023-07-16 15:01:41.157174 +0800 +08 m=+34.712720104"
2023-07-16T15:01:41.157+0800 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent type="Task Setup" msg="Building Task Directory" failed=false
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner: handling task state update: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f done=false
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner: sending updated alloc: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f client_status=pending desired_status=""
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=task_dir end="2023-07-16 15:01:41.157566 +0800 +08 m=+34.713111812" duration="391.708µs"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=identity start="2023-07-16 15:01:41.157601 +0800 +08 m=+34.713146903"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=identity end="2023-07-16 15:01:41.157615 +0800 +08 m=+34.713161387" duration="14.484µs"
2023-07-16T15:01:41.157+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=logmon start="2023-07-16 15:01:41.15764 +0800 +08 m=+34.713186453"
2023-07-16T15:01:41.157+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: starting plugin: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent path=/usr/local/bin/nomad args=["/usr/local/bin/nomad", "logmon"]
2023-07-16T15:01:41.161+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: plugin started: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent path=/usr/local/bin/nomad pid=4563
2023-07-16T15:01:41.161+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: waiting for RPC address: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent path=/usr/local/bin/nomad
2023-07-16T15:01:41.190+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon: using plugin: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent version=2
2023-07-16T15:01:41.190+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: plugin address: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent address=/var/folders/jx/76r5ytsx0mlcpt9c57p8xpbr0000gn/T/plugin2534182848 network=unix @module=logmon timestamp="2023-07-16T15:01:41.189+0800"
2023-07-16T15:01:41.191+0800 [TRACE] client.alloc_runner.task_runner.task_hook.logmon.stdio: waiting for stdio data: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent
2023-07-16T15:01:41.192+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent @module=logmon path=/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/alloc/logs/.pyth-agent.stdout.fifo timestamp="2023-07-16T15:01:41.192+0800"
2023-07-16T15:01:41.192+0800 [DEBUG] client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent @module=logmon path=/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/alloc/logs/.pyth-agent.stderr.fifo timestamp="2023-07-16T15:01:41.192+0800"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=logmon end="2023-07-16 15:01:41.193095 +0800 +08 m=+34.748641289" duration=35.454836ms
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=dispatch_payload start="2023-07-16 15:01:41.193158 +0800 +08 m=+34.748703449"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=dispatch_payload end="2023-07-16 15:01:41.19317 +0800 +08 m=+34.748716407" duration="12.958µs"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=volumes start="2023-07-16 15:01:41.193199 +0800 +08 m=+34.748745056"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=volumes end="2023-07-16 15:01:41.193212 +0800 +08 m=+34.748757493" duration="12.437µs"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=artifacts start="2023-07-16 15:01:41.193246 +0800 +08 m=+34.748792064"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=artifacts end="2023-07-16 15:01:41.193254 +0800 +08 m=+34.748800027" duration="7.963µs"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=devices start="2023-07-16 15:01:41.193279 +0800 +08 m=+34.748824840"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=devices end="2023-07-16 15:01:41.193287 +0800 +08 m=+34.748832614" duration="7.774µs"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=api start="2023-07-16 15:01:41.193318 +0800 +08 m=+34.748863871"
2023-07-16T15:01:41.193+0800 [WARN]  client.alloc_runner.task_runner.task_hook.api: error creating task api socket: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent path=/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/pyth-agent/secrets/api.sock error="listen unix /Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/pyth-agent/secrets/api.sock: bind: invalid argument"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=api end="2023-07-16 15:01:41.1934 +0800 +08 m=+34.748945602" duration="81.731µs"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: running prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=script_checks start="2023-07-16 15:01:41.19343 +0800 +08 m=+34.748975887"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=script_checks end="2023-07-16 15:01:41.19344 +0800 +08 m=+34.748986393" duration="10.506µs"
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: finished prestart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent end="2023-07-16 15:01:41.193454 +0800 +08 m=+34.748999647" duration=36.364981ms
2023-07-16T15:01:41.193+0800 [TRACE] client.alloc_runner.task_runner: waiting for cgroup to exist for: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent allocID=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task="&{pyth-agent docker  map[image:public.ecr.aws/pyth-network/agent:v1.4.0] map[] [] <nil> [] [] [] 0xc0006bf800 0xc00025f4a0 <nil> <nil> map[] 5s 0xc000aab338 [] false 0s [] []   <nil> <nil>}"
2023-07-16T15:01:41.199+0800 [DEBUG] client.driver_mgr.docker: image reference count incremented: driver=docker image_name=public.ecr.aws/pyth-network/agent:v1.4.0 image_id=sha256:f340f8d6587ea694eaf20a7c4feb49506acd8ee80cc7e88146f78bf6564db4e1 references=1
2023-07-16T15:01:41.199+0800 [TRACE] client.driver_mgr.docker: binding volumes: driver=docker task_name=pyth-agent volumes=["/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/alloc:/alloc", "/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/pyth-agent/local:/local", "/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/pyth-agent/secrets:/secrets"]
2023-07-16T15:01:41.199+0800 [TRACE] client.driver_mgr.docker: no docker log driver provided, defaulting to plugin config: driver=docker task_name=pyth-agent
2023-07-16T15:01:41.199+0800 [DEBUG] client.driver_mgr.docker: configured resources: driver=docker task_name=pyth-agent memory=2147483648 memory_reservation=0 cpu_shares=1024 cpu_quota=0 cpu_period=0
2023-07-16T15:01:41.199+0800 [DEBUG] client.driver_mgr.docker: binding directories: driver=docker task_name=pyth-agent binds="[]string{\"/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/alloc:/alloc\", \"/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/pyth-agent/local:/local\", \"/Users/timon/devops/alloc/2b15a7da-8e7f-2844-0518-47b7bff6fa3f/pyth-agent/secrets:/secrets\"}"
2023-07-16T15:01:41.199+0800 [DEBUG] client.driver_mgr.docker: networking mode not specified; using default: driver=docker task_name=pyth-agent
2023-07-16T15:01:41.199+0800 [DEBUG] client.driver_mgr.docker: applied labels on the container: driver=docker task_name=pyth-agent labels=map[com.hashicorp.nomad.alloc_id:2b15a7da-8e7f-2844-0518-47b7bff6fa3f]
2023-07-16T15:01:41.199+0800 [DEBUG] client.driver_mgr.docker: setting container name: driver=docker task_name=pyth-agent container_name=pyth-agent-2b15a7da-8e7f-2844-0518-47b7bff6fa3f
2023-07-16T15:01:41.246+0800 [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=8269c350eb327ac77ee86c90b2b508b72e60c1c20a2ff55c976919650c02f608
2023-07-16T15:01:41.349+0800 [DEBUG] client: updated allocations: index=14 total=1 pulled=0 filtered=1
2023-07-16T15:01:41.349+0800 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=1
2023-07-16T15:01:41.349+0800 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=1 errors=0
2023-07-16T15:01:41.349+0800 [TRACE] client: next heartbeat: period=15.594552727s
2023-07-16T15:01:41.568+0800 [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=8269c350eb327ac77ee86c90b2b508b72e60c1c20a2ff55c976919650c02f608
2023-07-16T15:01:41.568+0800 [DEBUG] client.driver_mgr.docker.docker_logger: starting plugin: driver=docker path=/usr/local/bin/nomad args=["/usr/local/bin/nomad", "docker_logger"]
2023-07-16T15:01:41.570+0800 [DEBUG] client.driver_mgr.docker.docker_logger: plugin started: driver=docker path=/usr/local/bin/nomad pid=4564
2023-07-16T15:01:41.571+0800 [DEBUG] client.driver_mgr.docker.docker_logger: waiting for RPC address: driver=docker path=/usr/local/bin/nomad
2023-07-16T15:01:41.597+0800 [DEBUG] client.driver_mgr.docker.docker_logger: using plugin: driver=docker version=2
2023-07-16T15:01:41.597+0800 [DEBUG] client.driver_mgr.docker.docker_logger.nomad: plugin address: driver=docker address=/var/folders/jx/76r5ytsx0mlcpt9c57p8xpbr0000gn/T/plugin2867661771 network=unix @module=docker_logger timestamp="2023-07-16T15:01:41.597+0800"
2023-07-16T15:01:41.599+0800 [TRACE] client.driver_mgr.docker.docker_logger.stdio: waiting for stdio data: driver=docker
2023-07-16T15:01:41.600+0800 [DEBUG] client.driver_mgr.docker.docker_logger.nomad: using client connection initialized from environment: driver=docker @module=docker_logger timestamp="2023-07-16T15:01:41.600+0800"
2023-07-16T15:01:41.600+0800 [TRACE] client.alloc_runner.task_runner: setting task state: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent state=running
2023-07-16T15:01:41.600+0800 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent type=Started msg="Task started by client" failed=false
2023-07-16T15:01:41.600+0800 [TRACE] client.alloc_runner.task_runner: running poststart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent start="2023-07-16 15:01:41.600913 +0800 +08 m=+35.156455794"
2023-07-16T15:01:41.600+0800 [TRACE] client.alloc_runner.task_runner: running poststart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=stats_hook start="2023-07-16 15:01:41.600948 +0800 +08 m=+35.156490718"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner: handling task state update: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f done=false
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_runner: finished poststart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=stats_hook end="2023-07-16 15:01:41.601038 +0800 +08 m=+35.156581100" duration="90.382µs"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_runner: running poststart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=task_services start="2023-07-16 15:01:41.601078 +0800 +08 m=+35.156620610"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_runner: finished poststart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=task_services end="2023-07-16 15:01:41.601238 +0800 +08 m=+35.156781309" duration="160.699µs"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_runner: running poststart hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=script_checks start="2023-07-16 15:01:41.601261 +0800 +08 m=+35.156804092"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_coordinator: state transition: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f from=main to=poststart
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_coordinator: state transition: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f from=poststart to=wait_alloc
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_runner: finished poststart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=script_checks end="2023-07-16 15:01:41.601318 +0800 +08 m=+35.156860616" duration="56.524µs"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner.task_runner: finished poststart hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent end="2023-07-16 15:01:41.601461 +0800 +08 m=+35.157003653" duration="547.859µs"
2023-07-16T15:01:41.601+0800 [TRACE] client.alloc_runner: sending updated alloc: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f client_status=running desired_status=""
2023-07-16T15:01:41.700+0800 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent type=Terminated msg="Exit Code: 0" failed=false
2023-07-16T15:01:41.700+0800 [TRACE] client.alloc_runner: handling task state update: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f done=false
2023-07-16T15:01:41.700+0800 [TRACE] client.alloc_runner: sending updated alloc: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f client_status=running desired_status=""
2023-07-16T15:01:41.700+0800 [DEBUG] client.driver_mgr.docker.docker_logger.stdio: received EOF, stopping recv loop: driver=docker err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-07-16T15:01:41.703+0800 [INFO]  client.driver_mgr.docker.docker_logger: plugin process exited: driver=docker path=/usr/local/bin/nomad pid=4564
2023-07-16T15:01:41.703+0800 [DEBUG] client.driver_mgr.docker.docker_logger: plugin exited: driver=docker
2023-07-16T15:01:41.728+0800 [DEBUG] client.driver_mgr.docker: image id reference count decremented: driver=docker image_id=sha256:f340f8d6587ea694eaf20a7c4feb49506acd8ee80cc7e88146f78bf6564db4e1 references=0
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: running exited hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent start="2023-07-16 15:01:41.728648 +0800 +08 m=+35.284190119"
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: running exited hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=stats_hook start="2023-07-16 15:01:41.728682 +0800 +08 m=+35.284223595"
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: finished exited hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=stats_hook end="2023-07-16 15:01:41.728699 +0800 +08 m=+35.284241375" duration="17.78µs"
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: running exited hook: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=task_services start="2023-07-16 15:01:41.728721 +0800 +08 m=+35.284263006"
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: finished exited hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent name=task_services end="2023-07-16 15:01:41.728734 +0800 +08 m=+35.284275786" duration="12.78µs"
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: finished exited hooks: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent end="2023-07-16 15:01:41.728746 +0800 +08 m=+35.284287786" duration="97.667µs"
2023-07-16T15:01:41.728+0800 [INFO]  client.alloc_runner.task_runner: restarting task: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent reason="Restart within policy" delay=17.184145193s
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner.task_runner: setting task state: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent state=pending
2023-07-16T15:01:41.728+0800 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f task=pyth-agent type=Restarting msg="Task restarting in 17.184145193s" failed=false
2023-07-16T15:01:41.728+0800 [TRACE] client.alloc_runner: handling task state update: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f done=false
2023-07-16T15:01:41.729+0800 [TRACE] client.alloc_runner: sending updated alloc: alloc_id=2b15a7da-8e7f-2844-0518-47b7bff6fa3f client_status=pending desired_status=""
2023-07-16T15:01:41.800+0800 [DEBUG] client: updated allocations: index=15 total=1 pulled=0 filtered=1
2023-07-16T15:01:41.801+0800 [DEBUG] client: allocation updates: added=0 removed=0 updated=0 ignored=1
2023-07-16T15:01:41.801+0800 [DEBUG] client: allocation updates applied: added=0 removed=0 updated=0 ignored=1 errors=0
2023-07-16T15:01:41.801+0800 [TRACE] client: next heartbeat: period=11.732585161s

Nomad Client logs (if appropriate)

jrasell commented 1 year ago

Hi @TimonPeng, the exit code is coming from the container. I would suggest checking whether the allocation emitted any logs via the alloc logs CLI command. Try both of these commands to see the stdout and stderr logs:

$ nomad alloc logs <alloc_id> <task>
$ nomad alloc logs -stderr <alloc_id> <task>
jrasell commented 1 year ago

Hi @TimonPeng, I am going to close this issue as we have not received a response. If you are still experiencing problems, please feel free to reopen this issue and supply the additional context previously requested.