ocurrent / ocluster

Distribute CI builds to worker nodes over Cap'n Proto
Apache License 2.0

The output from "ocluster-admin show" is sometimes cryptic #212

Open kit-ty-kate opened 1 year ago

kit-ty-kate commented 1 year ago

The registered section shows:

registered:
  i7-worker-01 (0): [] (1 running)
  i7-worker-02 (10): [opam-repo-ci:macos-homebrew-ocaml-4.14-tezos-injector-016-PtMumbai.16.0~rc1-f280a0f2882102032292266c928080f782a1c0aa@4585m(10+urgent)] (1 running)
  i7-worker-03 (0): [] (1 running)
  i7-worker-04 (10): [opam-repo-ci:macos-homebrew-ocaml-4.14-tezos-layer2-utils-016-PtMumbai.16.0~rc1-f280a0f2882102032292266c928080f782a1c0aa@4565m(10+urgent)] (1 running)

I interpret this as:

A similarly perplexing output is:

queue: (backlog) [... <a huge backlog>]
registered:
  m1-worker-02 (0): [] (1 running)
  m1-worker-03 (0): [] (1 running)
  m1-worker-04 (0): [] (1 running)

Here, without SSHing into the machines themselves, you cannot tell what they are doing.

talex5 commented 1 year ago

The scheduler only reports details of queued jobs, not running ones. All four machines here are running a job, but two have another job queued up after that.

(And yes, it would be nice to get information about running jobs too somehow, but the scheduler doesn't care much about them once they start.)
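
To make that reading concrete, here is a minimal Python sketch that splits one line of the "registered" section into the parts described above: the bracketed list holds jobs queued on the worker, and the trailing count is the number of running jobs, for which no details are reported. The line pattern is inferred only from the sample output in this issue, not from the ocluster source. The names `WorkerStatus`, `parse_worker_line` and `summarize` are hypothetical, the meaning of the number in parentheses after the worker name is not stated in this thread (it is kept as an opaque `load` field), and splitting multiple queued entries on whitespace is an assumption.

```python
import re
from dataclasses import dataclass

# Pattern for one "registered:" line, inferred from the sample output in this
# issue (an assumption, not taken from the ocluster source).
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+) \((?P<load>\d+)\): "
    r"\[(?P<queued>.*)\] \((?P<running>\d+) running\)\s*$"
)


@dataclass
class WorkerStatus:
    name: str
    load: int      # number shown after the worker name; meaning assumed opaque here
    queued: list   # jobs waiting on this worker (details are reported)
    running: int   # jobs currently running (only a count is reported)


def parse_worker_line(line):
    """Parse one worker line; return None if it does not match the pattern."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    body = m.group("queued").strip()
    # Assumption: multiple queued entries are separated by whitespace.
    queued = body.split() if body else []
    return WorkerStatus(
        name=m.group("name"),
        load=int(m.group("load")),
        queued=queued,
        running=int(m.group("running")),
    )


def summarize(status):
    """Spell out what the compact line means."""
    return (
        f"{status.name}: {status.running} running (details not reported), "
        f"{len(status.queued)} queued"
    )


if __name__ == "__main__":
    for line in [
        "  i7-worker-01 (0): [] (1 running)",
        "  i7-worker-02 (10): [opam-repo-ci:example-job@4585m(10+urgent)] (1 running)",
    ]:
        print(summarize(parse_worker_line(line)))
```

Read this way, an empty `[]` does not mean the worker is idle: it only means nothing is queued behind the job it is already running.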