Open 0x2b3bfa0 opened 1 year ago
Hmm, I suspect this is the source of my failed termination issues on cml-playground. I'd say I have observed the opposite, though: the runner detects the first job and enters its busy state, but fails to detect the job finishing and never enters an idle state.
Is there no better way of detecting the runner state, other than parsing logs?
On Fri, Dec 30, 2022, 18:40 Daniel Barnes @.***> wrote:
hmm I suspect this to be the source of my failed termination issues, on cml-playground, I say that I have observed the opposite though... the runner detects the first job and enters its busy state but fails to detect the job finishing and never enters an idle state.
We also call the GitHub API, but that has its own issues, and it's only used as final validation.
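To make the "final validation" idea concrete, here is a minimal sketch of cross-checking a log-derived state against an API response. The thread does not say which endpoint CML calls; this assumes the REST endpoint that lists self-hosted runners, whose payload carries a boolean `busy` field per runner. The function name `confirmRunnerState` and the return shape are hypothetical, and the function only inspects an already-fetched payload.

```javascript
// Hypothetical helper: compares the state inferred from logs with the state
// reported by an already-fetched "list self-hosted runners" payload.
// Assumed payload shape: { runners: [{ name, status, busy }] }.
function confirmRunnerState(payload, runnerName, stateFromLogs) {
  const runners = (payload && payload.runners) || [];
  const runner = runners.find((r) => r.name === runnerName);

  // The API may not list the runner at all (for example, not yet registered).
  if (!runner) {
    return { confirmed: false, reason: 'runner not found' };
  }

  const apiState = runner.busy ? 'busy' : 'idle';
  if (apiState !== stateFromLogs) {
    return { confirmed: false, reason: `logs say ${stateFromLogs}, API says ${apiState}` };
  }
  return { confirmed: true, state: apiState };
}
```

A mismatch here would only flag a disagreement; deciding which source to trust is a separate question, since both have known issues.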
the runner detects the first job and enters its busy state but fails to detect the job finishing and never enters an idle state
Sounds, indeed, like a buffering issue. 🤔 Node.js's treatment of file descriptors is (cough) very particular, and anything is possible.
Is there no better way of detecting the runner state, other than parsing logs?
At least, none known to me. 😰
#1299 is a duct-tape workaround for what seems to be a buffering issue in the GitHub self-hosted runner output, which prevents our logic from detecting in a timely manner when the runner transitions to a busy state; see https://github.com/iterative/cml/issues/1255#issuecomment-1367099955 for context.
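For illustration, here is a minimal sketch of log-based state detection that tolerates output arriving in arbitrary chunks. It is not CML's implementation: the tracker name and both regular expressions are assumptions made for this example. It shows one way to reassemble complete lines before matching, so a message split across two chunks is still detected. It cannot help if the runner process itself holds output back in its own buffer.

```javascript
// Hypothetical patterns, assumed for illustration only.
const JOB_STARTED = /Running job: (.+)$/;
const JOB_FINISHED = /Job (.+) completed with result: (\w+)$/;

// Hypothetical tracker: feed it raw output chunks, read back 'busy' or 'idle'.
function createRunnerStateTracker() {
  let pending = ''; // incomplete trailing line carried over between chunks
  let state = 'idle';

  return {
    push(chunk) {
      pending += chunk;
      const lines = pending.split(/\r?\n/);
      // The last element is either '' or a line that has not ended yet.
      pending = lines.pop();
      for (const line of lines) {
        if (JOB_STARTED.test(line)) state = 'busy';
        else if (JOB_FINISHED.test(line)) state = 'idle';
      }
      return state;
    },
    get state() {
      return state;
    },
  };
}
```

In a real process wrapper, `push` would be called from the child's `stdout` `data` handler with each chunk converted to a string.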