iterative / cml

♾️ CML - Continuous Machine Learning | CI/CD for ML
http://cml.dev

--idle-timeout="never" causes an unhandled error with GitLab driver #1407

Closed BradyJ27 closed 1 year ago

BradyJ27 commented 1 year ago

When launching a CML runner with the GitLab driver, passing the flag --idle-timeout="never" causes an unhandled rejection.

Command: cml runner launch --driver="gitlab" --repo="https://gitlab.com/example_cml" --token="xyz" --idle-timeout="never" --log="debug"

Output:

info: Preparing workdir /root/.cml/veudy407tz...
info: Launching gitlab runner
info: Connected to acpid service.
debug: Incorrect Usage: invalid value "never" for flag -wait-timeout: strconv.ParseInt: parsing "never": invalid syntax
debug: {"level":"fatal","msg":"invalid value \"never\" for flag -wait-timeout: strconv.ParseInt: parsing \"never\": invalid syntax","time":"2023-07-27T12:59:28-04:00"}
info: Unregistering runner veudy407tz...
info:   Success
error: unhandledRejection: runner closed with exit code 1
Error: runner closed with exit code 1
    at ChildProcess.<anonymous> (/root/.nvm/versions/node/v16.20.1/lib/node_modules/@dvcorg/cml/bin/cml/runner/launch.js:326:37)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5) 
{"date":"Thu Jul 27 2023 12:59:31 GMT-0400 (Eastern Daylight Time)","error":{},"exception":true,"os":{"loadavg":[0.43,0.22,0.19],"uptime":908172.44},"process":{"argv":["/root/.nvm/versions/node/v16.20.1/bin/node","/root/.nvm/versions/node/v16.20.1/bin/cml","runner","launch","--driver=gitlab","--repo=https://gitlab.com/example_cml","--token=xyz","--idle-timeout=never","--log=debug"],"cwd":"/app","execPath":"/root/.nvm/versions/node/v16.20.1/bin/node","gid":0,"memoryUsage":{"arrayBuffers":25483773,"external":59109208,"heapTotal":62033920,"heapUsed":34230520,"rss":126709760},"pid":14498,"uid":0,"version":"v16.20.1"},"stack":"Error: runner closed with exit code 1\n    at ChildProcess.<anonymous> (/root/.nvm/versions/node/v16.20.1/lib/node_modules/@dvcorg/cml/bin/cml/runner/launch.js:326:37)\n    at ChildProcess.emit (node:events:513:28)\n    at maybeClose (node:internal/child_process:1100:16)\n    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)","trace":[{"column":37,"file":"/root/.nvm/versions/node/v16.20.1/lib/node_modules/@dvcorg/cml/bin/cml/runner/launch.js","function":null,"line":326,"method":null,"native":false},{"column":28,"file":"node:events","function":"ChildProcess.emit","line":513,"method":"emit","native":false},{"column":16,"file":"node:internal/child_process","function":"maybeClose","line":1100,"method":null,"native":false},{"column":5,"file":"node:internal/child_process","function":"Process.ChildProcess._handle.onexit","line":304,"method":"onexit","native":false}]}

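I believe this occurs because the GitLab runner does not accept --wait-timeout=never (see here): as the debug output shows, it parses the value of -wait-timeout as an integer, and CML appears to pass "never" through unchanged.
Instead, when --idle-timeout="never" is passed with GitLab as the driver, CML should start the GitLab runner with --wait-timeout=0.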
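
For illustration, here is a minimal sketch of the mapping proposed above. It is not CML's actual code: the helper names `toGitlabWaitTimeout` and `gitlabWaitTimeoutFlag` are hypothetical, and it assumes, as this issue suggests, that the GitLab runner treats `--wait-timeout=0` as "wait indefinitely".

```javascript
// Hypothetical helpers, not part of CML's codebase.
// Assumption (from this issue): the GitLab runner's --wait-timeout flag
// only accepts integers, and 0 stands in for "never".
function toGitlabWaitTimeout(idleTimeout) {
  // Translate CML's special value into the integer the runner expects.
  if (idleTimeout === 'never') return 0;

  // Reject empty strings explicitly, since Number('') evaluates to 0.
  if (typeof idleTimeout === 'string' && idleTimeout.trim() === '') {
    throw new Error('invalid idle timeout: empty value');
  }

  const seconds = Number(idleTimeout);
  if (!Number.isInteger(seconds) || seconds < 0) {
    throw new Error(`invalid idle timeout: ${idleTimeout}`);
  }
  return seconds;
}

// Build the flag that would be handed to the GitLab runner process.
function gitlabWaitTimeoutFlag(idleTimeout) {
  return `--wait-timeout=${toGitlabWaitTimeout(idleTimeout)}`;
}
```

With this kind of translation in place, `gitlabWaitTimeoutFlag('never')` would produce `--wait-timeout=0` rather than forwarding the string "never" to the runner, and invalid values would fail early with a descriptive error instead of surfacing as an unhandled rejection after the runner exits.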