Closed lemontheme closed 2 years ago
I can confirm again that GCP is terminating as expected. Can you please log into the machine, run the following command, and let me know what it says?
journalctl --unit cml --no-pager
@lemontheme
Sure thing. Here's what I get:
-- Logs begin at Thu 2021-07-29 14:22:56 UTC, end at Thu 2021-07-29 14:38:42 UTC. --
Jul 29 14:26:39 cml-36s36ywc7z systemd[1]: Started cml.service.
Jul 29 14:26:46 cml-36s36ywc7z cml.sh[17975]: Preparing workdir /tmp/tmp.b7BwstF7kJ/.cml/cml-07toknujbd...
Jul 29 14:26:46 cml-36s36ywc7z cml.sh[17975]: Launching github runner
Jul 29 14:27:10 cml-36s36ywc7z cml.sh[17975]: SpotNotifier can not be started.
Jul 29 14:27:11 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:11.452Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","message":""}
Jul 29 14:27:11 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:11.453Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","message":"√ "}
Jul 29 14:27:11 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:11.454Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","message":"Connected to GitHub"}
Jul 29 14:27:11 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:11.454Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","message":""}
Jul 29 14:27:11 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:11.995Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","status":"ready","message":"Listening for Jobs"}
Jul 29 14:27:22 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:22.333Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","job":3192860335,"status":"job_started","message":"Running job: model-training"}
Jul 29 14:34:29 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:34:29.721Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","job":"","status":"job_ended","success":true,"message":"Job model-training completed with result: Succeeded"}
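For anyone reading a log like this, a rough way to quantify "the timeout is not happening" is to compare the last `job_ended` event with the end of the journal. The sketch below is only an illustration: the helper names `parse_cml_events` and `last_event_time` are made up for this example, and it assumes each journal line carries at most one JSON payload on a single line (wrapped lines have to be rejoined first, otherwise they are skipped).

```python
import json
from datetime import datetime, timezone


def parse_cml_events(lines):
    """Yield the JSON payloads found in journal lines (hypothetical helper)."""
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # plain-text lines such as "Launching github runner"
        try:
            yield json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # payload cut in two by terminal line wrapping


def last_event_time(events, status):
    """Return the UTC timestamp of the most recent event with this status, or None."""
    latest = None
    for event in events:
        if event.get("status") == status:
            latest = datetime.strptime(
                event["date"], "%Y-%m-%dT%H:%M:%S.%fZ"
            ).replace(tzinfo=timezone.utc)
    return latest


# Two lines taken from the journal output above, rejoined onto one line each.
sample = [
    'Jul 29 14:27:22 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:27:22.333Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","job":3192860335,"status":"job_started","message":"Running job: model-training"}',
    'Jul 29 14:34:29 cml-36s36ywc7z cml.sh[17975]: {"level":"info","date":"2021-07-29T14:34:29.721Z","repo":"https://github.com/lemontheme/mlops-with-gh-actions","job":"","status":"job_ended","success":true,"message":"Job model-training completed with result: Succeeded"}',
]

# End of the journal, copied from the "Logs begin at ... end at ..." header.
log_end = datetime(2021, 7, 29, 14, 38, 42, tzinfo=timezone.utc)

job_ended_at = last_event_time(parse_cml_events(sample), "job_ended")
idle_so_far = log_end - job_ended_at
print(f"Runner idle for {idle_so_far} with no shutdown entry in the journal")
```

Whether that idle period already exceeds the configured timeout depends on the value set in the workflow, which is not shown in this thread.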
And it's still running? As far as I can see, the timeout is not happening... What a weird thing.
I can see that the runner terminates itself properly with the idle timeout; however, when I destroy it using the Terraform provider, GCP sometimes does not send the graceful shutdown.
However, this does reflect the issue here, where it seems that the timer (chrono) might not be working.
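To make the "chrono" idea concrete, here is a minimal sketch of how an idle-timeout timer could behave. This is not CML's actual implementation; the `IdleTimer` class and its method names are hypothetical and only illustrate the expected logic: the countdown starts when the runner has no running jobs, is cancelled when a job starts, and restarts when the last job ends.

```python
import time


class IdleTimer:
    """Hypothetical idle-timeout tracker (illustration only, not CML code)."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds of allowed idleness
        self._clock = clock               # injectable so it can be tested
        self._running_jobs = 0
        self._idle_since = clock()        # idle from the moment it is created

    def job_started(self):
        self._running_jobs += 1
        self._idle_since = None           # a running job cancels the countdown

    def job_ended(self):
        self._running_jobs = max(0, self._running_jobs - 1)
        if self._running_jobs == 0:
            self._idle_since = self._clock()  # countdown restarts here

    def should_shut_down(self):
        if self._idle_since is None:
            return False                  # never shut down mid-job
        return self._clock() - self._idle_since >= self.idle_timeout


# Usage with a fake clock, so no real waiting is needed.
now = [0.0]
timer = IdleTimer(idle_timeout=300, clock=lambda: now[0])

timer.job_started()
now[0] = 1000.0
busy_check = timer.should_shut_down()      # job still running

timer.job_ended()
now[0] = 1299.0
early_check = timer.should_shut_down()     # 299 s idle

now[0] = 1300.0
expired_check = timer.should_shut_down()   # 300 s idle
```

In the journal above, the symptom would correspond to the final state never triggering a shutdown: the `job_ended` event is logged, but no shutdown entry follows before the log ends.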
@lemontheme I believe this issue is resolved, can you confirm your workflow is functional without any workarounds?
Hi @dacbd, sorry to keep you waiting. Been a while since I looked at this.
Anyway, I'm happy to confirm that instances are now indeed stopped and deleted as expected! :) That's using the exact same workflow as above. Great to see you've made progress with this. Thanks!
Thank you very much, @dacbd for the fix and @lemontheme for confirming the resolution!
This is a repeat of #661, which was supposedly fixed in #653. Unfortunately, I'm not seeing any changes in the shutdown behavior of my GCP compute instances. That is, they keep running past the timeout interval.
I'm using the same workflow as before (in #661):
Anyway, this seems to contradict the tests, as @DavidGOrtega explains in the comments under #653:
Any idea what I might be doing wrong?