ocurrent / ocaml-docs-ci

CI building documentation for ALL versions of ALL packages !
https://docs.ci.ocaml.org
MIT License
17 stars 14 forks source link

Grep for temporary failures and retry. #105

Closed novemberkilo closed 1 year ago

novemberkilo commented 1 year ago

Following a suggestion from @TheLortex this PR introduces a naive way of retrying on temporary failures. The point is to introduce a small change and see if the retry mechanism addresses #103 - if it does, we can proceed towards a solution that is more robust.

TheLortex commented 1 year ago

Thanks for this PR. My main comment is about extracting the retry loop in a recursive function, so that we can retry 3 or 4 times instead of once.

TheLortex commented 1 year ago

@novemberkilo Looking at existing failures in docs CI, I also see (for example in https://docs.ci.ocaml.org/job/2023-04-10/161235-voodoo-prep-11de8a) Disconnected: Switch turned off. I don't know where it's coming from but won't be handled by the current scheme as it doesn't appear in the logs.

One thing I suspect is the let** _ = Current_ocluster.Connection.run_job ... in body won't execute if run_job returns an error. In that case the error should also be considered as temporary and go in the retry loop.

novemberkilo commented 1 year ago

@TheLortex I have updated the PR along the lines of your feedback. I am now trying to figure out how best to test this and I'm wondering how to mock the various things that go into Prep.build ... or whether refactoring the function so that I can avoid (most of) the mocking is the way to go. I'd appreciate any thoughts you may have on the matter. Thank you :)