Open art-w opened 1 year ago
Thanks for the report. IIRC these are cached results (and thus cached logs) so the queue- and run-time in the header are correct, but it would be ideal to mark them as cached to prevent confusion. I seem to remember this was a surprisingly hard problem to solve @novemberkilo?
Thanks @art-w -- as @benmandrew points out, unfortunately this is a known issue. Will add it to our list of fixes/enhancements. // @tmcgilchrist
This issue is related to the work I did when connecting ocaml-ci and the solver-service. Because we send 2 different requests to OCluster, we start the job immediately; otherwise, going through the Cluster connection, the job would be started twice and end up failing.
Combining those 2 different requests into a single request type sent to the solver-service could solve this issue (a rough sketch of the idea is below).
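As a rough illustration of that idea only: the two kinds of queries could be merged into one variant type so the solver-service exposes a single entry point, and adding a new query becomes a new constructor rather than a new API. The names below (`request`, `Opam_selections`, `Tool_version`, ...) are hypothetical and not the actual solver-service API.

```ocaml
(* Hypothetical sketch: one request type covering the different queries
   the analysis step needs, so only one kind of request is ever sent to
   the solver-service. All names are illustrative only. *)
type request =
  | Opam_selections of { packages : string list; opam_repo_commit : string }
  | Tool_version of { tool : string }  (* e.g. "ocamlformat", "opam-dune-lint" *)

type response =
  | Selections of (string * string) list  (* (package, version) pairs *)
  | Version of string option

(* A single entry point for every query kind. *)
let handle (req : request) : response =
  match req with
  | Opam_selections _ -> Selections []  (* placeholder: run the real solver here *)
  | Tool_version _ -> Version None      (* placeholder: look up the pinned version *)
```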
Instead of having to upgrade the solver-service API each time we add a different request, it would be preferable to have a pool for analysis in which all the different requests are sent at different times to the solver-service through the same API (see the sketch below). Some selections, such as ocamlformat and opam-dune-lint, are obtained with separate requests to the solver-service during the analysis job.
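To sketch the pooling idea (a cap on how many analysis requests hit the solver-service concurrently, all going through the same API), here is a minimal example using `Lwt_pool` from the lwt library. ocaml-ci/OCurrent have their own pool mechanism, so treat this purely as an illustration of the concept; `connection`, `connect` and `send_request` are hypothetical stand-ins.

```ocaml
(* Minimal sketch with Lwt_pool: at most [capacity] in-flight requests
   to the solver-service, all sharing the same API. *)
let capacity = 4

(* Hypothetical connection type and request sender. *)
type connection = unit
let connect () : connection Lwt.t = Lwt.return ()

let send_request (_conn : connection) (_req : string) : string Lwt.t =
  Lwt.return "response"

(* Pool of connections; Lwt_pool.use waits for a free slot, so analysis
   jobs queue up instead of overwhelming the service. *)
let pool : connection Lwt_pool.t = Lwt_pool.create capacity connect

let solve req = Lwt_pool.use pool (fun conn -> send_request conn req)

let () =
  Lwt_main.run
    (Lwt.join
       (List.map (fun r -> Lwt.map ignore (solve r))
          [ "opam-deps"; "ocamlformat"; "opam-dune-lint" ]))
```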
This PR https://github.com/ocurrent/ocaml-ci/pull/888 fixes the issue at line https://github.com/ocurrent/ocaml-ci/blame/b3c3facfe0e1e1e18dfd0389827f555908c1ee0b/lib/analyse.ml#L253, where the pool had been removed at some point.
@art-w would you like to confirm the fix?
@benmandrew this could be closed, I think.
Oh sorry @moyodiallo, I didn't see your message! I'm not sure I understand the technical details, besides the 0s timings being related to the cache (and so being obviously hard to fix)... So without digging into the code, I had a look at the latest commit on ocurrent, which shows a bunch of tasks with 0s duration: https://ocaml.ci.dev/github/ocurrent/ocurrent/commit/8e0b9d4bb348b13df8696fe63feba303b9a476fd (I don't know if the CI is running your fix though!)
(Also, I understand that there were other, higher-priority issues related to cluster jobs. I don't think the run duration is critical for end-users; it's a bit confusing but otherwise a minor issue.)
@art-w you are correct, the issue is related to the ocurrent cache, not the cluster connection. This issue still exists as you saw.
Sorry guys (@benmandrew, @art-w), I mixed things up: I solved another issue and thought it was related. The issue I solved is when all the analysis jobs start with 0s in queue and a lot of them keep waiting at some point.
Context
I don't understand the 0s timings displayed by the CI header on some jobs. The example link comes from https://github.com/ocurrent/current-bench/pull/438/checks?check_run_id=13718052712, which has other jobs with understandable timings.
Steps to reproduce
Expected behaviour
16min in queue (or 1h32min?) Ran for 16min