Closed proux01 closed 1 year ago
@erikmd apparently the job failed on a timeout https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/802360 (at least 8.16 timeouted at some point after solvable (so field and character are missing) and 8.17 was "Killed"?). It is a currently known drawback of MC 2.0 that it's compilation is slow [1]. Do you think you could extend the timeout? (considering the progress after 1 hour, I'd expect 2 to be enough)
[1] https://github.com/math-comp/math-comp/releases/tag/mathcomp-2.0.0
Dear @proux01, thanks for the ping.
Indeed, there were two errors in https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/802360/failures:
Job deploy_2_2.0.0-coq-8.16: ERROR: Job failed: execution took longer than 1h0m0s seconds
→ I bumped the project timeout to 2h, and the retried job passed:
https://gitlab.inria.fr/math-comp/docker-mathcomp/-/jobs/3110825 (78')
Job deploy_1_2.0.0-coq-8.17: ERROR: failed to solve: rpc error: code = Unknown desc = failed to solve with frontend dockerfile.v0: failed to create LLB definition: unexpected status code [manifests 8.16]: 401 Unauthorized
→ this looked like a network transient error (I might extend docker-keeper with a 3-times retry loop to avoid this)
but the bad news is that after yet another retry with the 2h timeout, we get this error: https://gitlab.inria.fr/math-comp/docker-mathcomp/-/jobs/3110824
#5 1393.0 # COQC order.v
#5 1393.0 # Killed
…
ERROR: failed to solve: executor failed running [/bin/bash --login -o pipefail -c set -x && … && opam install -y -j "${NJOBS}" ${MATHCOMP_PACKAGE} && opam clean -a -c -s --logs && opam config list && opam list]:
exit code: 31
Does this error look like an OOM error to you? Cc @CohenCyril @Zimmi48 @palmskog @AltGr
(Meanwhile, I started a full rebuild of docker-coq images, to avoid the step:
#5 45.56 - recompile ppxlib 0.29.1 [upstream or system changes]
)
Yes looks like an OOM, according to release notes https://github.com/math-comp/math-comp/releases/tag/mathcomp-2.0.0
some huge files (e.g., order.v) require 2.5G of memory to compile
Could also be a disk space issue?
OK thanks for the feedback!
So I replaced the medium
runners with large
ones.
Let's monitor this pipeline now: https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/804156
Thanks, still there is a worrying memory issue on 8.17:
@gares any idea what could cause that?
Thanks @proux01; to sum up:
→ this looked like a network transient error (I might extend docker-keeper with a 3-times retry loop to avoid this)
FWIW, coqbot also has support for detecting some transient errors and automatically retry them. It is trivial to extend it to handle this case. It is "safe" in the sense that there is a limit to the number of automatic retries allowed (it used not to be the case).
https://github.com/coq/bot/blob/a6605542c2344a54859af39284c9149b388563d2/src/actions.ml#L510-L551
@affeldt-aist @erikmd I don't have the right to merge here so please go ahead