math-comp / docker-mathcomp

Docker images of coq-mathcomp [maintainer=@erikmd]
https://hub.docker.com/r/mathcomp/mathcomp/#supported-tags
BSD 3-Clause "New" or "Revised" License
6 stars 2 forks source link

Add mathcomp-2.0.0 #25

Closed proux01 closed 1 year ago

proux01 commented 1 year ago

@affeldt-aist @erikmd I don't have the right to merge here so please go ahead

proux01 commented 1 year ago

@erikmd apparently the job failed on a timeout https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/802360 (at least 8.16 timeouted at some point after solvable (so field and character are missing) and 8.17 was "Killed"?). It is a currently known drawback of MC 2.0 that it's compilation is slow [1]. Do you think you could extend the timeout? (considering the progress after 1 hour, I'd expect 2 to be enough)

[1] https://github.com/math-comp/math-comp/releases/tag/mathcomp-2.0.0

erikmd commented 1 year ago

Dear @proux01, thanks for the ping.

Indeed, there were two errors in https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/802360/failures:

Does this error look like an OOM error to you? Cc @CohenCyril @Zimmi48 @palmskog @AltGr

erikmd commented 1 year ago

(Meanwhile, I started a full rebuild of docker-coq images, to avoid the step: #5 45.56 - recompile ppxlib 0.29.1 [upstream or system changes])

proux01 commented 1 year ago

Yes looks like an OOM, according to release notes https://github.com/math-comp/math-comp/releases/tag/mathcomp-2.0.0

some huge files (e.g., order.v) require 2.5G of memory to compile

proux01 commented 1 year ago

Could also be a disk space issue?

erikmd commented 1 year ago

OK thanks for the feedback!

So I replaced the medium runners with large ones.

Let's monitor this pipeline now: https://gitlab.inria.fr/math-comp/docker-mathcomp/-/pipelines/804156

proux01 commented 1 year ago

Thanks, still there is a worrying memory issue on 8.17:

@gares any idea what could cause that?

erikmd commented 1 year ago

Thanks @proux01; to sum up:

Zimmi48 commented 1 year ago

→ this looked like a network transient error (I might extend docker-keeper with a 3-times retry loop to avoid this)

FWIW, coqbot also has support for detecting some transient errors and automatically retry them. It is trivial to extend it to handle this case. It is "safe" in the sense that there is a limit to the number of automatic retries allowed (it used not to be the case).

https://github.com/coq/bot/blob/a6605542c2344a54859af39284c9149b388563d2/src/actions.ml#L510-L551