ocurrent / ocaml-ci

A CI for OCaml projects
https://ocaml.ci.dev
MIT License

CI failure #873

Closed: hannesm closed this issue 1 year ago

hannesm commented 1 year ago

As you mentioned in #858, the CI service is considered to be stable.

However, I just observed a failure at: https://ocaml.ci.dev/github/robur-coop/albatross/commit/2f316d2e49866fe08b9e12c12194062bbbaa2329/variant/debian-12-5.0_opam-2.1. I've seen similar logs before, so maybe there's a way to tackle the root cause.

Since the CI sometimes removes all the logs, here is the entire log from the link above:

2023-09-12 11:12.42: New job: test robur-coop/albatross https://github.com/robur-coop/albatross.git#refs/heads/main (2f316d2e49866fe08b9e12c12194062bbbaa2329) (linux-x86_64:debian-12-5.0_opam-2.1)
Base: ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea
Opam project build

To reproduce locally:

git clone --recursive "https://github.com/robur-coop/albatross.git" -b "main" && cd "albatross" && git reset --hard 2f316d2e
cat > Dockerfile <<'END-OF-DOCKERFILE'
FROM ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea
# debian-12-5.0_opam-2.1
USER 1000:1000
ENV CLICOLOR_FORCE="1"
ENV OPAMCOLOR="always"
WORKDIR /src
RUN sudo ln -f /usr/bin/opam-2.1 /usr/bin/opam
RUN opam init --reinit -ni
RUN opam exec -- ocaml -version && opam --version
WORKDIR /src
RUN sudo chown opam /src
RUN cd ~/opam-repository && (git cat-file -e 95ff62cd8c4b49edfe81945606a015c8005774ae || git fetch origin master) && git reset -q --hard 95ff62cd8c4b49edfe81945606a015c8005774ae && git log --no-decorate -n1 --oneline && opam update -u
COPY --chown=1000:1000 albatross.opam ./
RUN opam pin add -yn albatross.dev './'
ENV DEPS="alcotest.1.7.0 angstrom.0.15.0 asn1-combinators.0.2.6 astring.0.8.5 base-bigarray.base base-bytes.base base-domains.base base-nnp.base base-threads.base base-unix.base base64.3.5.1 bigstringaf.0.9.1 bos.0.2.1 ca-certs.0.2.3 checkseum.0.5.1 cmdliner.1.2.0 conf-gmp.4 conf-gmp-powm-sec.3 conf-libnl3.1 conf-pkg-config.3 cppo.1.6.9 csexp.1.5.2 cstruct.6.2.0 decompress.1.5.2 dns.7.0.3 dns-client.7.0.3 dns-client-lwt.7.0.3 domain-name.0.4.0 dune.3.10.0 dune-configurator.3.10.0 duration.0.2.1 eqaf.0.9 faraday.0.8.2 faraday-lwt.0.8.2 faraday-lwt-unix.0.8.2 fmt.0.9.0 fpath.0.7.3 gmap.0.3.0 h2.0.10.0 happy-eyeballs.0.6.0 happy-eyeballs-lwt.0.6.0 hex.1.5.0 hkdf.1.0.4 hpack.0.10.0 http-lwt-client.0.2.5 httpaf.0.7.1 ipaddr.5.5.0 logs.0.7.0 lru.0.3.1 lwt.5.7.0 macaddr.5.5.0 metrics.0.4.1 metrics-influx.0.4.1 metrics-lwt.0.4.1 metrics-rusage.0.4.1 mirage-crypto.0.11.1 mirage-crypto-ec.0.11.1 mirage-crypto-pk.0.11.1 mirage-crypto-rng.0.11.1 mirage-crypto-rng-lwt.0.11.1 mtime.2.0.0 ocaml.5.0.0 ocaml-base-compiler.5.0.0 ocaml-config.3 ocaml-options-vanilla.1 ocaml-syntax-shims.1.0.0 ocamlbuild.0.14.2 ocamlfind.1.9.6 ocplib-endian.1.2 optint.0.3.0 owee.0.7 pbkdf.1.2.0 psq.0.2.1 ptime.1.1.0 randomconv.0.1.3 re.1.11.0 result.1.5 rresult.0.7.0 seq.base sexplib0.v0.16.0 solo5-elftool.0.3.1 stdlib-shims.0.3.0 tls.0.17.1 tls-lwt.0.17.1 topkg.1.0.7 uutf.1.0.3 x509.0.16.5 zarith.1.13"
ENV CI="true"
ENV OCAMLCI="true"
RUN opam update --depexts && opam install --cli=2.1 --depext-only -y albatross.dev $DEPS
RUN opam install $DEPS
COPY --chown=1000:1000 . /src
RUN opam exec -- dune build @install @check @runtest && rm -rf _build
END-OF-DOCKERFILE
docker build .
END-REPRO-BLOCK

2023-09-12 11:12.42: Using cache hint "robur-coop/albatross-ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea-debian-12-5.0_opam-2.1-d7790a8e8307b6c95dc48b4264ef6628"
2023-09-12 11:12.42: Using OBuilder spec:
((from ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea)
 (comment debian-12-5.0_opam-2.1)
 (user (uid 1000) (gid 1000))
 (env CLICOLOR_FORCE 1)
 (env OPAMCOLOR always)
 (workdir /src)
 (run (shell "sudo ln -f /usr/bin/opam-2.1 /usr/bin/opam"))
 (run (shell "opam init --reinit -ni"))
 (run (shell "opam exec -- ocaml -version && opam --version"))
 (workdir /src)
 (run (shell "sudo chown opam /src"))
 (run (cache (opam-archives (target /home/opam/.opam/download-cache)))
      (network host)
      (shell "cd ~/opam-repository && (git cat-file -e 95ff62cd8c4b49edfe81945606a015c8005774ae || git fetch origin master) && git reset -q --hard 95ff62cd8c4b49edfe81945606a015c8005774ae && git log --no-decorate -n1 --oneline && opam update -u"))
 (copy (src albatross.opam) (dst ./))
 (run (network host)
      (shell "opam pin add -yn albatross.dev './'"))
 (env DEPS "alcotest.1.7.0 angstrom.0.15.0 asn1-combinators.0.2.6 astring.0.8.5 base-bigarray.base base-bytes.base base-domains.base base-nnp.base base-threads.base base-unix.base base64.3.5.1 bigstringaf.0.9.1 bos.0.2.1 ca-certs.0.2.3 checkseum.0.5.1 cmdliner.1.2.0 conf-gmp.4 conf-gmp-powm-sec.3 conf-libnl3.1 conf-pkg-config.3 cppo.1.6.9 csexp.1.5.2 cstruct.6.2.0 decompress.1.5.2 dns.7.0.3 dns-client.7.0.3 dns-client-lwt.7.0.3 domain-name.0.4.0 dune.3.10.0 dune-configurator.3.10.0 duration.0.2.1 eqaf.0.9 faraday.0.8.2 faraday-lwt.0.8.2 faraday-lwt-unix.0.8.2 fmt.0.9.0 fpath.0.7.3 gmap.0.3.0 h2.0.10.0 happy-eyeballs.0.6.0 happy-eyeballs-lwt.0.6.0 hex.1.5.0 hkdf.1.0.4 hpack.0.10.0 http-lwt-client.0.2.5 httpaf.0.7.1 ipaddr.5.5.0 logs.0.7.0 lru.0.3.1 lwt.5.7.0 macaddr.5.5.0 metrics.0.4.1 metrics-influx.0.4.1 metrics-lwt.0.4.1 metrics-rusage.0.4.1 mirage-crypto.0.11.1 mirage-crypto-ec.0.11.1 mirage-crypto-pk.0.11.1 mirage-crypto-rng.0.11.1 mirage-crypto-rng-lwt.0.11.1 mtime.2.0.0 ocaml.5.0.0 ocaml-base-compiler.5.0.0 ocaml-config.3 ocaml-options-vanilla.1 ocaml-syntax-shims.1.0.0 ocamlbuild.0.14.2 ocamlfind.1.9.6 ocplib-endian.1.2 optint.0.3.0 owee.0.7 pbkdf.1.2.0 psq.0.2.1 ptime.1.1.0 randomconv.0.1.3 re.1.11.0 result.1.5 rresult.0.7.0 seq.base sexplib0.v0.16.0 solo5-elftool.0.3.1 stdlib-shims.0.3.0 tls.0.17.1 tls-lwt.0.17.1 topkg.1.0.7 uutf.1.0.3 x509.0.16.5 zarith.1.13")
 (env CI true)
 (env OCAMLCI true)
 (run (cache (opam-archives (target /home/opam/.opam/download-cache)))
      (network host)
      (shell "opam update --depexts && opam install --cli=2.1 --depext-only -y albatross.dev $DEPS"))
 (run (cache (opam-archives (target /home/opam/.opam/download-cache)))
      (network host)
      (shell "opam install $DEPS"))
 (copy (src .) (dst /src))
 (run (shell "opam exec -- dune build @install @check @runtest && rm -rf _build"))
)

2023-09-12 11:12.42: Waiting for resource in pool OCluster
2023-09-12 11:12.42: Waiting for worker…
2023-09-12 11:14.23: Got resource from pool OCluster
Building on x86-bm-c4.sw.ocaml.org
All commits already cached
HEAD is now at 2f316d2 opam: add fpath dependency explicitly

(from ocaml/opam@sha256:e52acfdc43defaa996da6843a61654d49d25a74409356d3ce748bd8fc801adea)
2023-09-12 11:14.24 ---> using "f0d5e9b94774e5249d9626ec509c9051e626e9fa00cb03ff73f2cb6e7eb5228b" from cache
Uncaught exception: Sys_error("/var/cache/obuilder/result/f0d5e9b94774e5249d9626ec509c9051e626e9fa00cb03ff73f2cb6e7eb5228b/env: No such file or directory")
2023-09-12 11:14.24: Job failed: Failed: Internal error

Also, as reported earlier, copying logs from the web UI is broken (it injects lots of extra newlines). I thought you had fixed that issue, but it looks like there's a regression.
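Until the copy bug is fixed, a pasted log can be cleaned up locally before posting. The sketch below is a hypothetical helper (`strip_blank_lines` is not part of ocaml-ci) and assumes that the only empty lines in the pasted text are the injected ones; it would also drop any blank lines the original log genuinely contained.

```ocaml
(* Hypothetical helper: drop the empty lines that copying from the web UI
   inserts into a log. Lines containing only whitespace are treated as empty;
   other lines, including their indentation, are kept as-is. *)
let strip_blank_lines (s : string) : string =
  String.split_on_char '\n' s
  |> List.filter (fun line -> String.trim line <> "")
  |> String.concat "\n"

let () =
  let pasted = "line one\n\nline two\n\n  indented line\n" in
  print_endline (strip_blank_lines pasted)
```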

mtelvers commented 1 year ago

Thank you for reporting this issue. I have made a preliminary investigation. The env file contains the environment variables; it is extracted from the Docker base image by running docker image inspect and saving .Config.Env to a file. This file is missing because, about half an hour earlier, the worker exited with a fatal exception while the image was being extracted. Investigating that exception showed that the worker was running low on disk space and needed to prune virtually everything from its cache. The prune operation removed a cached layer that was a dependency of a running job. The delete cascaded to the child layers, which could not be removed because they were in use, causing the exception. The layers to prune are selected from all cache layers older than 10 minutes, ordered by the time they were last used. In this case, the 10-minute window was insufficient. I will look into this further tomorrow.
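To make the failure mode concrete, here is a minimal OCaml sketch of a prune-selection policy like the one described above: layers not used for more than 10 minutes (`min_age` = 600 seconds), least recently used first. It adds one guard that is an assumption of this sketch, not necessarily what OBuilder does: any layer a running job depends on, together with all of its ancestors, is never selected, since deleting an ancestor cascades down to the in-use layer. The `layer` record, `ancestors`, and `select_for_pruning` are hypothetical names; this is not OBuilder's implementation.

```ocaml
(* Hypothetical model of cache pruning, for illustration only. *)
type layer = {
  id : string;
  parent : string option; (* layer this one was built on, if any *)
  last_used : float;      (* time of last use, in seconds *)
}

(* [l] and all of its ancestors: deleting any of these would cascade down
   to [l]. Assumes parent links contain no cycles. *)
let rec ancestors by_id l =
  match l.parent with
  | None -> [ l.id ]
  | Some p -> (
      match Hashtbl.find_opt by_id p with
      | None -> [ l.id ]
      | Some parent -> l.id :: ancestors by_id parent)

(* Ids of prunable layers, least recently used first: unused for more than
   [min_age] seconds and not needed by any layer listed in [in_use]. *)
let select_for_pruning ~now ~min_age ~in_use layers =
  let by_id = Hashtbl.create 16 in
  List.iter (fun l -> Hashtbl.replace by_id l.id l) layers;
  let keep = Hashtbl.create 16 in
  List.iter
    (fun id ->
      match Hashtbl.find_opt by_id id with
      | Some l -> List.iter (fun a -> Hashtbl.replace keep a ()) (ancestors by_id l)
      | None -> ())
    in_use;
  layers
  |> List.filter (fun l ->
         now -. l.last_used > min_age && not (Hashtbl.mem keep l.id))
  |> List.sort (fun a b -> Float.compare a.last_used b.last_used)
  |> List.map (fun l -> l.id)

let () =
  let layers =
    [
      { id = "base"; parent = None; last_used = 0. };
      { id = "deps"; parent = Some "base"; last_used = 100. };
      { id = "old"; parent = None; last_used = 50. };
    ]
  in
  (* A running job builds on "deps", so "deps" and "base" must survive *)
  select_for_pruning ~now:10_000. ~min_age:600. ~in_use:[ "deps" ] layers
  |> List.iter print_endline
```

With this guard, an age threshold alone no longer decides whether a layer is safe to delete, so a long-running job cannot lose its base layers however long it runs. The trade-off is that a deep chain under a busy job cannot be reclaimed until that job finishes, even when disk space is very low.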

edwintorok commented 1 year ago

FWIW, I had a similar failure on my repo. I worked around it by pushing an empty commit (git commit -am "bump for ci" --allow-empty). This still took advantage of most of the existing cache but got the broken worker out of that state. Simply restarting the builds didn't help: they kept failing with the same error. (And yes, I did notice an out-of-space error earlier, which affected both opam CI and ocaml CI.)

hannesm commented 1 year ago

I'm closing this issue, since a commit to "ocurrent/obuilder" may have solved it.