ocurrent / ocaml-ci

A CI for OCaml projects
https://ocaml.ci.dev
MIT License

Analysis step in a transient state #685

Open maiste opened 1 year ago

maiste commented 1 year ago

Sometimes the analysis step gets stuck: it does not rebuild and keeps showing the same error message. Locally, the workaround is to clean the local state and rerun the CI.

This needs more investigation, but it might be solved by the solver-service.

tmcgilchrist commented 1 year ago

An example from ocaml/opam-file-format: https://ci.ocamllabs.io:8100/job/2022-12-13/165403-git-fetch-7502d0

2022-12-13 16:54.03: New job: git fetch https://github.com/ocaml/opam-file-format.git#refs/pull/47/head (91af0c0e6148bda1e268c35dc8f62159d254649e)
2022-12-13 16:54.03: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "branch" "-f" "fetch-91af0c0e6148bda1e268c35dc8f62159d254649e" 
                           "91af0c0e6148bda1e268c35dc8f62159d254649e"
fatal: Not a valid branch point: '91af0c0e6148bda1e268c35dc8f62159d254649e'.
2022-12-13 16:54.03: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "fetch" "--recurse-submodules=false" "-q" 
                           "-f" "https://github.com/ocaml/opam-file-format.git" 
                           "refs/pull/47/head"
2022-12-13 16:54.03: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "branch" "-f" "fetch-91af0c0e6148bda1e268c35dc8f62159d254649e" 
                           "91af0c0e6148bda1e268c35dc8f62159d254649e"
fatal: Not a valid branch point: '91af0c0e6148bda1e268c35dc8f62159d254649e'.
2022-12-13 16:54.03: Job failed: Command "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
"branch" "-f" "fetch-91af0c0e6148bda1e268c35dc8f62159d254649e" "91af0c0e6148bda1e268c35dc8f62159d254649e" exited with status 128

The commit in question does not belong to any branch of this repository and may come from a fork. However, the PR for that job, https://github.com/ocaml/opam-file-format/pull/47, shows that the commit is available.

Rerunning the fetch step just now gives:

2022-12-13 22:08.46: New job: git fetch https://github.com/ocaml/opam-file-format.git#refs/pull/47/head (91af0c0e6148bda1e268c35dc8f62159d254649e)
2022-12-13 22:08.46: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "branch" "-f" "fetch-91af0c0e6148bda1e268c35dc8f62159d254649e" 
                           "91af0c0e6148bda1e268c35dc8f62159d254649e"
fatal: Not a valid branch point: '91af0c0e6148bda1e268c35dc8f62159d254649e'.
2022-12-13 22:08.46: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "fetch" "--recurse-submodules=false" "-q" 
                           "-f" "https://github.com/ocaml/opam-file-format.git" 
                           "refs/pull/47/head"
2022-12-13 22:08.47: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "branch" "-f" "fetch-91af0c0e6148bda1e268c35dc8f62159d254649e" 
                           "91af0c0e6148bda1e268c35dc8f62159d254649e"
2022-12-13 22:08.47: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "reset" "--hard" "-q" "91af0c0e6148bda1e268c35dc8f62159d254649e"
2022-12-13 22:08.47: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "submodule" "sync"
2022-12-13 22:08.47: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "submodule" "deinit" "--force" "--all"
2022-12-13 22:08.47: Exec: "git" "-C" "/var/lib/ocurrent/var/git/opam-file-format.git-86eb760e672d02d33591ec88eb8e890f599af36acbd25c5eac00005e3f372f69" 
                           "submodule" "update" "--recursive" "--init"
2022-12-13 22:08.47: Job succeeded
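
For readers following the two logs, the command sequence they show can be read as the sketch below. It is only an illustration in Python, not the actual ocurrent code: the names `fetch_commit` and `run_git` are hypothetical, stopping when `git fetch` itself fails is an assumption (the logs do not show that case), and the submodule steps from the second log are omitted. The `git` arguments are copied from the logs.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def run_git(args: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(args)).returncode


def fetch_commit(repo_dir: str, remote: str, ref: str, commit: str,
                 run: Runner = run_git) -> bool:
    """Make `commit` available in the clone at `repo_dir` (hypothetical helper).

    Mirrors the logged sequence: try to create the branch, fetch the ref
    if that fails, then try the branch again before resetting.
    """
    git = ["git", "-C", repo_dir]
    branch = git + ["branch", "-f", f"fetch-{commit}", commit]

    if run(branch) != 0:
        # The commit is not in the local clone yet: fetch the ref.
        fetch = git + ["fetch", "--recurse-submodules=false", "-q",
                       "-f", remote, ref]
        if run(fetch) != 0:
            return False  # assumption: a failed fetch ends the job
        # Second attempt. In the first log this still failed with
        # "Not a valid branch point" and the job exited with status 128.
        if run(branch) != 0:
            return False

    return run(git + ["reset", "--hard", "-q", commit]) == 0
```

In the failing run the second `branch -f` call is where the job stopped, while in the later run the same call succeeded and the job went on to `reset` and the submodule commands.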

novemberkilo commented 1 year ago

Sometimes this requires that the worker's cache be wiped. Rebuilding a step from the UI results in the same worker being picked (unless it has been paused or taken down). Is it possible to improve the results of a rebuild by choosing a different worker for the rebuild?

mtelvers commented 1 year ago

This might occasionally make the rebuild work where it previously failed, but on the whole it would be a significant performance hit. It would be better to detect the issue on the worker and have it fix or delete the cache, as @kit-ty-kate did here: https://github.com/ocurrent/ocluster/commit/318a69ffabbb8424fdcaa33168b1bf47ade803ec
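
As a rough illustration of the "detect the issue and delete the cache" idea, here is a minimal Python sketch. It does not describe what the linked ocluster commit does. The function `run_with_cache_repair`, the `job` callback, and the marker list are all hypothetical, and treating the "Not a valid branch point" message from the log earlier in this thread as a sign of a bad cache is an assumption made only for the example.

```python
import shutil
from pathlib import Path
from typing import Callable, Tuple

# Assumption: output containing one of these strings points at a bad cache.
# The marker is taken from the log earlier in this thread.
CACHE_ERROR_MARKERS = ("fatal: Not a valid branch point",)

# A job receives the cache directory and returns (succeeded, output).
Job = Callable[[Path], Tuple[bool, str]]


def run_with_cache_repair(job: Job, cache_dir: Path) -> Tuple[bool, str]:
    """Run `job`; on a cache-like failure, wipe the cache and retry once."""
    ok, output = job(cache_dir)
    if ok or not any(marker in output for marker in CACHE_ERROR_MARKERS):
        # Success, or a failure that does not look cache-related:
        # leave the cache alone and report the result unchanged.
        return ok, output

    # Delete the cache, recreate an empty directory, and retry exactly once.
    shutil.rmtree(cache_dir, ignore_errors=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    return job(cache_dir)
```

Because the cache is only wiped when a known failure pattern appears, ordinary build failures keep their warm cache, which is the performance concern raised above.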