ocurrent / ocaml-ci

A CI for OCaml projects
https://ocaml.ci.dev

debian-12 x86_32 workers fail due to old capnproto version #931

Closed shonfeder closed 1 month ago

shonfeder commented 3 months ago

An example of a failing build can be seen at https://ocaml.ci.dev/github/ocurrent/docker-base-images/commit/d91f3800a8897fd11753fd312f5ea11f240363bd/variant/debian-12-4.14_x86_32_opam-2.1/-/refs/pull/274/head

The relevant error is

# *** Uncaught exception ***
# kj/filesystem-disk-unix.c++:305: failed: ::fstat(fd, &stats): Value too large for defined data type
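
For context, here is a minimal Python sketch of this class of failure. POSIX specifies that `fstat` fails with `EOVERFLOW` when a file's size, block count, or serial (inode) number cannot be represented in the `stat` structure; on a 32-bit build without large-file support those fields are typically 32 bits wide. The helper name and the sample values are hypothetical, and the log above does not say which field overflowed in this build.

```python
import errno

# Symbolic name of the errno that glibc renders as
# "Value too large for defined data type".
EOVERFLOW_NAME = errno.errorcode[errno.EOVERFLOW]


def fits_unsigned_32(value: int) -> bool:
    """Return True if `value` is representable in an unsigned 32-bit field."""
    return 0 <= value < 2**32


# Hypothetical values, chosen only for illustration (not taken from the failing build).
small_inode = 1_234_567
large_inode = 2**32 + 5

for inode in (small_inode, large_inode):
    status = "ok" if fits_unsigned_32(inode) else EOVERFLOW_NAME
    print(f"inode {inode}: {status}")
```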

@benmandrew noted that this is the same problem discussed at https://github.com/mirage/capnp-rpc/issues/273

@mtelvers pointed out that the problem here is simply that the package repo for debian 12 has not been updated to use the fixed capnproto version (e.g., 1.0.2): https://tracker.debian.org/pkg/capnproto

I'm trying to get that update moved along, but in the meantime I believe we can fix this by installing the latest release in our images before running CI on them. Iiuc, https://github.com/ocurrent/ocaml-ci/pull/894 would have fixed the problem while waiting for the upstream updates, but it was closed because

it is preferable to not add patched system libraries in the CI

as per https://github.com/ocurrent/ocaml-ci/pull/894#issuecomment-1808344120

However, at this point a proper release is out, and in my view it is preferable to special-case the install of a package rather than to leave builds red, so I will prep a PR following that example but installing the latest release.

shonfeder commented 3 months ago

Thanks to the debian IRC, TIL:

package versions are frozen for every debian stable release, so unless the fix can be backported non-destructively to 0.9.2, bookworm will not get it

So if we want to continue to support this build target, our (non-exclusive) options are:

  1. Install the latest version in our CI pipeline
  2. Work towards, and wait for, a backport to 0.9.2

However, we have now been letting this build fail for months, iiuc, and it seems like everything is moving along ok. So maybe the best option is:

  1. Just remove the tests for debian-12 on x86_32.

WDYT, @mtelvers?

mtelvers commented 3 months ago

These tests only fail for projects which use capnproto as that is the outdated package. Therefore, I do not think we should disable the test as it will work on most projects.

shonfeder commented 3 months ago

Thanks, @mtelvers! Your comment here and discussion in slack helped me realize I mistook the scope of the CIs affected by this error (I thought it was a problem with service infra running, rather than just a problem with the CI for our infra -- didn't look at the context of the failure enough obviously :)).

After discussion with Mark, I think the next place to look would be at the depext package: we can either install a workable version of capnproto (if it's permissible to bypass the package manager) or put in an OS/arch conflict.

shonfeder commented 3 months ago

I've checked with the opam repo maintainers to be sure, and Kate confirmed there is no way to bypass the distro package manager in depext. So I think really our only options are figuring out how to ignore this target in our CI or how to get this dependency installed on the target.

shonfeder commented 3 months ago

Once deployed, #932 should remove the spurious build failures, but the underlying problem persists. So I'm leaving this issue open to note the problem, even tho we don't have a clear way forward.

shonfeder commented 1 month ago

Not sure why it took me so long to hit on this solution, but for any project where this is an issue, just mark it as unavailable on debian-12 on x86_32 as per https://github.com/ocurrent/ocurrent-deployer/pull/223/commits/b60799f2acae3ee47e5c152c45cdcbf243d1d2d1

I think we wouldn't want this in the capnp package itself, because users can use the lib on that target with no problems so long as they install a more recent version of capnp first.
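
As a rough illustration of what marking a project unavailable on this one target amounts to, the sketch below models the condition as a predicate over the platform. The `Platform` type and `is_available` function are hypothetical names for illustration only. In an opam file the same idea would go in the `available:` field using the `os-distribution`, `os-version`, and `arch` variables; the filter shown in the comment is an assumption, not copied from the linked commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical stand-in for the opam variables a filter can inspect."""
    os_distribution: str
    os_version: str
    arch: str


def is_available(p: Platform) -> bool:
    """Available everywhere except debian 12 on x86_32.

    Assumed opam equivalent (not taken from the linked commit):
      available: os-distribution != "debian" | os-version != "12" | arch != "x86_32"
    """
    blocked = (
        p.os_distribution == "debian"
        and p.os_version == "12"
        and p.arch == "x86_32"
    )
    return not blocked


targets = [
    Platform("debian", "12", "x86_32"),
    Platform("debian", "12", "x86_64"),
    Platform("debian", "11", "x86_32"),
]
for t in targets:
    print(t, "->", "available" if is_available(t) else "skipped")
```

Only the single failing combination is excluded, so other debian releases and other architectures keep being tested.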