ocaml / opam

opam is a source-based package manager. It supports multiple simultaneous compiler installations, flexible package constraints, and a Git-friendly development workflow.
https://opam.ocaml.org

`opam pin` dies without message and exit code 120 in snapcraft / multipass / LXD build #5432

Open MSoegtropIMC opened 1 year ago

MSoegtropIMC commented 1 year ago

I have a rather mysterious issue: since an update of LXD (I think; it is the only obvious change), an `opam pin` command dies with exit code 120 in the Coq Platform CI. This ran daily for years without issue, suddenly stopped working a week ago, and has since failed every day on exactly the same `opam pin` command. opam is the same and all packages are the same; as far as I can see, only the version of LXD changed.

What I do is essentially:

The first of these pins succeeds, the second one dies without message and exit code 120 (see log snippets below).

Exit code 120 is not documented for opam, but there are a few non-standard error codes in the 12X range, so I thought maybe it means something to someone here.

In the end it is likely a weird LXD bug, but I can't really report there without understanding how and why opam fails.

Here is a good and a bad log (both reproduced identically many times):

GOOD

<Got here after a 5 hours opam install command>
PINNING PIN.ocamlfind.1.9.5~relocatable
[WARNING] Running as root is not recommended
ocamlfind is now pinned to version 1.9.5~relocatable

No package build needed.
Nothing to do.
# Run eval $(opam env) to update the current shell environment
PINNING PIN.coq.8.16.1
[WARNING] Running as root is not recommended
coq is now pinned to version 8.16.1

Already up-to-date.
Nothing to do.
# Run eval $(opam env) to update the current shell environment

BAD

<Got here after a 5 hours opam install command - all identical to the good case>
PINNING PIN.ocamlfind.1.9.5~relocatable
ocamlfind is now pinned to version 1.9.5~relocatable

No package build needed.
Nothing to do.
# Run eval $(opam env) to update the current shell environment
PINNING PIN.coq.8.16.1
===============================================================================
An error occurred when trying to execute 'snapcraft snap' with 'LXD': returned exit code 120.

One interesting difference is that the usual "[WARNING] Running as root is not recommended" message is missing in the bad case, even though it is issued by the opam install command immediately before (run by the same script). Note that it is missing for both pins: the first, which succeeds, and the second, which fails.

MSoegtropIMC commented 1 year ago

I did some more research. Opam fails reliably in all snap provided versions of LXD published after mid January (all current streams are affected).

This is obviously an LXD issue, but for reporting there I would really like to understand if the error 120 comes from opam and if so, what it means. I would really appreciate some help here.

dra27 commented 1 year ago

We're scratching our heads, too! There's definitely no 120 exit code for opam (and there shouldn't be an error message of that form). If a subcommand had failed, there should be some kind of opam error report. Is opam definitely running at this point? Perhaps the easiest way to be sure would be to add `--debug` to the invocations of opam, since that would cause more terminal output to appear. (I agree with your suspicion about the missing root warning.)
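One way to confirm which process actually returns 120 is to wrap each invocation and print its exit status before anything else in the build script runs. The following is only a minimal sketch, assuming Python is available inside the build container; the helper name `run_and_report` is hypothetical and not part of opam, snapcraft or LXD.

```python
import subprocess
import sys
from typing import List


def run_and_report(cmd: List[str]) -> int:
    """Run cmd, replay its output, and print its exit status explicitly."""
    # Echo the command first, flushed, so it shows up even if the child dies.
    print("+ " + " ".join(cmd), flush=True)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # Replay whatever the child wrote, so nothing is silently dropped.
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    # On POSIX a negative return code means the child was killed by a signal.
    print(f"exit status: {proc.returncode}", flush=True)
    return proc.returncode
```

In the CI script this would wrap the existing pin commands unchanged (plus `--debug`). If the printed status for `opam pin` is 0 and the 120 only appears further up, the failure is more likely in a surrounding layer than in opam itself.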

MSoegtropIMC commented 1 year ago

@dra27: thanks for the suggestion - I will try it with `--debug`. I also set up LXD on a local machine, so that I can debug this more easily (look at AppArmor logs and the like).

By the way, I am not sure where the error 120 comes from. The involved parties are opam, LXD and snapcraft. The error could be issued by any of them, but it is documented for none of them. Possibly it is a standard Linux error code, but I must admit I have no clue what a Linux error 120 is supposed to mean (errno 120 is EISNAM, "Is a named type file").
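Note that errno values and process exit statuses are separate number spaces, so the EISNAM match may well be a coincidence. The sketch below lists the common readings of a numeric status side by side; it assumes a POSIX system with Python available, and the helper name `interpret_status` is hypothetical.

```python
import errno
import signal


def interpret_status(code: int) -> dict:
    """Return possible readings of a numeric status; none is authoritative."""
    readings = {"exit_status": code, "errno_name": None, "signal_name": None}
    # Reading 1: the number as an errno value (the table is platform-specific).
    readings["errno_name"] = errno.errorcode.get(code)
    # Reading 2: the shell convention 128 + N for a child killed by signal N.
    if code > 128:
        try:
            readings["signal_name"] = signal.Signals(code - 128).name
        except ValueError:
            pass  # no signal with that number on this platform
    return readings
```

Since 120 is below 128, the signal reading does not apply here, which leaves either a deliberate exit status chosen by one of the programs or an errno number passed through as an exit status.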

In the end I will probably use Multipass instead of LXD (an alternative VM manager which snapcraft can use). Multipass is actually the default, but the snapcraft-supplied GitHub action for building snaps insists on using LXD, and it is a bit of work to change this.

The really odd thing about all this is that opam runs happily for hours, compiles almost 100 packages, and then dies on a pin - and, even odder, on the second pin.