`cabal install Agda` fails in `Setup.hs` with `executable-dynamic: True` in config file (Linux)

andreasabel commented 7 months ago

cabal install Agda fails with executable-dynamic: True in config file:

https://github.com/agda/agda/issues/7163
Reproducer on Linux: https://github.com/agda/agda/pull/7172

What is puzzling that cabal install Agda --enable-executable-dynamic (instead of setting in the config file) succeeds: https://github.com/agda/agda/actions/runs/8184356501/job/22378664111?pr=7172

Agda fails in the copyHook in its Setup.hs when it tries to invoke the just-built executable: https://github.com/agda/agda/blob/dee0d2e1300b43ac91517c3a5c58cae6b6af3a3d/Setup.hs#L134-L140 This situation has been discussed elsewhere in this issue tracker:

https://github.com/haskell/cabal/issues/9777

The current issue highlights the discrepancy in behavior between passing the command-line flag and changing the config setting for dynamic linking.

the opposite problem, but for ghc-options: -dynamic: https://github.com/haskell/cabal/issues/6505

alt-romes commented 7 months ago

What is puzzling that cabal install Agda --enable-executable-dynamic (instead of setting in the config file) succeeds: https://github.com/agda/agda/actions/runs/8184356501/job/22378664111?pr=7172

The definition of succeeds here is important: does it succeed in building a dynamically linked executable, or does it succeed because the flag is simply ignored? I'm pretty certain it is the latter, since before #9697 command line options to install are simply ignored (where as options in config will be applied)...

alt-romes commented 7 months ago

The current issue highlights the discrepancy in behavior between passing the command-line flag and changing the config setting for dynamic linking.

So, in light of #9697 (and my previous comment), if this is it then the ticket can be closed. However, why does Agda fail when compiled with --enable-executable-dynamic? Is it because of Cabal? Could you please re-phrase the ticket in terms of why exectuable dynamic makes Agda fail to build?

andreasabel commented 7 months ago

The definition of succeeds here is important: does it succeed in building a dynamically linked executable, or does it succeed because the flag is simply ignored?

Indeed, the flag is ignored. Agda isn't linked in dynamically, see: https://github.com/agda/agda/actions/runs/8213327362/job/22464590517?pr=7172

Two more insights:

on macOS, dynamic linking is not a problem, so the OP does not apply to macOS: https://github.com/agda/agda/actions/runs/8213327362/job/22464590232?pr=7172
on Windows, with executable-dynamic: True, already installing the dependency alex fails: https://github.com/agda/agda/actions/runs/8213327362/job/22464590326?pr=7172#step:9:258
```
Error:     Could not load module ‘Prelude’.
Perhaps you haven't installed the "dyn" libraries for package ‘base-4.19.1.0’?
```

ulidtko commented 6 months ago

Hi @alt-romes,

why does Agda fail when compiled with --enable-executable-dynamic?

It's because:

the just-compiled agda executable gets invoked from Setup hook;
to run the (dynamically-linked) executable, the OS must be able to find .so libraries, including libHSAgda*.so;
at this point of the build, libHSAgda*.so is under a weird path like /home/ulidtko/.cabal/store/ghc-9.6.4/incoming/new-3043174/home/ulidtko/.cabal/store/ghc-9.6.4/Agda-2.6.5-a88d61781dc99162e02d5d6d176b13f1418efc8b30ce081e297f0204e18346d3/lib/libHSAgda-2.6.5-a88d61781dc99162e02d5d6d176b13f1418efc8b30ce081e297f0204e18346d3-ghc9.6.4.so
- this path is cumbersome to even talk about; you noticed it too, in #9654
the weird path can't be found by OS — thus we get the fatal error while loading shared libraries.

The usual ways to specify library path are:

creating a permanent location in /etc/ld.so.conf.d/something.conf;
setting LD_LIBRARY_PATH environment variable temporarily;
adding compiler options to bake appropriate RPATH into the executable;
using patchelf to edit an executable's RPATH post-hoc.

Unfortunately I'm not seeing the ~/.cabal/store/ghc-*/incoming/new-* subdir — is it possible to ask Cabal to not delete it?..

Also notice, that v1-install with --enable-executable-dynamic succeeds on the same package. I.e. this is only an issue in v2-install.

alt-romes commented 6 months ago

@ulidtko Ah, I see.

Ultimately, I don't think this is Cabal's fault, but rather Agda's use of a custom Setup, which will work in some situations but won't be robust to every one (like with executable-dynamic).

I suspect that the reason why it does not work with v2-install is that the assumptions about the system required for the Custom Setup to be successful in the presence of enable-executable-dynamic do not hold true for the nix-style v2 Cabal, for reasons inherit to its design (e.g. the fact that the whole package is built in this temporary directory before being installed to the store atomically).

ulidtko commented 6 months ago

Can Cabal provide perhaps a "post-atomic-store-install" hook, to enable use-cases like Agda's?

alt-romes commented 6 months ago

Can Cabal provide perhaps a "post-atomic-store-install" hook, to enable use-cases like Agda's?

I don't think the problem would be solved by that. I just meant to say that some assumptions made by the Agda custom Setup may not hold true in nix-style Cabal, and provide examples of things that changed -- I was not saying that that particular thing is the problem.

I would have to investigate.

ulidtko commented 6 months ago

@alt-romes for your convenience, I've minimized a test case that reproduces the issue.

Find it here https://gist.github.com/ulidtko/fd65f7e5d486744d0f70dbe1934c6894

If anyone gives me a pointer to cabal tests perhaps, I can try to integrate it as a proper test case.

ulidtko commented 6 months ago

Besides the install itself failing, leave that aside for now @alt-romes — notice from @andreasabel OP

[...] discrepancy in behavior between passing the command-line flag and changing the config setting for dynamic linking.

— I do reproduce this, too (on cabal 3.10.2.1). See gist. --enable-executable-dynamic has no effect — only executable-dynamic: True in ~/.cabal/config does.

I believe just this detail is way beyond expected, and is the actual subject of this bug report.

alt-romes commented 6 months ago

I will take a look at the reproducer later, thanks.

As for the comment regarding that bit of the OP: I've replied in this thread explaining that I've already fixed that discrepancy in a recent pull request: https://github.com/haskell/cabal/issues/9784#issuecomment-1983638565

(That part is outdated)

alt-romes commented 6 months ago

I took a look at this.

The cause for the failure is that the installation strategy of v2 cabal is currently incompatible with running dynamically linked executables in the post install hook in not (yet) supporting relocatable builds (perhaps Agda shouldn't run the executable in a post-install step).

I think we may be able to fix in Cabal by making the installation slightly more relocatable (an alternative is for Agda to work around this limitation manually, but it'd be pretty ugly :), it depends on whether the solution gets merged which may be a bit risky, but I think could be worth it -- then again, it may not be wise to do a risky change like this to support a workflow which is not decidely something we want cc @gbaz )

Here's what happens:

A package to be installed is prepared in a temporary directory (that long path you pasted)
When the package is prepared it will be copied atomically to the store (if curious, see the comment at the beginning of Distribution.Client.Store)
A dynamically linked executable will depend on dynamic libraries that have to be installed somewhere
At present time, this "somewhere" is in /absolute/to/<cabal-store>/<package>/lib (I don't know exactly who sets the library path). Furthermore the runtime path is extended by cabal with an option to the linker e.g. for your repro we have:
```
-optl-Wl,-rpath,/root/.local/state/cabal/store/ghc-9.6.4/cabal9784-repro-0.1-9e8629789f420c50edf74ef0eb19562afbe3114bbee5c8ab57c429597f7c5fc9/lib
```
However, to build the package (i.e. before it is ready to copy to the store), we need to run the post-hook (and thus the executable) first. The problem is that the executable is expecting to find the lib in the store, but the lib has not been copied to the store yet.

What can we do to solve this? Basically, set the library path of the package library something relative to the location of the binary.

Instead of hardcoding the path of the lib to be /<store>/<package>/lib/<lib>.so, we'd set the, in macOS, the path the the lib as @executable_path/../lib/<lib>.so (really, a bit more complicated than this because of all libraries being installed in the same dir on macos due to CMD Load limits), and in Linux as $ORIGIN/../lib/<lib>.so.

But as I'm finished writing this I realize it may not be this simple. The problem is the same problem which makes the path to the temporary directory be that monstrous thing with duplicated directory hiearchy:

Nothing in Cabal guarantees that the storedir is the same as the installdir..., which means that defining the path to the dynamic library relative to the binary may not be possible. This may be a more fundamental limitation than what I had originally thought.

A possible alternative on Agda's side:

Make the Custom Setup program change the install path of the library in the executable to the temporary directory where it was installed before being copied to the store, then run the executable, then change the RPATH of the lib back to the storedir (with install_name_tool on macOS and something else like ldd on linux).

ulidtko commented 6 months ago

Here's what happens: A package to be installed is built in a temporary directory (that long path you pasted)

This is incorrect right away @alt-romes.

The build happens under yet another temporary directory:

/tmp/cabal-install.-3164886/dist-newstyle/tmp/src-3164886/cabal9784-repro-0.1.

BTW this one looks fine to me, not weird in any way. It also fits into 1 line.

alt-romes commented 6 months ago

Sorry, I was not precise and mixed building and installing. The building happens in the builddir, but the installation is prepared in temporary directory that is then copied over to the store:

-- The write-side protocol is:
--
-- * Create a unique temp dir and write all store entry files into it.
--
-- * Take a lock named after the 'UnitId' in question.
--
-- * Once holding the lock, check again for the existence of the final store
--   entry directory. If the entry exists then the process lost the race and it
--   must abandon, unlock and re-use the existing store entry. If the entry
--   does not exist then the process won the race and it can proceed.
--
-- * Register the package into the package db. Note that the files are not in
--   their final location at this stage so registration file checks may need
--   to be disabled.
--
-- * Atomically rename the temp dir to the final store entry location.
--
-- * Release the previously-acquired lock.

I'll update my comment.

alt-romes commented 6 months ago

Actually, your comment reminded me of something else: when you build the executable inplace (in dist-newstyle), the path to the lib will also be fixed to the <storedir>/<package>/lib/<libname>.so (because linking, which determines the RPATHs, is done at build-time).

And... so it is... if you run the executable you have built in place (you can find it with cabal list-bin cabal9784-repro) it will also fail because the dynamic library will not be found (as it has not been installed there...)

I've mixed up some things incorrectly and need to look at this again with a clearer head. This doesn't make a lot of sense.

ulidtko commented 6 months ago

Nothing in Cabal guarantees that the storedir is the same as the installdir...

I understood this from your analysis in #9654 @alt-romes. But this doesn't follow:

which means that defining the path to the dynamic library relative to the binary may not be possible.

— in the most general case, yes. But nothing prevents us from optimizing the common case, right?

Can cabal detect, that none of custom --prefix, --libdir or whatnot was used, and/or that the dependency-lib actually is installed under the store dir? And perhaps based on that, avoid the weird-dir shenanigans.

In defensive programming philosophy, the only guarantees you have — are the ones you've explicitly checked.

alt-romes commented 6 months ago

Oh dear... I've conflated a few too many things in this conversation (I blame tiredness)...

Cabal explicitly extends the RPATH (runtime lib path) of the executable with the path to the cabal store, but the path to the package library is set to something else (probably GHC based on something related to the package database that cabal provides it) -- I'm not very familiar with linux dynamic library loading (unlike with macOS ones), so for example I used ldd to get the dynlib paths and got full paths and was interpreting that as the path to the lib being absolute, but it may be that it is being resolved (or not) in the RPATH? I need to check carefully.

I misjudged and misunderstood a few things while going through this and was a bit hasty. Let me come back tomorrow to this with a clear head.

ulidtko commented 6 months ago

@alt-romes sure. In advance, ldd does output resolved paths. Executables internally refer to libraries abstractly, as simply libHSfoobar-<abi_stuff>.so or libm.so.6 — the exact file paths are the dynamic linker's job to resolve (dyld on OS X, ld.so on Linux) .

These things influence what the dynamic linker's resolution does on Linux:

creating a permanent location in /etc/ld.so.conf.d/something.conf (followed by running ldconfig as root to rebuild caches);
setting LD_LIBRARY_PATH environment variable temporarily;
adding compiler options to bake appropriate RPATH / RUNPATH into the executable;
using patchelf to edit an executable's RPATH / RUNPATH post-hoc.

You can use otool -l or objdump -x | grep 'R.*PATH' to print RPATH baked into an executable (resp. on OS X & Linux). It's usually a :-separated list. The linked stackoverflow thread, as well as Apple's man 1 dyld, explain variables like @loader_path in RPATH that dyld interprets.

andreasabel commented 6 months ago

@ulidtko wrote:

If anyone gives me a pointer to cabal tests perhaps, I can try to integrate it as a proper test case.

E.g. there is an example how to make a test in one of @alt-romes PRs: https://github.com/haskell/cabal/pull/9762/files#diff-6db13a6ade63e9c657436ae3057120fa9b5fbc52fbc3b79e8d7574722f28f7b0

(But make sure you sync with him on this, maybe Rodrigo will add it in course of a fix of this issue.)

alt-romes commented 6 months ago

I've minimised the reproducer a bit (for Linux):

It suffices to have a cabal package with both a library and an executable, use Custom build-type with defaultMain, and build with executable-dynamic: True.

We don't even need to get to the install step, simply running the built executable in dist-newstyle (or using cabal run) is enough to trigger the same error.

I'm convinced this is a Cabal bug (which I can reproduce in master). I'm trying to figure out why this works on macOS.

alt-romes commented 6 months ago

Why does this not fail on macOS?

Cabal invokes GHC to link the executable, and, on macOS, GHC will add RPATH entries to the executable pointing to libraries used by the executable (see runInjectRPaths on the GHC source). I think that this Cabal bug not occurring on macOS is only a fortunate coincidence caused by GHC having some special rpath logic on macos.

Why does this not fail if build-type: Simple?

When using build-type: Simple, cabal extends the executable RPATH with the path to the build directory via the GHC linker invocation:

-optl-Wl,-rpath,/opt/ghc/9.6.4/lib/ghc-9.6.4/lib/aarch64-linux-ghc-9.6.4
-optl-Wl,-rpath,/what/dist-newstyle/build/aarch64-linux/ghc-9.6.4/what-0.1.0.0/build

However, when using build-type: Custom we don't, instead we pass the path to the cabal store and to ghc's libdir:

-optl-Wl,-rpath,/root/.cabal/lib/aarch64-linux-ghc-9.6.4
-optl-Wl,-rpath,/opt/ghc/9.6.4/lib/ghc-9.6.4/lib/aarch64-linux-ghc-9.6.4

Where to look now?

This narrows it down: I suspect depLibraryPaths...

alt-romes commented 6 months ago

It seems that it is expected for an executable which depends on an internal lib built with Custom setup/legacy fallback and executable-dynamic: True to not include the build directory in the RPATH of the executable, but rather that the executable should only expect to find that internal library when it is installed in the cabal store. See the test added by a1107e21f:

+  -- Test that we don't accidentally add the inplace directory to
+  -- an executable RPATH.  Don't test on Windows, which doesn't
+  -- support RPATH.
+  unlessWindows $ do
+    tc "Regression/T4025" $ do
+      cabal "configure" ["--enable-executable-dynamic"]
+      cabal "build" []
+      -- This should fail as it we should NOT be able to find the
+      -- dynamic library for the internal library (since we didn't
+      -- install it).  If we incorrectly encoded our local dist
+      -- dir in the RPATH, this will succeed.
+      shouldFail $ runExe' "exe" []

On one hand I understand this point, but on the other hand it makes it impossible for packages with Custom setups to run the built executables in the copy phase, as the dynamic library will only be available where it is expected to be after it is installed in the store.

ulidtko commented 6 months ago

I don't entirely follow #4033 on the first read. :confused:

@alt-romes speaking of phases, what's the sequence? build, copy, register, install ?

For context, what Agda does: it's a compiler that ships with a bunch of wired-in internal modules, a-la GHC.Prim. For importing, these modules must first be typechecked, producing .agdai files, similar to .hi or .pyc. Now, it's a crucial step to typecheck these wired-in modules during agda build — otherwise, there's possibility of breakage in wired-in modules to slip through, and install broken compiler.

Another reason is efficiency. Also, keeping permission boundaries: when run later by user, the compiler might no longer have permissions to write the .agdai files where they belong, if installed from a .deb package for example.

That's why that custom setup hook is there. I don't think it's particularly important for Agda in which phase of cabal install it runs. The only expected outcomes are:

that typechecking runs, and produces the .agdai outputs;
that the .agdai outputs are added to the package's data-files.

So maybe, it shouldn't be a hook in the copy phase, but some other phase?

alt-romes commented 6 months ago

I've briefly discussed this with Duncan, and the summary of our discussion is:

The RPATH of the executable should indeed be set for the final installation directory
cabal run should successfully run the in-place executable by augmenting LD_LIBRARY_PATH to account for the build directory
Every post-build setup-script action should be invoked with LD_LIBRARY_PATH augmented with the inplace directory where the libs live

haskell / cabal

`cabal install Agda` fails in `Setup.hs` with `executable-dynamic: True` in config file (Linux) #9784