Open andreasabel opened 7 months ago
What is puzzling that cabal install Agda --enable-executable-dynamic (instead of setting in the config file) succeeds: https://github.com/agda/agda/actions/runs/8184356501/job/22378664111?pr=7172
The definition of succeeds here is important: does it succeed in building a dynamically linked executable, or does it succeed because the flag is simply ignored? I'm pretty certain it is the latter, since before #9697 command line options to install are simply ignored (where as options in config will be applied)...
The current issue highlights the discrepancy in behavior between passing the command-line flag and changing the config setting for dynamic linking.
So, in light of #9697 (and my previous comment), if this is it then the ticket can be closed. However, why does Agda fail when compiled with --enable-executable-dynamic
? Is it because of Cabal? Could you please re-phrase the ticket in terms of why exectuable dynamic makes Agda fail to build?
The definition of succeeds here is important: does it succeed in building a dynamically linked executable, or does it succeed because the flag is simply ignored?
Indeed, the flag is ignored. Agda isn't linked in dynamically, see: https://github.com/agda/agda/actions/runs/8213327362/job/22464590517?pr=7172
Two more insights:
executable-dynamic: True
, already installing the dependency alex
fails: https://github.com/agda/agda/actions/runs/8213327362/job/22464590326?pr=7172#step:9:258
Error: Could not load module ‘Prelude’.
Perhaps you haven't installed the "dyn" libraries for package ‘base-4.19.1.0’?
Hi @alt-romes,
why does Agda fail when compiled with --enable-executable-dynamic?
It's because:
libHSAgda*.so
;libHSAgda*.so
is under a weird path like /home/ulidtko/.cabal/store/ghc-9.6.4/incoming/new-3043174/home/ulidtko/.cabal/store/ghc-9.6.4/Agda-2.6.5-a88d61781dc99162e02d5d6d176b13f1418efc8b30ce081e297f0204e18346d3/lib/libHSAgda-2.6.5-a88d61781dc99162e02d5d6d176b13f1418efc8b30ce081e297f0204e18346d3-ghc9.6.4.so
error while loading shared libraries
.The usual ways to specify library path are:
/etc/ld.so.conf.d/something.conf
;LD_LIBRARY_PATH
environment variable temporarily;RPATH
into the executable;RPATH
post-hoc.Unfortunately I'm not seeing the ~/.cabal/store/ghc-*/incoming/new-*
subdir — is it possible to ask Cabal to not delete it?..
Also notice, that v1-install
with --enable-executable-dynamic
succeeds on the same package.
I.e. this is only an issue in v2-install
.
@ulidtko Ah, I see.
Ultimately, I don't think this is Cabal's fault, but rather Agda's use of a custom Setup, which will work in some situations but won't be robust to every one (like with executable-dynamic).
I suspect that the reason why it does not work with v2-install
is that the assumptions about the system required for the Custom Setup to be successful in the presence of enable-executable-dynamic
do not hold true for the nix-style v2 Cabal, for reasons inherit to its design (e.g. the fact that the whole package is built in this temporary directory before being installed to the store atomically).
Can Cabal provide perhaps a "post-atomic-store-install" hook, to enable use-cases like Agda's?
Can Cabal provide perhaps a "post-atomic-store-install" hook, to enable use-cases like Agda's?
I don't think the problem would be solved by that. I just meant to say that some assumptions made by the Agda custom Setup may not hold true in nix-style Cabal, and provide examples of things that changed -- I was not saying that that particular thing is the problem.
I would have to investigate.
@alt-romes for your convenience, I've minimized a test case that reproduces the issue.
Find it here https://gist.github.com/ulidtko/fd65f7e5d486744d0f70dbe1934c6894
If anyone gives me a pointer to cabal tests perhaps, I can try to integrate it as a proper test case.
Besides the install itself failing, leave that aside for now @alt-romes — notice from @andreasabel OP
[...] discrepancy in behavior between passing the command-line flag and changing the config setting for dynamic linking.
— I do reproduce this, too (on cabal 3.10.2.1). See gist. --enable-executable-dynamic
has no effect — only executable-dynamic: True
in ~/.cabal/config
does.
I believe just this detail is way beyond expected, and is the actual subject of this bug report.
I will take a look at the reproducer later, thanks.
As for the comment regarding that bit of the OP: I've replied in this thread explaining that I've already fixed that discrepancy in a recent pull request: https://github.com/haskell/cabal/issues/9784#issuecomment-1983638565
(That part is outdated)
I took a look at this.
The cause for the failure is that the installation strategy of v2 cabal is currently incompatible with running dynamically linked executables in the post install hook in not (yet) supporting relocatable builds (perhaps Agda shouldn't run the executable in a post-install step).
I think we may be able to fix in Cabal by making the installation slightly more relocatable (an alternative is for Agda to work around this limitation manually, but it'd be pretty ugly :), it depends on whether the solution gets merged which may be a bit risky, but I think could be worth it -- then again, it may not be wise to do a risky change like this to support a workflow which is not decidely something we want cc @gbaz )
Here's what happens:
Distribution.Client.Store
)/absolute/to/<cabal-store>/<package>/lib
(I don't know exactly who sets the library path). Furthermore the runtime path is extended by cabal with an option to the linker e.g. for your repro we have:
-optl-Wl,-rpath,/root/.local/state/cabal/store/ghc-9.6.4/cabal9784-repro-0.1-9e8629789f420c50edf74ef0eb19562afbe3114bbee5c8ab57c429597f7c5fc9/lib
What can we do to solve this? Basically, set the library path of the package library something relative to the location of the binary.
/<store>/<package>/lib/<lib>.so
, we'd set the, in macOS, the path the the lib as @executable_path/../lib/<lib>.so
(really, a bit more complicated than this because of all libraries being installed in the same dir on macos due to CMD Load limits), and in Linux as $ORIGIN/../lib/<lib>.so
.But as I'm finished writing this I realize it may not be this simple. The problem is the same problem which makes the path to the temporary directory be that monstrous thing with duplicated directory hiearchy:
Nothing in Cabal guarantees that the storedir is the same as the installdir..., which means that defining the path to the dynamic library relative to the binary may not be possible. This may be a more fundamental limitation than what I had originally thought.
A possible alternative on Agda's side:
install_name_tool
on macOS and something else like ldd
on linux).Here's what happens: A package to be installed is built in a temporary directory (that long path you pasted)
This is incorrect right away @alt-romes.
The build happens under yet another temporary directory:
/tmp/cabal-install.-3164886/dist-newstyle/tmp/src-3164886/cabal9784-repro-0.1
.BTW this one looks fine to me, not weird in any way. It also fits into 1 line.
Sorry, I was not precise and mixed building and installing. The building happens in the builddir, but the installation is prepared in temporary directory that is then copied over to the store:
-- The write-side protocol is:
--
-- * Create a unique temp dir and write all store entry files into it.
--
-- * Take a lock named after the 'UnitId' in question.
--
-- * Once holding the lock, check again for the existence of the final store
-- entry directory. If the entry exists then the process lost the race and it
-- must abandon, unlock and re-use the existing store entry. If the entry
-- does not exist then the process won the race and it can proceed.
--
-- * Register the package into the package db. Note that the files are not in
-- their final location at this stage so registration file checks may need
-- to be disabled.
--
-- * Atomically rename the temp dir to the final store entry location.
--
-- * Release the previously-acquired lock.
I'll update my comment.
Actually, your comment reminded me of something else: when you build the executable inplace (in dist-newstyle
), the path to the lib will also be fixed to the <storedir>/<package>/lib/<libname>.so
(because linking, which determines the RPATHs, is done at build-time).
And... so it is... if you run the executable you have built in place (you can find it with cabal list-bin cabal9784-repro
) it will also fail because the dynamic library will not be found (as it has not been installed there...)
I've mixed up some things incorrectly and need to look at this again with a clearer head. This doesn't make a lot of sense.
Nothing in Cabal guarantees that the storedir is the same as the installdir...
I understood this from your analysis in #9654 @alt-romes. But this doesn't follow:
which means that defining the path to the dynamic library relative to the binary may not be possible.
— in the most general case, yes. But nothing prevents us from optimizing the common case, right?
Can cabal detect, that none of custom --prefix
, --libdir
or whatnot was used, and/or that the dependency-lib actually is installed under the store dir? And perhaps based on that, avoid the weird-dir shenanigans.
In defensive programming philosophy, the only guarantees you have — are the ones you've explicitly checked.
Oh dear... I've conflated a few too many things in this conversation (I blame tiredness)...
Cabal explicitly extends the RPATH
(runtime lib path) of the executable with the path to the cabal store, but the path to the package library is set to something else (probably GHC based on something related to the package database that cabal provides it) -- I'm not very familiar with linux dynamic library loading (unlike with macOS ones), so for example I used ldd
to get the dynlib paths and got full paths and was interpreting that as the path to the lib being absolute, but it may be that it is being resolved (or not) in the RPATH? I need to check carefully.
I misjudged and misunderstood a few things while going through this and was a bit hasty. Let me come back tomorrow to this with a clear head.
@alt-romes sure. In advance, ldd
does output resolved paths. Executables internally refer to libraries abstractly, as simply libHSfoobar-<abi_stuff>.so
or libm.so.6
— the exact file paths are the dynamic linker's job to resolve (dyld on OS X, ld.so on Linux) .
These things influence what the dynamic linker's resolution does on Linux:
/etc/ld.so.conf.d/something.conf
(followed by running ldconfig
as root to rebuild caches);LD_LIBRARY_PATH
environment variable temporarily;RPATH
/ RUNPATH
into the executable;RPATH
/ RUNPATH
post-hoc.You can use otool -l
or objdump -x | grep 'R.*PATH'
to print RPATH baked into an executable (resp. on OS X & Linux). It's usually a :
-separated list. The linked stackoverflow thread, as well as Apple's man 1 dyld
, explain variables like @loader_path
in RPATH that dyld interprets.
@ulidtko wrote:
If anyone gives me a pointer to cabal tests perhaps, I can try to integrate it as a proper test case.
E.g. there is an example how to make a test in one of @alt-romes PRs: https://github.com/haskell/cabal/pull/9762/files#diff-6db13a6ade63e9c657436ae3057120fa9b5fbc52fbc3b79e8d7574722f28f7b0
(But make sure you sync with him on this, maybe Rodrigo will add it in course of a fix of this issue.)
I've minimised the reproducer a bit (for Linux):
It suffices to have a cabal package with both a library and an executable, use Custom
build-type with defaultMain
, and build with executable-dynamic: True
.
We don't even need to get to the install step, simply running the built executable in dist-newstyle
(or using cabal run
) is enough to trigger the same error.
I'm convinced this is a Cabal bug (which I can reproduce in master). I'm trying to figure out why this works on macOS.
Why does this not fail on macOS?
Cabal invokes GHC to link the executable, and, on macOS, GHC will add RPATH entries to the executable pointing to libraries used by the executable (see runInjectRPaths
on the GHC source). I think that this Cabal bug not occurring on macOS is only a fortunate coincidence caused by GHC having some special rpath logic on macos.
Why does this not fail if build-type: Simple?
When using build-type: Simple
, cabal extends the executable RPATH with the path to the build directory via the GHC linker invocation:
-optl-Wl,-rpath,/opt/ghc/9.6.4/lib/ghc-9.6.4/lib/aarch64-linux-ghc-9.6.4
-optl-Wl,-rpath,/what/dist-newstyle/build/aarch64-linux/ghc-9.6.4/what-0.1.0.0/build
However, when using build-type: Custom
we don't, instead we pass the path to the cabal store and to ghc's libdir:
-optl-Wl,-rpath,/root/.cabal/lib/aarch64-linux-ghc-9.6.4
-optl-Wl,-rpath,/opt/ghc/9.6.4/lib/ghc-9.6.4/lib/aarch64-linux-ghc-9.6.4
Where to look now?
This narrows it down: I suspect depLibraryPaths
...
It seems that it is expected for an executable which depends on an internal lib built with Custom setup/legacy fallback and executable-dynamic: True
to not include the build directory in the RPATH of the executable, but rather that the executable should only expect to find that internal library when it is installed in the cabal store. See the test added by a1107e21f:
+ -- Test that we don't accidentally add the inplace directory to
+ -- an executable RPATH. Don't test on Windows, which doesn't
+ -- support RPATH.
+ unlessWindows $ do
+ tc "Regression/T4025" $ do
+ cabal "configure" ["--enable-executable-dynamic"]
+ cabal "build" []
+ -- This should fail as it we should NOT be able to find the
+ -- dynamic library for the internal library (since we didn't
+ -- install it). If we incorrectly encoded our local dist
+ -- dir in the RPATH, this will succeed.
+ shouldFail $ runExe' "exe" []
On one hand I understand this point, but on the other hand it makes it impossible for packages with Custom setups to run the built executables in the copy phase, as the dynamic library will only be available where it is expected to be after it is installed in the store.
I don't entirely follow #4033 on the first read. :confused:
@alt-romes speaking of phases, what's the sequence? build, copy, register, install ?
For context, what Agda does: it's a compiler that ships with a bunch of wired-in internal modules, a-la GHC.Prim. For importing, these modules must first be typechecked, producing .agdai
files, similar to .hi
or .pyc
. Now, it's a crucial step to typecheck these wired-in modules during agda build — otherwise, there's possibility of breakage in wired-in modules to slip through, and install broken compiler.
Another reason is efficiency. Also, keeping permission boundaries: when run later by user, the compiler might no longer have permissions to write the .agdai
files where they belong, if installed from a .deb package for example.
That's why that custom setup hook is there. I don't think it's particularly important for Agda in which phase of cabal install it runs. The only expected outcomes are:
.agdai
outputs;.agdai
outputs are added to the package's data-files
.So maybe, it shouldn't be a hook in the copy phase, but some other phase?
I've briefly discussed this with Duncan, and the summary of our discussion is:
cabal run
should successfully run the in-place executable by augmenting LD_LIBRARY_PATH to account for the build directory
cabal install Agda
fails withexecutable-dynamic: True
in config file:What is puzzling that
cabal install Agda --enable-executable-dynamic
(instead of setting in the config file) succeeds: https://github.com/agda/agda/actions/runs/8184356501/job/22378664111?pr=7172Agda fails in the
copyHook
in itsSetup.hs
when it tries to invoke the just-built executable: https://github.com/agda/agda/blob/dee0d2e1300b43ac91517c3a5c58cae6b6af3a3d/Setup.hs#L134-L140 This situation has been discussed elsewhere in this issue tracker:The current issue highlights the discrepancy in behavior between passing the command-line flag and changing the config setting for dynamic linking.
Related:
ghc-options: -dynamic
: https://github.com/haskell/cabal/issues/6505