haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal

"malformed mach-o: load commands size" with a multi-package cabal.project #5220

Open phadej opened 6 years ago

phadej commented 6 years ago

We have a project with over 50 packages, and we have run into the infamous OSX linker problem.

In the otool -l dump of one of the dylibs there is an entry like

Load command 21
          cmd LC_LOAD_DYLIB
      cmdsize 80
         name @rpath/libHSdpndnt-mp-0.2.4.0-b3458e24-ghc8.2.2.dylib (offset 24)
   time stamp 2 Thu Jan  1 02:00:02 1970
      current version 0.0.0

for each dependency, which is fine.

Local library names aren't munged:

Load command 45
          cmd LC_LOAD_DYLIB
      cmdsize 72
         name @rpath/libHSenv-config-0-inplace-ghc8.2.2.dylib (offset 24)
   time stamp 2 Thu Jan  1 02:00:02 1970
      current version 0.0.0
compatibility version 0.0.0

Local package names aren't munged: we have "long" names such as futurice-this and futurice-that, servant-algebraic-graphs ...

Each local library's dylib is in its own dir.

While there is a single LC_RPATH for the store:

Load command 330
          cmd LC_RPATH
      cmdsize 56
         path /Users/toku/.cabal/store/ghc-8.2.2/lib (offset 12)

There is one LC_RPATH per local dependency:

Load command 320
          cmd LC_RPATH
      cmdsize 96
         path /Users/toku/hmr/dist-newstyle/build/x86_64-osx/ghc-8.2.2/log-cloudwatch-0/build (offset 12)
Load command 321
          cmd LC_RPATH
      cmdsize 88
         path /Users/toku/hmr/dist-newstyle/build/x86_64-osx/ghc-8.2.2/periocron-0/build (offset 12)
...

I'd say that fixing the second part, by having a single libdir in dist-newstyle (on OSX?), would solve this problem too. For now it prevents us from splitting a big "industrial size" repository into smaller packages (we want to have separate packages, as it's easier to manage).

We could use internal libraries to work around this, but the problem would still persist for cases like https://github.com/phadej/acme-kmett (I haven't tried to compile it on OSX).
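The cmdsize values in the dumps above are consistent with a fixed header followed by the NUL-terminated string, padded up to a multiple of 8 bytes. The following is a minimal Python sketch under that assumption; the header sizes are taken from the "offset" values otool prints, and the function name is made up for illustration.

```python
def load_command_size(header_size: int, string: str, align: int = 8) -> int:
    """Estimate the cmdsize of a Mach-O load command carrying one string.

    Assumption: the command is a fixed header followed by the
    NUL-terminated string, padded up to a multiple of `align` bytes.
    """
    raw = header_size + len(string.encode("utf-8")) + 1  # +1 for the NUL
    return (raw + align - 1) // align * align


LC_LOAD_DYLIB_HEADER = 24  # "offset 24" in the otool dump
LC_RPATH_HEADER = 12       # "offset 12" in the otool dump

dylib = load_command_size(
    LC_LOAD_DYLIB_HEADER, "@rpath/libHSenv-config-0-inplace-ghc8.2.2.dylib")
rpath = load_command_size(
    LC_RPATH_HEADER,
    "/Users/toku/hmr/dist-newstyle/build/x86_64-osx/ghc-8.2.2/log-cloudwatch-0/build")

print(dylib, rpath)   # 72 96, matching the dump
print(dylib + rpath)  # 168 bytes for this one local dependency
```

Under this estimate a local package costs two load commands (its LC_LOAD_DYLIB plus its own LC_RPATH), while a store package costs only the LC_LOAD_DYLIB, because all store packages share one LC_RPATH.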

otool-hyperloglog.txt

The attached dump is for the hyperloglog .dylib from acme-kmett. It has 111 load commands; a small-dep app in our repo has 332, and the biggest (which overflows the 32k limit) has 420.
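To see how close a given dylib is to the limit, the cmdsize lines of an otool -l dump can be added up. This is a rough Python sketch that works on the text output only; the function name is hypothetical, and the 32768 figure is assumed to be the "32k limit" referred to above.

```python
import re

LIMIT = 32768  # assumed value of the "32k limit" on total load command size


def total_load_command_size(otool_output: str) -> int:
    """Sum every 'cmdsize N' line found in `otool -l` text output."""
    sizes = re.findall(r"^\s*cmdsize\s+(\d+)\s*$", otool_output,
                       flags=re.MULTILINE)
    return sum(int(n) for n in sizes)


# Two of the LC_RPATH entries quoted earlier, used as sample input.
sample = """\
Load command 320
          cmd LC_RPATH
      cmdsize 96
         path /Users/toku/hmr/dist-newstyle/build/x86_64-osx/ghc-8.2.2/log-cloudwatch-0/build (offset 12)
Load command 321
          cmd LC_RPATH
      cmdsize 88
         path /Users/toku/hmr/dist-newstyle/build/x86_64-osx/ghc-8.2.2/periocron-0/build (offset 12)
"""

total = total_load_command_size(sample)
print(total, "bytes;", "over limit" if total > LIMIT else "within limit")
```

Feeding it the full otool -l output of a dylib (for example, read from a saved text file) gives the figure to compare against the limit.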

cc @christiaanb

angerman commented 6 years ago

Lovely... so that's the boundary where the library-name munging hits its limits. So acme-kmett should be a good test case? I'll try building that. Let's see what happens.

phadej commented 6 years ago

@angerman acme-kmett doesn't fail, it's just a repo with enough "deps" to highlight the problem.

Maybe if you pull all deps of yesod locally into a single cabal.project, it will actually fail. I don't have an OSX machine myself, so I cannot try that.

angerman commented 6 years ago

Just learned so myself. Alright, I’ll try yesod.

I vaguely remember having the "everything in the same lib" idea implemented somewhere. But it would break package relocatability. I’m however more inclined to see if we can just link direct dependencies only in GHC and circumvent the limit that way; we might still need to symlink(?) all libs into a common folder in cabal.

I’ll give

cartazio commented 6 years ago

so the new-build version of the linker sadness of OSX has finally happened?

cartazio commented 6 years ago

what about having a lib symlinks directory in the new-dist folder for new builds, then we can get really short relative paths, but they should be safe in the face of relocations

let's ignore the issue with the lack of symlinks on Windows for this :))))
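The symlink-directory idea could look roughly like the Python sketch below. It is only an illustration of the proposal, not anything Cabal does: the helper name, the directory layout, and the package names in the demo are all made up.

```python
import os
import tempfile
from pathlib import Path


def link_dylibs(build_dirs, link_dir):
    """Symlink every .dylib found in build_dirs into link_dir.

    Hypothetical helper: a single LC_RPATH pointing at link_dir could then
    stand in for one LC_RPATH per build directory.
    """
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for build_dir in build_dirs:
        for dylib in sorted(Path(build_dir).glob("*.dylib")):
            target = link_dir / dylib.name
            if target.is_symlink() or target.exists():
                target.unlink()  # replace a stale link from an earlier build
            # Relative link, so the whole tree can be moved together.
            target.symlink_to(os.path.relpath(dylib, link_dir))
            linked.append(target)
    return linked


# Demo on a throwaway directory tree with made-up package names.
with tempfile.TemporaryDirectory() as root:
    dirs = []
    for pkg in ["pkg-a-0", "pkg-b-0"]:
        d = Path(root, "build", pkg, "build")
        d.mkdir(parents=True)
        (d / f"libHS{pkg}-inplace.dylib").write_bytes(b"")
        dirs.append(d)
    links = link_dylibs(dirs, Path(root, "build", "libs"))
    print([link.name for link in links])
```

The links are relative on purpose: that is what would keep them valid if the build tree is relocated as a whole.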

angerman commented 6 years ago

Alright, I got a reproduction case:

$ brew install stack
$ stack new test-project yesod-sqlite
$ cd test-project && cabal new-build # this won't fail, but will populate the package database appropriately.
$ ghc-pkg --package-db ~/.cabal/store/ghc-8.4.1/package.db dot|tred > graph.dot
$ awk -F\  '{ print $1 }' graph.dot|grep \" >> pkgs
$ awk -F\  '{ print $3 }' graph.dot|grep \"|sed s/\"\;/\"/g >> pkgs

now pkgs will contain all packages from the package database.

$ echo "packages: ." > cabal.project
$ cat pkgs|sort -u|awk -F\" '{ print "          "$2 }' >> cabal.project

will create the cabal.project file containing all the libs. This might need a tiny bit of touch up...

And now...

$ cat cabal.project|grep -v "^packages"|xargs cabal unpack
$ cabal new-build

will likely error out with:

ghc: panic! (the 'impossible' happened)
  (GHC version 8.4.1 for x86_64-apple-darwin):
    Loading temp shared object failed: dlopen(/var/folders/fv/xqjrpfj516n5xq_m_ljpsjx00000gn/T/ghc53894_0/libghc_19.dylib, 5): no suitable image found.  Did find:
    /var/folders/fv/xqjrpfj516n5xq_m_ljpsjx00000gn/T/ghc53894_0/libghc_19.dylib: malformed mach-o: load commands size (34104) > 32768

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

angerman commented 6 years ago

Reproduction repo: https://github.com/angerman/macos-ghc-dylib-blowup

angerman commented 6 years ago

So I see that #4656 fixed this only for the libraries in $HOME/.cabal/store, not for those in the local dist-newstyle. That also explains why you need to specifically have them all in your project to trigger the issue.

jship commented 6 years ago

At work, we have worked around the mach-o "load commands size" issue by following in the Nix folks' footsteps and using a wrapper script around ld to recursively subdivide the dependencies into a tree of re-exporting delegate libraries.

The script, an example, and more info about this issue are available here: https://github.com/Simspace/ld-wrapper-macos
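The grouping step of such a subdivision can be sketched in Python as follows. This is not the ld-wrapper-macos script; it only illustrates how a flat list of libraries can be folded into a tree whose nodes each name a bounded number of children. The delegate names and the max_children parameter are hypothetical.

```python
def build_delegate_tree(libs, max_children):
    """Group libs into a tree where no node has more than max_children children.

    Leaves are library names; each inner node is a (name, children) tuple
    standing for a hypothetical delegate library that re-exports its children.
    """
    if max_children < 2:
        raise ValueError("max_children must be at least 2")
    level = list(libs)
    depth = 0
    # Keep wrapping groups in delegates until the top level is small enough.
    while len(level) > max_children:
        level = [
            (f"delegate-{depth}-{i // max_children}", level[i:i + max_children])
            for i in range(0, len(level), max_children)
        ]
        depth += 1
    return level


libs = [f"libHSpkg{i}.dylib" for i in range(10)]
top = build_delegate_tree(libs, max_children=3)
print(len(top))  # number of libraries the final link has to name directly
```

With ten libraries and at most three children per node, the final link names two delegates instead of ten libraries, so the number of load commands in any single image stays bounded regardless of how many dependencies there are in total.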

23Skidoo commented 6 years ago

@jship's suggestion is probably the best way forward. PRs welcome.

angerman commented 6 years ago

Note that this is fixed in GHC 8.6+ (https://phabricator.haskell.org/D4714) and backported in nixpkgs for at least 8.4 and 8.2.

This can also be fixed in the tooling I believe, as mafia did here: https://github.com/haskell-mafia/mafia/pull/226

angerman commented 6 years ago

Also note that once you start hitting the dylib issues, you will soon hit the "realgcc.exe: CreateProcess: No such file or directory" error when building on Windows, which is technically a gcc bug. See https://phabricator.haskell.org/D4762 for a rather crude hack around it.

Notably, once you hit that on Windows, you might eventually hit it on macOS, and likely quite a bit later on Linux as well, as your project's transitive dependency closure grows.

23Skidoo commented 6 years ago

Mafia's fix looks quite simple. Is anyone willing to implement it in Cabal to fix the issue for GHC < 8.6?

dfithian commented 3 years ago

We have run into this issue (malformed mach-o: load commands size (34376) > 32768) at work, using Cabal 3.4.0.0 and GHC 8.10.4, on our packages with the largest number of dependencies. We use https://github.com/Simspace/ld-wrapper-macos to work around it in the short term. We are currently switching from Stack to Cabal. Stack does not have this issue.

I saw this comment in this PR: https://github.com/haskell/cabal/pull/7094#issuecomment-716739600

We are wondering if this issue is still being tracked or planned to be worked on in GHC 9 and above.

Mikolaj commented 3 years ago

Which issue?

#7339 is definitely being worked on.

dfithian commented 3 years ago

Wondering if #5220 (this one in the current release of Cabal/GHC) is known about or being worked on.

I'm not familiar enough with rpath to know if it would help, but it seems like path length is a factor in this bug, so that's why I asked.

Mikolaj commented 3 years ago

I'm not familiar with the story, but reading the comments I see that #5220 is already closed for GHC >= 8.6. I don't think it's being worked on any more and I actually think we should close it. Am I misreading the comments?

alt-romes commented 1 year ago

I was unable to reproduce this using @angerman's methodology in pandoc. Can someone still reproduce this reliably? @dfithian