snowleopard / hadrian

Hadrian: a new build system for the Glasgow Haskell Compiler. Now merged into the GHC tree!
https://gitlab.haskell.org/ghc/ghc/tree/master/hadrian
MIT License
209 stars 37 forks source link

Implement better fix for the OS X / findPtr issue #701

Closed alpmestan closed 6 years ago

alpmestan commented 6 years ago

This is an alternative solution to https://github.com/snowleopard/hadrian/issues/614.

@Mistuke pointed out here that the solution I went with (always include findPtr/_findPtr) has its own problems, and suggested a better one. I implemented just that in https://github.com/alpmestan/ghc/commit/f78e0ae443f48bb2c58819f04d781a3d27c2d528.

Note that the hadrian build should "just" pick up the fix, since we already pass the debug flag whenever appropriate for our extra-library-flavours business. Let's see what the OS X builds say. When I get them to work, I'll submit the GHC patch, and when it's merged, we can just close this PR I suppose, since there's nothing to change in hadrian land. :-)

snowleopard commented 6 years ago

Sounds good @alpmestan, thank you!

alpmestan commented 6 years ago

Hmm... Circle CI fails with:

Command: _build/stage0/bin/ghc -Wall -hisuf hi -osuf o -hcsuf hc -static -hide-all-packages -no-user-package-db '-package-db _build/stage1/lib/package.conf.d' '-this-unit-id ghc-boot-th-8.7' '-package-id base-4.12.0.0' -i -i_build/stage1/libraries/ghc-boot-th/build -i_build/stage1/libraries/ghc-boot-th/build/autogen -ilibraries/ghc-boot-th/. -Iincludes -I_build/generated -I_build/stage1/libraries/ghc-boot-th/build -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181009/base-4.12.0.0/include -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181009/rts-1.0/include -I_build/generated -optc-I_build/generated -optP-include -optP_build/stage1/libraries/ghc-boot-th/build/autogen/cabal_macros.h -optc-fno-stack-protector -odir _build/stage1/libraries/ghc-boot-th/build -hidir _build/stage1/libraries/ghc-boot-th/build -stubdir _build/stage1/libraries/ghc-boot-th/build -Wnoncanonical-monad-instances -optc-Wno-unknown-pragmas -c libraries/ghc-boot-th/GHC/ForeignSrcLang/Type.hs -o _build/stage1/libraries/ghc-boot-th/build/GHC/ForeignSrcLang/Type.o -O0 -H64m -XHaskell2010 -XNoImplicitPrelude -ghcversion-file=/Users/distiller/hadrian/ghc/_build/generated/ghcversion.h -Wno-deprecated-flags
Exit code: 1
Stderr:
libraries/ghc-boot-th/GHC/ForeignSrcLang/Type.hs:1:1: error:
    Bad interface file: /usr/local/Cellar/ghc/8.4.3/lib/ghc-8.4.3/integer-gmp-1.0.2.0/GHC/Integer/Type.hi
        mismatched interface file versions (wanted "80720181009", got "8043")
  |
1 | {-# LANGUAGE DeriveGeneric #-}
  | ^
)
Mistuke commented 6 years ago

So does AppVeyor which shouldn't have been affected by this patch, looks like the stage0 compiler that stack is installing is broken?

alpmestan commented 6 years ago

Well, Circle CI uses the cabal script I think? And installs ghc from homebrew.

Mistuke commented 6 years ago

Usually when this happens there's some kind of ABI mismatch. I assume the builds aren't cached? The linux build doesn't seem to suffer the same issue. why would also AppVeyor be affected? this only changes the CircleCI script no?

alpmestan commented 6 years ago

Yes, and I don't know why appveyor is affected.

alpmestan commented 6 years ago

When I try to reproduce the problem I get... other failures. It's just one of those days/weeks...

Edit: now I cannot reproduce my own errors anymore.

alpmestan commented 6 years ago

I think I know where the circle ci failure comes from: --integer-simple... I'll remove it temporarily to see how things go. I wonder if https://github.com/ghc/ghc/commit/fc2ff6dd7496a33bf68165b28f37f40b7d647418 has anything to do with this.

alpmestan commented 6 years ago

Haha! New failure:

| Run Ghc CompileHs Stage1: libraries/ghci/GHCi/Message.hs => _build/stage1/libraries/ghci/build/GHCi/Message.o
| Run Ghc LinkHs Stage1: _build/stage1/utils/hpc/build/Main.o (and 11 more) => _build/stage1/bin/hpc
ld: library not found for -lgmp
clang: error: linker command failed with exit code 1 (use -v to see invocation)
`gcc' failed in phase `Linker'. (Exit code: 1)
shakeArgsWith    0.000s    0%                           
Function shake   0.432s    0%                           
Database read    0.002s    0%                           
With database    0.000s    0%                           
Running rules  932.053s   99%  =========================
Total          932.487s  100%                           
Error when running Shake build system:
* _build/stage1/bin/hpc
user error (Development.Shake.cmd, system command failed
Command: _build/stage0/bin/ghc -Wall -hisuf hi -osuf o -hcsuf hc -static -hide-all-packages -no-user-package-db '-package-db _build/stage1/lib/package.conf.d' '-package-id array-0.5.2.0' '-package-id base-4.12.0.0' '-package-id containers-0.6.0.1' '-package-id directory-1.3.3.0' '-package-id filepath-1.4.2.1' '-package-id hpc-0.6.0.3' -i -i_build/stage1/utils/hpc/build -i_build/stage1/utils/hpc/build/hpc/autogen -iutils/hpc/. -Iincludes -I_build/generated -I_build/stage1/utils/hpc/build -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181015/unix-2.7.2.2/include -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181015/time-1.8.0.2/include -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181015/bytestring-0.10.8.2/include -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181015/base-4.12.0.0/include -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181015/integer-gmp-1.0.2.0/include -I/Users/distiller/hadrian/ghc/_build/stage1/lib/x86_64-osx-ghc-8.7.20181015/rts-1.0/include -I_build/generated -optc-I_build/generated -optP-include -optP_build/stage1/utils/hpc/build/hpc/autogen/cabal_macros.h -optc-fno-stack-protector -odir _build/stage1/utils/hpc/build -hidir _build/stage1/utils/hpc/build -stubdir _build/stage1/utils/hpc/build -no-auto-link-packages -rtsopts -optl-lgmp -Wnoncanonical-monad-instances -optc-Wno-unknown-pragmas _build/stage1/utils/hpc/build/Main.o _build/stage1/utils/hpc/build/HpcParser.o _build/stage1/utils/hpc/build/HpcCombine.o _build/stage1/utils/hpc/build/HpcDraft.o _build/stage1/utils/hpc/build/HpcFlags.o _build/stage1/utils/hpc/build/HpcLexer.o _build/stage1/utils/hpc/build/HpcMarkup.o _build/stage1/utils/hpc/build/HpcOverlay.o _build/stage1/utils/hpc/build/HpcReport.o _build/stage1/utils/hpc/build/HpcShowTix.o _build/stage1/utils/hpc/build/HpcUtils.o _build/stage1/utils/hpc/build/Paths_hpc_bin.o -o _build/stage1/bin/hpc -O0 -H64m -XHaskell2010 -ghcversion-file=/Users/distiller/hadrian/ghc/_build/generated/ghcversion.h
Exit code: 1
Stderr:
ld: library not found for -lgmp
clang: error: linker command failed with exit code 1 (use -v to see invocation)
`gcc' failed in phase `Linker'. (Exit code: 1)
)

@snowleopard Could we arrange for libgmp to be available? At least to make CI work, then I can investigate the --integer-simple problem.

snowleopard commented 6 years ago

@alpmestan Thank you for investigating this further! I'm back to Newcastle and hope to make some progress on Hadrian this week.

Could we arrange for libgmp to be available?

But Hadrian is supposed to build it if it's not available. So, it looks like the build is currently broken.

Note that we use --integer-simple on some CI instances to make sure this mode is not broken.

alpmestan commented 6 years ago

If I take the tip of ghc's master branch of today, revert https://github.com/ghc/ghc/commit/fc2ff6dd7496a33bf68165b28f37f40b7d647418 and build with --integer-simple, it works.

alpmestan commented 6 years ago

So... maybe

src/Settings/Builders/Ghc.hs:116:            , libraryPackage ? arg ("-this-unit-id " ++ pkgId)

should be updated to special-case integer-gmp/integer-simple? But shouldn't the ghc-options changes from both cabal files in the GHC commit I linked to take care of that?

snowleopard commented 6 years ago

But shouldn't the ghc-options changes from both cabal files in the GHC commit I linked to take care of that?

Yes, that would be my expectation. Since there are no changes to the Make build system, I suppose there shouldn't be any changes in Hadrian either... Oh wait, there is one change in compiler/ghc.mk. Do you think it's relevant?

alpmestan commented 6 years ago

I have no idea. My nightmare of a week (last week, where I investigated why Cabal won't install my additional dynamic RTS flavours, to no avail) managed to convince me I don't know anything, it's all just an illusion and some occasional luck. =)

More seriously, the change in question removes 2 lines from the Config.hs file generated by the make build system. Searching for what got removed, cIntegerLibrary, in that commit's diff, doesn't yield anything. I suspect this is not the problem (might be wrong though, cf last week).

The commit message says:

This patch adds a new DynFlag that configures which integer library to use. This flag is initialized by cIntegerLibraryType (as before), and is only used in CorePrep to decide whether to use S# or not.

The other code paths that were varying based on cIntegerLibraryType are no now longer varying: The trick is to use integer-wired-in as the -this-unit-id when compiling either integer-gmp or integer-simple.

And given the ghc-options we mentionned earlier, which should arrange for hadrian to add the right -this-unit-id to the command line, I think we're good on that front. The only reason for this to still fail that I see is that the Settings.Builders.Ghc snipped I mentionned above also adds a -this-unit-id, but a different one, and I suppose GHC just picks it up instead of integer-wired-in.

snowleopard commented 6 years ago

The only reason for this to still fail that I see is that the Settings.Builders.Ghc snipped I mentionned above also adds a -this-unit-id, but a different one, and I suppose GHC just picks it up instead of integer-wired-in.

@alpmestan I think you may be right. Could you try adding a special case to use integer-wired-in instead of pkgId?

alpmestan commented 6 years ago

I checked and indeed GHC receives two -this-unit-id args when building integer-simple modules and registering the package. One with just the typical package id, and the one from the ghc-options field of the .cabal file. So I made this patch:

diff --git a/src/Settings/Builders/Ghc.hs b/src/Settings/Builders/Ghc.hs
index 8212b5f..082e6c8 100644
--- a/src/Settings/Builders/Ghc.hs
+++ b/src/Settings/Builders/Ghc.hs
@@ -113,7 +113,12 @@ packageGhcArgs = do
     mconcat [ arg "-hide-all-packages"
             , arg "-no-user-package-db"
             , packageDatabaseArgs
-            , libraryPackage ? arg ("-this-unit-id " ++ pkgId)
+            , libraryPackage ?
+                -- those two packages specify 'integer-wired-in' as the unit id
+                -- in their .cabal files (ghc-options field), so we need to omit
+                -- this argument just for them.
+                not (package `elem` [integerSimple, integerGmp]) ?
+                  arg ("-this-unit-id " ++ pkgId)
             , map ("-package-id " ++) <$> getContextData depIds ]

 includeGhcArgs :: Args

but it apparently isn't enough, I've rebuilt everything from scratch but still see this error:

$ hadrian/build.sh -j4 --flavour=quickest -o_quickest --integer-simple
[...]
Error when running Shake build system:
* _quickest3/stage1/lib/package.conf.d/ghc-boot-th-8.7.conf
* _quickest3/stage1/libraries/ghc-boot-th/build/libHSghc-boot-th-8.7.a
* _quickest3/stage1/libraries/ghc-boot-th/build/GHC/Lexeme.o
* _quickest3/stage1/libraries/ghc-boot-th/build/GHC/Lexeme.o _quickest3/stage1/libraries/ghc-boot-th/build/GHC/Lexeme.hi
user error (Development.Shake.cmd, system command failed
Command: _quickest3/stage0/bin/ghc -Wall -hisuf hi -osuf o -hcsuf hc -static -hide-all-packages -no-user-package-db '-package-db _quickest3/stage1/lib/package.conf.d' '-this-unit-id ghc-boot-th-8.7' '-package-id base-4.12.0.0' -i -i_quickest3/stage1/libraries/ghc-boot-th/build -i_quickest3/stage1/libraries/ghc-boot-th/build/autogen -ilibraries/ghc-boot-th/. -Iincludes -I_quickest3/generated -I_quickest3/stage1/libraries/ghc-boot-th/build -I/nix/store/0d2ajw9gi57cc1iv2hzfi07d54zfrfq8-ghc-build-environment/include -I/home/alp/WT/ghc-diffs/_quickest3/stage1/lib/x86_64-linux-ghc-8.7.20181015/base-4.12.0.0/include -I/home/alp/WT/ghc-diffs/_quickest3/stage1/lib/x86_64-linux-ghc-8.7.20181015/rts-1.0/include -I_quickest3/generated -optc-I_quickest3/generated -optP-include -optP_quickest3/stage1/libraries/ghc-boot-th/build/autogen/cabal_macros.h -optc-fno-stack-protector -odir _quickest3/stage1/libraries/ghc-boot-th/build -hidir _quickest3/stage1/libraries/ghc-boot-th/build -stubdir _quickest3/stage1/libraries/ghc-boot-th/build -Wnoncanonical-monad-instances -optc-Werror=unused-but-set-variable -optc-Wno-error=inline -c libraries/ghc-boot-th/GHC/Lexeme.hs -o _quickest3/stage1/libraries/ghc-boot-th/build/GHC/Lexeme.o -O0 -H64m -XHaskell2010 -XNoImplicitPrelude -ghcversion-file=/home/alp/WT/ghc-diffs/_quickest3/generated/ghcversion.h -Wno-deprecated-flags
Exit code: 1
Stderr:
ghc: panic! (the 'impossible' happened)
  (GHC version 8.7.20181015 for x86_64-unknown-linux):
    lookupGlobal
  Bad interface file: /nix/store/g5ggsbqkwqiiq3qkpw8ji42axrna117y-ghc-8.4.3/lib/ghc-8.4.3/integer-gmp-1.0.2.0/GHC/Integer/Type.hi
      mismatched interface file versions (wanted "80720181015", got "8043")
  Call stack:
      CallStack (from HasCallStack):
        callStackDoc, called at compiler/utils/Outputable.hs:1160:37 in ghc:Outputable
        pprPanic, called at compiler/typecheck/TcEnv.hs:132:32 in ghc:TcEnv

Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

)
snowleopard commented 6 years ago

mismatched interface file versions (wanted "80720181015", got "8043")

Looks like we somehow manage to mix up interface files built in different stages -- I have no idea how...

alpmestan commented 6 years ago

Well, keep in mind that reverting https://github.com/ghc/ghc/commit/fc2ff6dd7496a33bf68165b28f37f40b7d647418 is sufficient for making everything work again. So we're probably not too far from a fix, I'm just failing to see what else we should change right now.

alpmestan commented 6 years ago

I figured it out. Will create an issue explaining the problem and the potential solutions.

Mistuke commented 6 years ago

Ping.

snowleopard commented 6 years ago

@alpmestan I lost track of this -- have you submitted the findPtr patch you mentioned? In any case, we should probably close off all leftover PRs here...

alpmestan commented 6 years ago

@Mistuke Oh right, the other fix got merged only recently. I suppose I now have to create a differential for https://github.com/alpmestan/ghc/commit/4c37000bd65befcfd8331ed19dd46eb478ecf683 -- and... that's it, I think. Hadrian already passes the right flag.

alpmestan commented 6 years ago

@Mistuke Could you try https://phabricator.haskell.org/D5356? I made you a reviewer.

Mistuke commented 6 years ago

@alpmestan thanks for making the diff! the patch will run through validation tonight with my other stuff, I'll post a reply tomorrow. Cheers.

alpmestan commented 6 years ago

@Mistuke reported that the Windows tests got fixed with the patch, woohoo!

snowleopard commented 6 years ago

Awesome, many thanks 🎉