--integer-simple is broken

alpmestan commented 5 years ago

... and I figured out why while investigating the CI failures from #701. It's not something there's one natural fix for that stands out. We're at a crossroads and we need to pick a solution. Everyone's free to give some input, but it'd be great to in particular hear from @bgamari @snowleopard and @angerman here.

Here's the problem statement. When we build with --integer-simple, stage 1 GHC gets built fine (under _build/stage0/bin/ghc, since it is built by the boot -- stage 0 -- compiler). The libraries that were needed for this got either built or copied from the boot compiler's package db and reside under build/stage0/lib. Then the stage 1 GHC starts building the few packages it needs for building the stage 2 compiler, and while it's doing this, we get errors about mismatched interface files that have to do with Integer and in particular GHC seems to be reaching for integer-gmp interface files. Stage 1 GHC has built integer-simple at that point (under _build/stage1/lib) and should therefore be using it, but instead it gets confused and goes for integer-gmp. This only happens with a GHC three that has this commit.

So what's going on? Everything Integer belong to a fictional package integer-wired-in which gets resolved to some version of integer-gmp or integer-simple depending on the circumstances. Our stage 1 GHC lives under _build/stage0/bin/ghc, and with the current heuristics, the global ("system") package db is determined to be under $topdir/package.conf.d. By default, $topdir is assumed to be at ../lib compared to where the executable lives. So the global package db for our stage 1 ghc is therefore taken to be _build/stage0/lib/package.conf.d, even though those are things built by the boot compiler! You can see how this is going to be problematic...

In initPackages, we end up looking at two package databases, the global one mentionned above, because that's something we always look at by default, unless told not to (-no-global-package-db), as well as the one given on the command line (the right one, _build/stage1/lib/package.conf.d). Here's their contents:

- /home/alp/WT/ghc-diffs/_quickest/stage0/lib/package.conf.d
  - ghc-8.7
  - Cabal-2.4.0.1
  - parsec-3.1.13.0
  - text-1.2.3.1
  - process-1.6.3.0
  - ghci-8.7
  - template-haskell-2.14.0.0
  - ghc-boot-8.7
  - binary-0.8.6.0
  - pretty-1.1.3.6
  - mtl-2.2.2
  - transformers-0.5.5.0
  - ghc-heap-8.7
  - hpc-0.6.0.3
  - terminfo-0.4.1.2
  - ghc-boot-th-8.7
  - directory-1.3.1.5
  - containers-0.5.11.0
  - unix-2.7.2.2
  - bytestring-0.10.8.2
  - time-1.8.0.2
  - deepseq-1.4.3.0
  - array-0.5.2.0
  - filepath-1.4.2
  - base-4.11.1.0
  - integer-gmp-1.0.2.0
  - ghc-prim-0.5.2.0
  - rts
- _quickest/stage1/lib/package.conf.d
  - base-4.12.0.0
  - integer-simple-0.1.1.1
  - ghc-prim-0.5.3
  - rts-1.0

And in findWiredInPackages, when comes the time to decide what integer-wired-in should resolve to for everything that this ghc session will do, we're faced with a choice between stage0/lib's integer-gmp-1.0.2.0 and stage1/lib's integer-simple-0.1.1.1. As you can probably guess from the error, GHC ends up picking integer-gmp even though it was built with the boot compiler and not stage 1, and later down the road, whenever we touch something that refers to any entity from integer-wired-in, boom.

Now you may wonder, why on earth do we pick integer-gmp? Well, the "comparison" is done using a function named compareByPreference, whose code is below:

compareByPreference
    :: PackagePrecedenceIndex
    -> PackageConfig
    -> PackageConfig
    -> Ordering
compareByPreference prec_map pkg pkg' =
    case comparing packageVersion pkg pkg' of
        GT -> GT
        EQ | Just prec  <- Map.lookup (unitId pkg)  prec_map
           , Just prec' <- Map.lookup (unitId pkg') prec_map
           -- Prefer the package from the later DB flag (i.e., higher
           -- precedence)
           -> compare prec prec'
           | otherwise
           -> EQ
        LT -> LT

The precedence index thingie is just a map that gives (in our case) higher precedence to packages from _build/stage1/lib/package.conf.d (because it's given on the command line) then to those from _build/stage0/lib/package.conf.d (which is the global one for our stage 1 GHC). So in that precedence index, integer-gmp is given a precedence of 1 and integer-simple has 2, which should in theory make it "win". But look at the code... It compares package versions before anything. It assumes the package names are identical, I guess. And it turns out, integer-gmp version is strictly greater than integer-simple's... so it gets picked. I can get the entire build to succeed by simply changing compareByPreference to the following implementation:

compareByPreference
    :: PackagePrecedenceIndex
    -> PackageConfig
    -> PackageConfig
    -> Ordering
compareByPreference prec_map pkg pkg'
  | packageName pkg == packageName pkg' -- this is the existing logic
  = case comparing packageVersion pkg pkg' of
        GT -> GT
        EQ | Just prec  <- Map.lookup (unitId pkg)  prec_map
           , Just prec' <- Map.lookup (unitId pkg') prec_map
           -- Prefer the package from the later DB flag (i.e., higher
           -- precedence)
           -> compare prec prec'
           | otherwise
           -> EQ
        LT -> LT
  | otherwise -- different package names! let's only rely on precedence here
  = let mprec  = Map.lookup (unitId pkg)  prec_map
        mprec' = Map.lookup (unitId pkg') prec_map
    in case (mprec, mprec') of
         (Just prec, Just prec') -> compare prec prec'
         _                       -> error "I don't know what to do here"

Now, you can see this is not ideal. But it still makes more sense than the existing implementation, which completely ignores the possibility that the packages might be different. However, this situation shouldn't happen to begin with, our stage 1 ghc's global package database should be _build/stage/lib/package.conf.d. So we've got two problems:

compareByPrecedence makes an assumption that bad situations can break, whereas my definition can be a lot more helpful in those cases, especially if the situation is bad enough to make it land in the error branch. And alternative implementation would be to error out if packageName pkg /= packageName pkg' instead of looking at precedences. Regardless, there's an improvement to be made there I think. (1)
_build/stage<N>/bin/ghc's package database lives at _buid/stage<N+1>/lib/package.conf.d but GHC doesn't know about such layouts so _build/stage<N>/bin/ghc ends up thinking its package db is _build/stage<N>/lib/package.conf.d. We override the-package-db` whenever we invoke ghc in hadrian, but that's not going to be the case of people running the ghc executables manually. (2)

For (1), a GHC patch that implements either of my suggestions is enough to fix it. It is however more of a "improve behaviour/problem reporting under buggy circumstances" patch, and is not the core of our problem. The real problem is (2). Note that the build failure can also be addressed by specifying -no-global-package-db in every _build/stage<N>/bin/ghc command, that'd make integer-gmp (and the whole stage0 package db) disappear from the listing above, and there would be no choice left to make, we'd just go for integer-simple. However, we don't want to force our users to do this when using our ghc executables!

What are our options? A few came up in my mind or in discussions with others. But here's the one I would likely suggest we go for.

Let's move _build/stage<N>/bin/ghc to _build/stage<N+1>/bin/ghc. That way, it would sit right next to the libraries that it builds (as opposed to next to the other things that were built by the same compiler that built _build/stage<N>/bin/ghc, like now), the $topdir would correctly resolve to _build/stage<N+1>/lib, the default/global package would correctly resolve to _build/stage<N+1>/lib/package.conf.d and everything would go well. We would have:

_build
├──  _stage0     # libraries/etc built by the boot compiler in order to build the stage 1 compiler
│       ├── lib      # no bin, the stage 0 toolchain is already there before we build GHC
├──  _stage1
│       ├── bin     # the entire stage1 "ghc toolsuite" (ghc, ghc-pkg, ...)
│       ├── lib      # package db with all the libraries built by _stage1/bin/ghc
├──  _stage2
│       ├── bin     # the entire stage2 "ghc toolsuite"
│       ├── lib      # package db with all the libraries built by _stage2/bin/ghc

The alternatives all worse, from what I can tell. We could require the generation and use of wrapper scripts that point to the right topdir with -B. We do want to support wrappers and have some code for that, but requiring them when we can avoid it feels a bit silly. We could teach GHC about those stage<N>/stage<N+1> dirs, but that would be a gross hack. All in all, I think the aforementionned split would be best.

Thoughts on both problems and the solutions that I suggest?

snowleopard commented 5 years ago

@alpmestan Thank you for the amazing investigation!

I don't understand why simply adding -no-global-package-db to all invocations of GHC by Hadrian is not a good solution. You say:

However, we don't want to force our users to do this when using our ghc executables!

But they won't need to! We will install GHC and the package library in completely separate directory, outside the _build tree, where there will be no way confuse different stages.

I would really prefer to keep everything built in Stage0 in the _build/stage0 directory. It feels the Right Thing to do, because it's easy to explain. Introducing the complexity into the build directory structure just as a workaround for what I believe is a plain bug in compareByPreference sounds wrong.

snowleopard commented 5 years ago

P.S.: I hope we can all agree that the current behaviour of GHC where it "prefers" a completely different package is a bug. Let's not capitulate to this bug by complicating Hadrian! :)

alpmestan commented 5 years ago

The Real Bug (TM) is that the global package db paths that hadrian-produced GHCs compute are wrong.

-no-global-package-db just lets us pretend that package db doesn't exist, so it slips the problem under the carpet, very much literally. Fortunately, we also specify the right -package-db on every invokation, but any command that doesn't have that is also dommed to potentially fail if arbitrary precision integers are involved one way or another... And I do run stage 1 or 2 GHC from the build tree regularly, while investigating bugs typically, so I'd definitely like having the ability to run my own little commands without having to prefix them with some magical package db incantation.

The most important thing to do here is to go back to computing correct global package dbs. And that problem is triggered by the current _build layout. They should be right, from the start, and we possibly might want to change paths when we install GHC but just doing _build/stage0/bin/ghc -e '1 :: Integer' ought to work with --integer-simple GHCs, without the need for a wrapper script or to install things, I think.

To achieve correct computation of global package db, the most logical solution is I think the one I suggested above. Finally, the compareByPreference "bug" is merely a side show here, if we fix (2) nobody ever has to deal with (1), while if we just fix (1) then anyone not doing the magic package db dance on the command line will not be able to build code that involves integer-wired-in, if my understanding is correct.

Reply to your P.S.: I don't think that would be a big hadrian patch, actually. Also with a diagram like the one I made above (with more details) we can probably explain this very easily. Or better yet, an actual good looking picture.

snowleopard commented 5 years ago

The Real Bug (TM) is that the global package db paths that hadrian-produced GHCs compute are wrong.

Could you clarify how "Hadrian-produced GHC" differs from "Make-produced GHC"?

I think the package database path-computation logic is the same in both, isn't it? So, if you say this is the Real Bug, shouldn't we fix it?

My understanding is that you say that the path-computation logic is wrong in both Hadrian-built and Make-built GHCs, and the only pragmatically viable solution is to just use the same directory structure as Make, even at the cost of extra complexity, because it happens to work well the current path-computation logic.

Maybe you are right, maybe a better path-computation logic is just too complex itself and this is just a trade-off between the complexity in GHC and the complexity in the build system.

I don't think that would be a big hadrian patch, actually.

Sure, I believe it. It's easy to switch a more complex directory structure in Hadrian. But will it be easy to work with this more complex structure in years to come? This is the only thing that bothers me. I was very confused by how build artefacts were organised in Make when I started working on Hadrian, and I wanted to move towards a more logical and clean organisation. It would be unfortunate if some of this progress will have to be undone.

snowleopard commented 5 years ago

@alpmestan By the way, I may be wrong in thinking that the current directory structure is simpler. Perhaps, what you suggest is actually what GHC developers expect! In this case, I'm happy to go with your solution.

I remember that @angerman, for example, was also arguing for keeping Stage1 libraries in the same directory as Stage0 binaries... or something like this.

alpmestan commented 5 years ago

Sorry, I forgot to add a paragraph about why make-produced GHCs don't suffer from this problem. The topdir-resolution logic being tailored to the make-produced GHCs, there will never suffer from anything like this since the inplace/ layout is exactly how the GHC code expects bins/libs to be laid out. Stage 1 and 2 GHCs always manage to find the right topdir and package db. Here's the inplace layout:

$ tree -L 3 ~/WT/ghc-hadrian/inplace/
/home/alp/WT/ghc-hadrian/inplace/
├── bin
│   ├── check-api-annotations
│   ├── check-ppr
│   ├── count_lines
│   ├── deriveConstants
│   ├── genapply
│   ├── genprimopcode
│   ├── ghc-cabal
│   ├── ghc-pkg
│   ├── ghc-stage1
│   ├── ghc-stage2
│   ├── ghctags
│   ├── hp2ps
│   ├── hpc
│   ├── hsc2hs
│   ├── mkdirhier
│   └── runghc
├── lib
│   ├── bin
│   │   ├── check-api-annotations
│   │   ├── check-ppr
│   │   ├── deriveConstants
│   │   ├── genapply
│   │   ├── genprimopcode
│   │   ├── ghc-iserv
│   │   ├── ghc-iserv.bin
│   │   ├── ghc-iserv-dyn
│   │   ├── ghc-iserv-dyn.bin
│   │   ├── ghc-iserv-prof
│   │   ├── ghc-iserv-prof.bin
│   │   ├── ghc-pkg
│   │   ├── ghc-split
│   │   ├── ghc-stage1
│   │   ├── ghc-stage2
│   │   ├── ghctags
│   │   ├── hp2ps
│   │   ├── hpc
│   │   ├── hsc2hs
│   │   ├── runghc
│   │   ├── unlit
│   │   └── unlit.bin
│   ├── ghci-usage.txt
│   ├── ghc-usage.txt
│   ├── llvm-passes
│   ├── llvm-targets
│   ├── package.conf.d
│   │   ├── array-0.5.2.0.conf
│   │   ├── base-4.12.0.0.conf
│   │   ├── binary-0.8.6.0.conf
│   │   ├── bytestring-0.10.8.2.conf
│   │   ├── Cabal-2.4.0.1.conf
│   │   ├── containers-0.6.0.1.conf
│   │   ├── deepseq-1.4.4.0.conf
│   │   ├── directory-1.3.3.0.conf
│   │   ├── filepath-1.4.2.1.conf
│   │   ├── ghc-8.7.conf
│   │   ├── ghc-boot-8.7.conf
│   │   ├── ghc-boot-th-8.7.conf
│   │   ├── ghc-compact-0.1.0.0.conf
│   │   ├── ghc-heap-8.7.conf
│   │   ├── ghci-8.7.conf
│   │   ├── ghc-prim-0.5.3.conf
│   │   ├── haskeline-0.7.4.3.conf
│   │   ├── hpc-0.6.0.3.conf
│   │   ├── integer-simple-0.1.1.1.conf
│   │   ├── libiserv-8.7.1.conf
│   │   ├── mtl-2.2.2.conf
│   │   ├── package.cache
│   │   ├── package.cache.lock
│   │   ├── parsec-3.1.13.0.conf
│   │   ├── pretty-1.1.3.6.conf
│   │   ├── process-1.6.3.0.conf
│   │   ├── rts.conf
│   │   ├── stm-2.5.0.0.conf
│   │   ├── template-haskell-2.14.0.0.conf
│   │   ├── terminfo-0.4.1.2.conf
│   │   ├── text-1.2.3.1.conf
│   │   ├── time-1.8.0.2.conf
│   │   ├── transformers-0.5.5.0.conf
│   │   └── unix-2.7.2.2.conf
│   ├── platformConstants
│   ├── settings
│   └── template-hsc.h
...

Notice how there's no stage separation and how 1/ all the actual binaries are under inplace/lib/bin 2/ they all have wrapper scripts under inplace/bin (that specifically use -B to point to the lib dir for $topdir). From there, GHC's code takes over and just looks at $topdir/package.conf.d to get the (global) package db.

Whereas in our case, our ghc executables would have to "cross over the stage separation", and we also don't generate wrappers right away yet, you have to ask for a binary distribution, and they're not quite as featureful I think.

All in all, yes, I really recommend the structure I described above. GHC developers will all be surprised regardless, by the stage separation, but I think it'd be great to keep it! Only we would have to change the semantics slightly, to match my "diagram" from earlier. I could try and whip up a slightly nicer diagram (or a picture) that describes it all nicely and add it to the PR that changes this, if we decide that this solution is the one we want. It'll definitely be less surprising for most people IMO that there's no "cross stage" relationships, especially if that happens to make everything work as expected again.

bgamari commented 5 years ago

One trouble with the Make build system's layout is that it's unclear how we could do a stage3 build with its own libraries (e.g. using stage2 to build libaries and then building a stage3 compiler linked against them). That being said, this may not be a super important use-case.

However, I agree with @alpmestan that the current scheme is very surprising. In particular, having to reach into the stage0 directory to find the stage 1 compiler is quite mind-bending.

snowleopard commented 5 years ago

@alpmestan @bgamari OK, let's go ahead with the restructuring.

Is there a simple meaning to what we will then store in _build/stageN? I guess the simplest is "StageN compiler, plus the libraries it built... except for Stage0, where we only have libraries" :) I find this mind-bending instead.

alpmestan commented 5 years ago

Well, it makes sense in my head, because we don't build a stage 0 compiler, that's the boot compiler that we start with. While we do build stage 1 & 2 compilers.

angerman commented 5 years ago

Yes. I had initially a +1 logic on binaries.

To have stage0/bin binaries, (those would be the bootstrap binaries, we could theoretically just symlink them there, or have shell wrapper they just call the respective binary hoping to find it in PATH) to have a uniform interface.

The only thing you need to make sure is that we tar/zip up the right lib/bin pair.

snowleopard commented 5 years ago

@angerman Thanks! Indeed, this is a possible way to make it more uniform. For now I think this is not really necessary, let's just go ahead with @alpmestan's plan.

@alpmestan Are you going to implement the change? I'm happy to do this too if you like.

alpmestan commented 5 years ago

Well, I barely got started but will resume on monday unless you take over before that.

snowleopard commented 5 years ago

I have some time over the weekend, and feel a bit guilty for not doing much in Hadrian recently :-) This seems like a good high-priority task I could do. I'll give it a try and will let you have a look.

alpmestan commented 5 years ago

Sounds great! Open-source is about sharing the fun anyway, so please be my guest and give this a shot. now that you've got some free time again. I would actually be happy to start looking into another problem.

snowleopard commented 5 years ago

One unclear point: what do we do with unlit and touchy? Right now we build them in lib/bin directory, although we planned to move them to bin, see #570.

I guess I'll move them "up" one stage too, but in principle we could perhaps also move them to bin as part of this restructuring.

alpmestan commented 5 years ago

Hmm... I'd probably recommend leaving them under lib/bin/ for now, and then perhaps discussing with the larger GHC devs audience whether we want those to live under the usual binary dir? Or at the very least separating this in two PRs.

snowleopard commented 5 years ago

@alpmestan OK, agreed.

I think I have a working patch -- will create a PR after my Windows build completes.

snowleopard commented 5 years ago

OK, I've created #704. There is one problem though: with the new structure, Stage2 does not work since there is no package database in _build/stage2/lib:

ghc.exe: can't find a package database at C:\msys\home\nam83\ghc\_build\stage2\lib\package.conf.d

Any thoughts?

alpmestan commented 5 years ago

Hmmm. I suspect we might need to tweak:

registerPackage rs context@Context {..} = when (stage < Stage2) $ do
  ...

from src/Rules/Register.hs (removing the stage restriction), and wherever this ends up being needed, maybe?

snowleopard commented 5 years ago

@alpmestan I've created an empty package database, i.e. the directory _build\stage2\lib\package.conf.d. This allowed me to finish the build, but I still can't run Stage2 GHC, now getting the following errors:

<interactive>:1:6: error:
    Not in scope: `System.IO.hSetBuffering'
    No module named `System.IO' is imported.
<interactive>:1:30: error:
    Not in scope: `System.IO.stdin'
    No module named `System.IO' is imported.
...

See https://ci.appveyor.com/project/snowleopard/hadrian/builds/19666764.

Also, one interesting side-effect of the changes is that Haddock and ghc-tags are now built in the directory _build/stage3/bin. I guess that's correct, just looks a bit odd.

alpmestan commented 5 years ago

Hmm, this is all very odd. Let me know if I can help you figure things out.

snowleopard commented 5 years ago

I think it's worth copying @angerman's comment from the PR here:

Let ghc0 be our bootstrap compiler; and libs0 the libraries shipped with the bootstrap compiler. bootstap0 are the libraries we built with ghc0 to augment libs0 as we deem them to be potentially too old. Further let ghcN, libsN be the respective binaries and libraries built in Stage N, by the Stage (N-1) binaries and libraries.

The fundamental reasoning was the following:

ghc0 + libs0 + bootstrap0 => libs1, ghc1.
ghc1 + libs1 => libs2, ghc2.

For cross compilers building ghc2 or infect any other tool for stage 2 makes no sense, as they would be cross compiled compilers, not cross compilers. That is, for ghc1 it will run on a different host than the one it most likely targets. ghc2 build by the same logic would however run on the target and as such not be a cross compiler anymore.

The, we do need libs2 as those are the libraries against which we will link with ghc1; but we do not need (or even want) ghc2 but instead want ghc1 in it's place.

Does that make sense?

NB: we might want to support the case where you build a cross compiled GHC as well; that is instead of copying ghc1 into stage2, we would build ghc2.

snowleopard commented 5 years ago

And below is my response:

@angerman Thanks!

The fundamental reasoning was the following:

ghc0 + libs0 + bootstrap0 => libs1, ghc1.

ghc1 + libs1 => libs2, ghc2.

I'm trying to translate your notation to what should happen in Hadrian. Is the following correct?

ghc0 is what we call Stage0 GHC in Hadrian. It comes with libs0.
ghc1 is Stage1 GHC, which should be _build/stage1/bin/ghc after this PR.
ghc2 is Stage2 GHC, which should be _build/stage2/bin/ghc after this PR.
bootstrap0 live in _build/stage0/lib both before and after this PR.
libs1 live in _build/stage1/lib both before and after this PR.
libs2 live in _build/stage2/lib both before and after this PR.

Is this correct?

If yes, I'm confused about libs2: right now we do not build anything in _build/stage2/lib. Do we really need to build it?

snowleopard commented 5 years ago

...and @angerman's response:

@snowleopard yes. that sounds conceptually correct.

I do think we need the stage2 libs (at least for cross compilers; I'm not sure we need stage1 libs?). If we draw up the dependency graph, by putting the dependencies into parens we'd have:

ghc1(ghc0+libs0+bootstrap0), this one is host -> target, where usually host == target.
libs1(ghc0+libs0+bootstrap0), this one is built for the host (by the ghc1) compiler.
ghc2(ghc1+libs1), this one is target -> target
libs2(ghc1+libs1), this one is built for the target.

For a cross compiler, we need libs2 and ghc1, as we want to build software for the target and thus need the target libs to link; we do decidedly not need ghc2, as we don't want to run the compiler on the target.

However, maybe I'm messing this up right now in my head and we never actually build libs1? Then _build/stage1/lib would be libs2.

I feel I'm confused.

snowleopard commented 5 years ago

@alpmestan @angerman @bgamari It appears there is some confusion about Stage2 libraries, which are supposed to go into the directory _build/stage2/lib both right now and after the proposed restructuring.

I think we never built Stage2 libraries before. Make doesn't build Stage2 libraries by default, and Hadrian never built them either.

Here is my analysis of the situation:

Q1: How does Hadrian manage to build Stage2 packages such as Haddock then? A1: Because we point it to the Stage1 package database via the command line flags.

Q2: Stage2 GHC does not work when executred from the command line after my PR #704. Why? A2: Because there are no Stage2 libraries, and GHC's path-finding logic cannot figure out that it should look for these libraries in the ../../stage1/lib folder.

Q3: How did Stage2 GHC work previously? A3: It worked because previuosly it was located in _build/stage1/bin. Therefore the built-in path-finding algorithm could locate Stage1 libraries in the nearby ../lib directory.

Q4: How does Make-built Stage2 GHC work? A4: In Make, all GHC executables live in the same directory inplace, where we also build the Stage1 library. Both Stage1 and Stage2 GHCs therefore use exactly the same library.

So, if we want to proceed with the restructuring, it appears that we have two options: (1) teach GHC to find Stage1 libraries in the ../../stage1/lib folder; or (2) actually build Stage2 libraries where GHC expects them to be (_build/stage2/lib), which would significantly increase build times.

Neither of the two options looks appealing.

angerman commented 5 years ago

After giving this some more thought, I think we might actually never really build libs2. Instead, we have: libs1 is actually libs0+bootstrap0, and libs2 is actually what I called libs1, though built by ghc1.

Thus:

ghc1(ghc0+libs0+bootstrap0)
libs1(ghc1)
ghc2(ghc1+libs1)

And we then ship ghc2+libs1 for the regular stage2 compiler and ghc1+libs1 for the cross compiler.

snowleopard commented 5 years ago

@angerman Yes, this is my understanding too. Yet, I still don't know how to proceed with this issue.

One option would be to simply copy _build/stage1/lib to _build/stage2/lib. It's a bit wierd, but will probably work. In future, this step could be made conditional: when building Stage3 compiler we would actually populate _build/stage2/lib by recompiling all libraries again using the Stage2 compiler.

alpmestan commented 5 years ago

Hmm. This is a tough situation. Without your PR, the lib path resolution logic hits a wall. But with your PR too. In the first case, well this whole integer-simple business showed us that our stage 1 compiler is a bit messed up, specifically because of the recent integer-wired-in commit and our layout. In the second case, it's stage 2 that's bitten by package db issues.

This indeed either calls for a better lib path resolution logic or a more appropriate layout. We can do anything we want there, as long as it matches with how GHC works. It seems the current separation isn't working all that well. Stage 1 and stage 2 libs should be the same when we're building a stage 2 GHC, but not when building stage 3. The bootstrap compiler comes with its own libs and builds some more. The binaries are more easily classified into stages and the main issue with them is actually only one with ghc and perhaps ghc-pkg: either the stage 1 (without your PR) or stage 2 (with your PR) GHC get the default/global package db path wrong.

Is there a layout we haven't considered that would get us unstuck, one that accounts for all those asymetries?

angerman commented 5 years ago

@alpmestan do you have a link for the integer-wired-in commit?

So this brings us back to what I alluded to on IRC, that this might be a special case for the stage1 compiler.

So we could hardcode the package path in hadrian, but that would mean that anyone calling GHC form the source tree would need to pass the same flags as hadrian, am I right?

alpmestan commented 5 years ago

@angerman https://github.com/ghc/ghc/commit/fc2ff6dd7496a33bf68165b28f37f40b7d647418

snowleopard commented 5 years ago

I think the PR #704 makes things worse, because:

We still have issues with path-resolution logic, arguably more serious ones: while the integer-simple problem seems to be just a thorny edge-case, a completely broken Stage2 GHC is not easy to fix without some dubious hacks. Also, the Stage3 GHC story becomes more complicated.
In my (relatively isolated so far) opinion, the layout becomes more complicated too. I still prefer the simple rule when everything build in StageN is located in directory _build/stageN.

To elaborate on the above Stage3 GHC remark: with the current layout, building a Stage3 GHC is straighforward -- just run another build stage and everything will fall into right places.

I think #704 just highlighted the hidden complexities of the proposed new layout.

alpmestan commented 5 years ago

What solution (to this issue, #702) would you like hadrian to go for?

snowleopard commented 5 years ago

I think I'd just fix GHC's compareByPreference so that it doesn't "resolve" integer-simple to integer-gmp.

This should solve this particular issue.

After that, we could create two new tickets:

You mentioned that you'd like to be able to run Stage1 GHC from command line without pointing it to the right package database with a flag. I don't know how to address this yet, but I think it's not as high-priority as fixing integer-simple.
There are concerns that the current directory layout may be surprising in the sense that Stage1 GHC lives in the directory _build/stage0. We can form a community opinion about this, and plan how to implement a potential restructuring in a separate ticket. The PR #704 will serve as a useful starting point.

@alpmestan What do you think?

alpmestan commented 5 years ago

You mentioned that you'd like to be able to run Stage1 GHC from command line without pointing it to the right package database with a flag. I don't know how to address this yet, but I think it's not as high-priority as fixing integer-simple.

The usual trick is to use wrapper scripts for that I think, but I really like the idea of being able to use the executables right there directly and reserve the wrapper scripts for the situations that really, really call for them. They bring their own complexity.

Now, regarding the suggestion to fix compareByPreference, I certainly get behind that, as I think the current implementation just leaves too many doors open. Regarding the layout etc, I don't really have a great idea here, staged compilation just keeps messing with my mind.

snowleopard commented 5 years ago

The usual trick is to use wrapper scripts

Actually, I think it may indeed be quite convenient to create something like top-level ghc-stage1.sh and ghc-stage2.sh scripts which invoke corresponding binaries from the _build directory with the right flags.

This would isolate GHC developers from changes in build layout and associated issues with flags etc.

Now, regarding the suggestion to fix compareByPreference, I certainly get behind that, as I think the current implementation just leaves too many doors open.

Great, then I guess we can close my PR for now.

angerman commented 5 years ago

Actually, I think it may indeed be quite convenient to create something like top-level ghc-stage1.sh and ghc-stage2.sh scripts which invoke corresponding binaries from the _build directory with the right flags.

I believe those would need to live in _build; we still have a configurable root right?

alpmestan commented 5 years ago

I submitted the compareByPreference patch on phabricator: https://phabricator.haskell.org/D5266

With this, hadrian/build.sh -j4 --flavour=quickest --integer-simple doesn't error out anymore.

snowleopard commented 5 years ago

I believe those would need to live in _build; we still have a configurable root right?

@angerman Yes, the build root is configurable, so these scripts should probably live in the build directory (and should therefore be built by Hadrian).

@alpmestan Awesome, thank you! I've closed #704. Please also close this issue when your GHC patch lands.

alpmestan commented 5 years ago

I can finally say it: fixed by ghc/ghc@86ee74dc999db6030874ddaf5d10adec23108c02 :-)

snowleopard commented 5 years ago

@alpmestan Many thanks :tada:

snowleopard / hadrian

--integer-simple is broken #702