haskell / haskell-language-server

Official haskell ide support via language server (LSP). Successor of ghcide & haskell-ide-engine.
Apache License 2.0

HLS and the user's GHC #3671

Open michaelpj opened 1 year ago

michaelpj commented 1 year ago

This issue is an attempt to write down as much as possible of the picture around how HLS uses GHC, and especially how HLS interacts with the user's GHC.

Preliminaries

Constraints

We have some big constraints on what we can do:

  1. HLS must process the user's code in the same way that the user's compiler would
  2. GHC interface files are not guaranteed to be compatible across even minor versions of GHC
  3. Object files are not guaranteed to be loadable across even minor versions of GHC or even different builds of the same version of GHC
    • The factors that affect compatibility are combined into the "ABI hash"
    • GHC (always?) checks this and will refuse to proceed if it isn't what it expects
    • If you don't check it you get segfaults or worse
  4. HLS has a very "rich" interface to GHC, e.g. we expect to be able to work with, in Haskell, the parsed AST, the typechecked AST, the module summary (including e.g. exports), ...
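To make constraints 2 and 3 concrete, here is a minimal sketch of the kind of check they imply. Everything in it is hypothetical: `GhcBuild`, `Compat` and `checkCompat` are illustration names, not part of the ghc API, and the ABI hash is treated as an opaque string rather than anything GHC actually emits.

```haskell
import Data.Version (Version, makeVersion)

-- | Hypothetical description of one GHC build. Neither the type nor its
-- fields come from the ghc API; they only illustrate constraints 2 and 3.
data GhcBuild = GhcBuild
  { buildVersion :: Version  -- full version, e.g. 9.4.5
  , buildAbiHash :: String   -- opaque ABI hash, treated as a plain string
  } deriving (Eq, Show)

data Compat
  = Compatible
  | VersionMismatch  -- interface files may be unreadable (constraint 2)
  | AbiMismatch      -- object files may be unloadable (constraint 3)
  deriving (Eq, Show)

-- | Compare the compiler HLS was built with against the user's compiler.
checkCompat :: GhcBuild -> GhcBuild -> Compat
checkCompat hls user
  | buildVersion hls /= buildVersion user = VersionMismatch
  | buildAbiHash hls /= buildAbiHash user = AbiMismatch
  | otherwise                             = Compatible

-- | Illustrative values only; the hashes are made up.
exampleHls, exampleUser :: GhcBuild
exampleHls  = GhcBuild (makeVersion [9, 4, 5]) "abi-aaaa"
exampleUser = GhcBuild (makeVersion [9, 4, 5]) "abi-bbbb"
```

Here `checkCompat exampleHls exampleUser` reports an ABI mismatch even though the version numbers agree, which is the "different builds of the same version" case from constraint 3.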

Big picture

The basic picture of a HLS session is:

Various approaches

This section lists various possible approaches. These clearly have some similarities but I'm just going to list them for now.

Use the ghc API

Decide on the ghc API at build time

This is what we do today. Constraint 3 means that we therefore need one HLS binary per built compiler, which is quite painful for distribution.

Even if we could lift constraint 3, we would need one binary per minor version because of constraint 2. If we lifted constraint 2 as well, we might be able to have one binary per major version, which would be a significant improvement.

Also, since ghc cannot be reinstalled, this locks us into our choice of build compiler as well. This is not a necessary state of affairs: contrast with ghc-lib-parser, where you can build ghc-lib-parser-9.6 with GHC 9.2. There is no inherent reason why an HLS supporting GHC X needs to be built with GHC X.

Use multiple copies of the ghc API

A wild idea would be to move the duplication from multiple binaries into a single binary. If ghc were reinstallable and cabal allowed multiple instances of the same library in a build plan, then we could have multiple libraries hls-9.0 (depending on ghc-9.0), hls-9.2 (depending on ghc-9.2) etc. Each of the hls-* libraries could have the same source files with different CPP options set. We could then bundle all of the libraries together in a single binary, with the choice of library being determined at runtime depending on the version of the user's compiler.

This would essentially merge all of our current binaries (including the wrapper) into one, which would be simpler.

This would probably also only work if we manage to lift constraint 3, and it would be best if we could lift constraint 2 as well (otherwise we'd need hls-9.0.0, hls-9.0.1... etc.).
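A rough sketch of what the runtime choice could look like, assuming constraint 2 has been lifted so that one library per major series is enough. `Backend`, `bundledBackends` and `selectBackend` are hypothetical names; in the real proposal each backend would be a whole hls-X.Y library compiled against its own ghc package, not a string.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- | Hypothetical handle on one bundled hls-X.Y library.
newtype Backend = Backend { backendName :: String }
  deriving (Eq, Show)

-- | Backends bundled into the single binary, keyed by GHC major series.
bundledBackends :: [([Int], Backend)]
bundledBackends =
  [ ([9, 0], Backend "hls-9.0")
  , ([9, 2], Backend "hls-9.2")
  ]

-- | GHC major series: the first two version components (9.2.8 -> [9,2]).
majorSeries :: Version -> [Int]
majorSeries = take 2 . versionBranch

-- | Choose a backend at runtime from the user's compiler version.
-- Nothing means no bundled library matches the user's compiler.
selectBackend :: Version -> Maybe Backend
selectBackend userGhc = lookup (majorSeries userGhc) bundledBackends

-- | Example: a user on a 9.2.x compiler is routed to the hls-9.2 backend.
exampleSelection :: Maybe Backend
exampleSelection = selectBackend (makeVersion [9, 2, 8])
```

If constraint 2 stays in place, the key would have to be the full version rather than the major series, which is where the hls-9.0.0, hls-9.0.1... explosion comes from.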

Use the user's compiler

We have all these problems about compatibility with the user's compiler, why not actually use it ourselves?

Over a protocol

We could imagine moving all the functionality that HLS needs into GHC, and talking to it over some protocol. This has a few problems:

  1. The protocol would have to be tremendously complex, since it needs to cover the many complex types and operations that HLS needs (constraint 4)
  2. Serialization costs would likely be significant
  3. HLS would need to be able to talk the protocol version that matches the user's GHC version. However, this should in principle be no more complex than supporting multiple versions of the GHC API, which we do today
  4. We'd need the protocol to be stable across minor versions

However, this would solve many of the compatibility issues. We could build HLS with whatever compiler we wanted, so long as we had client libraries for all the versions of the protocol we wanted to support.
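For a sense of what "talking the protocol version that matches the user's GHC" might involve, here is a toy sketch of the message types and a version handshake. None of this exists: the constructors, the integer protocol versions and `negotiate` are all made up for illustration, and a real protocol would need far richer payloads to satisfy constraint 4.

```haskell
-- | Hypothetical requests HLS might send to a GHC server process.
data Request
  = Hello [Int]              -- protocol versions the client can speak
  | TypeAt FilePath Int Int  -- file, line, column
  | ExportsOf String         -- module name
  deriving (Eq, Show)

-- | Hypothetical responses from the GHC side.
data Response
  = Agreed Int               -- protocol version both sides will use
  | NoCommonVersion
  | TypeText String
  | ExportList [String]
  deriving (Eq, Show)

-- | Pick the highest protocol version that both sides support.
negotiate :: [Int] -> [Int] -> Response
negotiate serverVersions clientVersions =
  case filter (`elem` serverVersions) clientVersions of
    []     -> NoCommonVersion
    common -> Agreed (maximum common)

-- | Server-side handling of the handshake; other requests are out of
-- scope for this sketch and yield Nothing.
handshake :: [Int] -> Request -> Maybe Response
handshake serverVersions (Hello clientVersions) =
  Just (negotiate serverVersions clientVersions)
handshake _ _ = Nothing
```

The client libraries mentioned above would correspond to the list of versions HLS puts in `Hello`: HLS can be built with any compiler as long as that list overlaps with what the user's GHC offers.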

By linking the ghc library from the user's compiler

I'm pretty sure this can't work: we'd have ABI compatibility issues between the local library and the HLS binary. It's the same problem that we already had with taking locally built libraries and loading them into HLS.

Never use the user's compiler

The other way of avoiding incompatibility with the user's compiler is not to use it. HLS has (or could have?) a working ghc, so we could tell the build tool to use our GHC instead of the user's one. Problems:

  1. Probably more rebuilding, since we wouldn't be able to reuse any dependencies from the cabal store.
  2. As a side effect we'll add built packages to the cabal store - cabal had better be able to detect the kind of ABI incompatibilities we're worried about and not try to link those packages with other packages that they're incompatible with.

What about constraint 1? Well, we can probably get away with processing the user's code using a different minor version of the same compiler that they are using. So if we shipped one HLS for each major version, always built with the latest minor version of that major version, then the user-visible behaviour should be pretty much identical.

So this would let us ship only one HLS per major version and not have to worry so much about ABI compatibility.
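As a sketch of the substitution rule, assuming HLS ships exactly one GHC per major series: the version list below is illustrative only, and `substituteGhc` and `minorDiffers` are hypothetical helpers, not existing HLS functions.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- | GHC versions an HLS distribution might bundle: illustrative only.
bundledGhcs :: [Version]
bundledGhcs = map makeVersion [[9, 2, 8], [9, 4, 7], [9, 6, 2]]

-- | Two versions are in the same major series if their first two
-- components agree (9.4.5 and 9.4.7 are, 9.4.7 and 9.6.2 are not).
sameSeries :: Version -> Version -> Bool
sameSeries a b = take 2 (versionBranch a) == take 2 (versionBranch b)

-- | The GHC that HLS would use in place of the user's compiler, if any:
-- the newest bundled version in the same major series.
substituteGhc :: Version -> Maybe Version
substituteGhc userGhc =
  case filter (sameSeries userGhc) bundledGhcs of
    []         -> Nothing
    candidates -> Just (maximum candidates)

-- | True when the substitute is a different minor version, i.e. when we
-- are relying on minor versions behaving the same (constraint 1).
minorDiffers :: Version -> Bool
minorDiffers userGhc = maybe False (/= userGhc) (substituteGhc userGhc)
```

A user on 9.4.5 would have their code processed by 9.4.7 here, and `minorDiffers` marks exactly the cases where the "pretty much identical" assumption is doing the work.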

Related issues

michaelpj commented 1 year ago

The "never use the user's compiler" approach is a new one to me. Have we considered it before? It seems potentially good. But I don't know what we'd need to do to beef up HLS's ghc library into something that could actually stand in for a full GHC installation.

joyfulmantis commented 1 year ago

I'm liking the "never use the user's compiler" option: no more CPP! I guess the problem is when there's a breaking change between GHC versions and a package doesn't build with the latest GHC; then HLS is also unable to build it, which would not be great.

michaelpj commented 1 year ago

Right, so we're assuming that there are no serious behaviour changes between minor versions, which I think is fairly safe. If it's a breaking change it should be in a major version!

hasufell commented 1 year ago

Right, so we're assuming that there are no serious behaviour changes between minor versions, which I think is fairly safe.

I personally don't trust this assessment.

joyfulmantis commented 1 year ago

Right, so we're assuming that there are no serious behaviour changes between minor versions, which I think is fairly safe. If it's a breaking change it should be in a major version!

Right, and even though we can't rule out the occasional minor breaking change, the benefit (not having to ship separate minor versions) probably outweighs the negatives.