martijnbastiaan commented 2 years ago

Background

GHC environment files (from here on: GEFs) describe where GHC package databases reside and what packages from those databases should be exposed in a particular GHC session. As an example, here's what a GEF looks like after compiling aeson-pretty:

clear-package-db
global-package-db
package-db /home/martijn/cabal/store/ghc-8.10.3/package.db
package-db dist-newstyle/packagedb/ghc-8.10.3
package-id aeson-pretty-0.8.8-inplace
package-id aeson-2.0.1.0-87763de9410f519d0c483501155e5ff011c56e8c7ad8eac58f4d552d4171a333
...

Unless explicitly surpressed, any GHC session executed in either the directory containing the GEF or any directory under that will use this file to find and expose packages. Cabal used to generate these files automatically, however, this has been disabled as many users found it confusing. Users can now opt-in by passing --write-ghc-environment-files=always or by adding a similar section to their cabal.project.

I purposely won't go into the perceived downsides of GEFs, as I think this has been discussed many times over on various fora.

So what's wrong?

While seemingly opt-in, GEFs are currently the only way Cabal communicates what package databases it used and what packages are exposed to subcommands. This leads to some projects (doctest-parallel, clash-ghc + projects build with Clash) requiring their presence. In general, any downstream consumer of the GHC API is likely to want access to these files. There's currently a couple of ways these projects solve this, but without delving into them: this gets us right back to square one. That is, for these projects, all the downsides of GEFs are exposed once more.

Brushing aside projects needing GEFs, even users who liked their behavior are now in a rough spot: they need to remember to either enable them globally, per project, or even on a per command basis.

Consumers

As far as I can see, there are two groups of GEF users:

Those that need GEFs for their application to work properly. They're only interested in executables that access the GHC API in some way. These executables typically run as a Cabal subcommand (e.g., through cabal run).
Those that want GEFs for perceived benefits. I'm not a member of this group, but I believe people like compiling their project once, and then having access to modules in any GHCi / GHC session run within their project.

Then, there's a (I believe, large) group that doesn't care at all but is prone to being bitten by unexpected behavior imposed by GEFs. This group overlaps with group (1).

Tackling this

I strongly believe a few things should happen:

Cabal should never generate a GEF at the root of a project. Instead, it should generate it inside dist-newstyle.
Cabal should set GHC_ENVIRONMENT and set it to the GEF mentioned in (1) when running subcommands. E.g., for run and test.
Cabal should offer a subcommand shell that drops you into a shell with GHC_ENVIRONMENT set. Like build, one can set a target, or none at all. This command should be able to be run anywhere in your project. It should optionally take a command to run, e.g. cabal shell my-exe -- ghci.

Notes

In no particular order:

There's already cabal exec that generates a temporary GEF and sets GHC_ENVIRONMENT. However, exec doesn't actually build anything.
It seems weird to me that Cabal writes GEFs to the root of a project, while --distdir also exist.
Stack already uses a combination of GHC_PACKAGE_PATH and GHC_ENVIRONMENT for subcommands. This works really well for the projects I work on.
I've lost a lot of time to GEFs either due to confusion on my own side, interns, or colleagues.
As noted, I work on Clash and doctest-parallel, both of which heavily rely on using the GHC API. I understand this is not typical.
I believe the proposal could satisfy all stakeholders in https://github.com/haskell/cabal/issues/4542. I'd love to hear different opinions.
If people agree with this RFC, I'm willing to write a more detailed proposal if necessary. I'm also willing to do implementation work, as long as I get some pointer on where to look.
GHC should probably be patched to not automatically find .ghc.* files - this is IMO where most of the frustration stems from.
I'm assuming v2-* commands in this RFC.

phadej commented 2 years ago

While seemingly opt-in, GEFs are currently the only way Cabal communicates what package databases it used and what packages are exposed to subcommands.

provide enough information to recover build setup (you need to guess where the compiler executable is though). E.g. https://github.com/phadej/cabal-extras/tree/master/cabal-docspec uses these and doesn't rely on environment files at all.

martijnbastiaan commented 2 years ago

Thanks for the links! IMO these solutions are much more involved from a user perspective than Cabal setting an environment variable. The alternative, the way Stack handles it and what's proposed here, needs no specific support from GHC API consumers at all. And at the risk of sounding very negative, as a developer I'm not particularly thrilled about relying on external packages parsing Cabal internals and having two separate code paths for Stack and Cabal.

This is in no way trying to diminish the work that went into cabal-plan and cabal-install-parsers, I've used both and am very happy they exist.

(As a funny note: I did look into cabal-install-parsers for doctest-parallel but using it broke our CI in the weirdest way possible: https://gitlab.com/clash-lang/clash-compiler/-/jobs/1827067901. I never really got to the bottom of that. Presumably it forced an outdated dependency..?)

phadej commented 2 years ago

Let see.

I guess cabal-install could always write an environment file next to plan.json. It's a subset of the same information (+ location of package database), but in different format.

That feature shouldn't affect whether environment files are written in the root of the project: their use case is different.

Setting up in GHC_ENVIRONMENT in cabal run (and cabal test) is one more implicit thing: bad. It should be possible to turn it off. In fact, I think it should be opt-in, i.e. off by default.

martijnbastiaan commented 2 years ago

Thanks for the quick reply.

I guess cabal-install could always write an environment file next to plan.json. It's a subset of the same information (+ location of package database), but in different format.

Cool

That feature shouldn't affect whether environment files are written in the root of the project: their use case is different.

Alright, I don't mind too much as long as the default is off.

Setting up in GHC_ENVIRONMENT in cabal run (and cabal test) is one more implicit thing: bad. It should be possible to turn it off. In fact, I think it should be opt-in, i.e. off by default.

Generally I agree: explicit is better than implicit. But it doesn't seem too implicit to me that an executable run with cabal run would run with the env. Stack has been doing that for years now, and I don't believe anyone has ever been bitten by it. Having said that, it wouldn't be a hassle to turn it on. Ideally it would be configurable per executable, something like:

test-suite doctests
[..]
   with-ghc-environment: true

What do you think?

gbaz commented 2 years ago

I agree with oleg that a simple baseline of always writing env files next to plan.json is a good idea -- I think this had been suggested as part of an earlier cleanup and got lost in the shuffle.

You note that cabal exec is almost cabal shell but doesn't build anything. I'm curious what the use case is for building things as part of it is, or why a cabal build followed by a cabal exec wouldn't suffice in that case?

And I also am not sure about what the specific use case is for cabal run (and test) having the env variable set if the file is known to be available in a fixed place. Shouldn't the former suffice to help make all necessary tools aware of it? Or would they all now work if they were first dropped into a cabal exec?

Because any change to implicit stuff can always cause user frustration and confusion (as we learned the hard way a few times) I'd very much like to understand the exact use cases here.

martijnbastiaan commented 2 years ago

Writing env files to dist-newstyle is a good first step.

You note that cabal exec is almost cabal shell but doesn't build anything. I'm curious what the use case is for building things as part of it is, or why a cabal build followed by a cabal exec wouldn't suffice in that case?

It would. cabal exec takes whatever was built, creates an environment file for it, sets GHC_ENVIRONMENT and runs an executable with it. The reason this is unsatisfactory in my opinion is that the successful execution of a command now relies on a manual build step. If forgotten, a user is faced with all kinds of error messages. Similar to how one can just do cabal run some-exe without running cabal build some-exe before it, I'd like my applications to work with one command invocation too.

But yes, cabal shell $target would roughly be equivalent to cabal build $target && cabal exec bash.

And I also am not sure about what the specific use case is for cabal run (and test) having the env variable set if the file is known to be available in a fixed place. Shouldn't the former suffice to help make all necessary tools aware of it?

Perhaps. A technical problem I see is that dist-newstyle isn't a static directory, so you need to somehow figure out what --distdir was during execution. Alternatively, you could try and find out where the running executable is located and traverse up from there in hopes of finding the env file. But more importantly, this assumes a very Cabal centrist look in the first place. The two projects I mentioned can be used with Cabal, Stack, and Nix. I don't have to write any workarounds for the latter two, and I was hoping Cabal could prepare the environments executables run in in a similar way.

Because any change to implicit stuff can always cause user frustration and confusion (as we learned the hard way a few times) I'd very much like to understand the exact use cases here.

Very fair. For what it's worth, I really struggle to imagine an executable that would want to access the GHC API without having access to the packages Cabal just built. Also, given that Nix and Stack have both been doing this for years now, I'm inclined to say it isn't really a problem. Lastly, if applications would want to access the GHC API without having access to built libraries, it would be as simple as erasing GHC_ENVIRONMENT; which is much less effort than what it takes to properly integrate with Cabal right now.

martijnbastiaan commented 2 years ago

I'm curious what the use case is for building things as part of it is

To give a very specific answer to this, I'm currently working on two projects: doctest-parallel and clash-{prelude,lib,ghc}. The first one uses the GHC API (indirectly, by spawning ghci) to execute tests specified in documentation strings. In order to properly work, ghci needs access to all the dependencies and libraries of the project. The second project is a compiler that compiles Haskell's core into synthesizable hardware designs (i.e., hardware descriptions you could build silicon out of). In order to do this, Clash needs to load core representations / unfoldings using the GHC API which again relies on access to all the packages built by Cabal (or Stack, or Nix).

As @phadej correctly remarked, all this information is available in the dist-newstyle directory. In theory Clash and doctest-parallel could use this information to extract build information and pass it down to the GHC API / ghci. However, this would require both projects to build workarounds digging into Cabal internals (while Stack and Nix both work "out of the box"), which leaves a sour taste in my mouth.

I currently work around this by recommending users to put write-ghc-environment-files: always in their cabal.project or "simply" use Stack. I don't like the former because it reintroduces all the unexpected behavior of env files, and I don't like the latter because I don't like pushing users to any specific technology.

phadej commented 2 years ago

Very fair. For what it's worth, I really struggle to imagine an executable that would want to access the GHC API without having access to the packages Cabal just built.

https://github.com/haskell/cabal/issues/7792

martijnbastiaan commented 2 years ago

Thanks! Cabal itself is an interesting edge case, given that it manages its own dependencies.

In any case, would the following be (easily) implementable?

test-suite doctests
[..]
   with-ghc-environment: true

phadej commented 2 years ago

In any case, would the following be (easily) implementable?

test-suite doctests
[..]
   with-ghc-environment: true

It won't work as you expect. It's the Cabal (the library) which interprets .cabal files, builds and runs tests at the end. Also in stack and nix: the ./Setup

It is possible to make ./Setup write environment files somewhere and set GHC_ENVIRONMENT variable. (EDIT:: essentially just dumping what is passed to it, package-dbs and ids). It might conflict with what stack does already atm. (test-suite "environment" is smaller then whole project). I think you'd really need the environment variable then, as the ./Setup's build directory is build tool dependent, cabal-install's builddir for component is something like dist-newstyle/build/x86_64-linux/ghc-8.6.5/cabal-testsuite-3/build/cabal-tests/ for example.

That will require new cabal-version .

Using old-and-proven doctest might just work then, as it GHC_ENVIRONMENT variable will tell it what it needs, you'd only need to specify global flags and where to find sources in test-suite driver. (I.e. completely? obsolete cabal-doctest custom setup hack). That's a good thing.

I'd say it's implementable, but it's not easy.

EDIT: note that at the moment Cabal the library doesn't know about environment files. (and/or GHC_ENVIRONMENT variable, thus #7792 e.g).

EDIT2: but it might be that some build tool doesn't actually call ./Setup test, but rather runs the test-suite executables directly. Then with-ghc-environment won't work. E.g. the cabal run test-suite-name won't, so there should be a way to specify some GHC_ENVIRONMENT for cabal run. That sounds bad or at least inconvenient.

martijnbastiaan commented 2 years ago

Hmm, I'm not familiar enough with Cabal's flow to properly understand what you're saying, but I think I get the gist.

EDIT: note that at the moment Cabal the library doesn't know about environment files. (and/or GHC_ENVIRONMENT variable, thus #7792 e.g).

Who's the one running the executables, is that cabal-install? In that case I'd propose to add the flag to the cabal binary and cabal.project. I.e., you could run --run-with-ghc-environment=always or run-with-ghc-environment: always respectively. If I'm understanding you correctly this wouldn't interfere with Cabal-the-library. This flag would practically be the same as --write-ghc-environment-files=always, except it writes it to dist-newstyle and sets GHC_ENVIRONMENT.

As far as the issue goes: it seems natural to me to unset / ignore GHC_ENVIRONMENT for Cabal. Cabal is after all managing its own databases.

EDIT2: but it might be that some build tool doesn't actually call ./Setup test, but rather runs the test-suite executables directly. Then with-ghc-environment won't work. E.g. the cabal run test-suite-name won't, so there should be a way to specify some GHC_ENVIRONMENT for cabal run. That sounds bad or at least inconvenient.

Yes, this is a problem for Clash and doctest-parallel alike. I don't mind too much though, we just ask all our users to use cabal run. Both are developer tools anyway, and tightly coupled to the codebase one's working on. There's no real need to install them as standalone binaries. (For Clash we do ship standalone binaries btw, but this is after using cabal-debian or Nix - both of which provide a static environment.)

martijnbastiaan commented 2 years ago

Also, I'd like to thank everyone in this thread for their contribution, it's much appreciated :)

phadej commented 2 years ago

Who's the one running the executables,

if it's cabal test, then the underlying ./Setup

Which makes setting GHC_ENVIRONMENT in cabal-install tricky (impossible?) as ./Setup should unset it to avoid implicit interactions!

phadej commented 2 years ago

TL;DR, cabal install just calls to ./Setup. stack calls to ./Setup, nix is calling ./Setup. Tools just setup the environment to run ./Setup successfully (i.e. manage dependencies).

gbaz commented 2 years ago

Ok, so I think there's three parts here.

1) Always write an env file to dist-newstyle 2) Add a flag (which is then settable in cabal-project) to run with the env variable 3) cabal shell THING as an alias for cabal build THING && cabal exec bash

I'm not super keen on the last still, since i hate adding new commands.

Could we instead add a flag to cabal-exec to tell it to build the deps for a given component before execing?

martijnbastiaan commented 2 years ago

Who's the one running the executables,

if it's cabal test, then the underlying ./Setup

Which makes setting GHC_ENVIRONMENT in cabal-install tricky (impossible?) as ./Setup should unset it to avoid implicit interactions!

Hmm, perhaps I'm misunderstanding, but a naive read of CmdRun.hs / Run.hs / CmdTest.hs makes me believe that while building everything might use Setup.hs, actually running it does not:

End of runAction:

https://github.com/haskell/cabal/blob/bbc11f1c71651e910976c16498bc4871d7b416ea/cabal-install/src/Distribution/Client/CmdRun.hs#L225-L233

Snippet of runProgramInvocation:

https://github.com/haskell/cabal/blob/bbc11f1c71651e910976c16498bc4871d7b416ea/Cabal/src/Distribution/Simple/Program/Run.hs#L127-L130

I'm only interested in this specific invocation running with a GHC_ENVIRONMENT set, which I believe cabal-install is able to provide at that point (or else I'm really unsure how --write-ghc-environment-files=always works!)

martijnbastiaan commented 2 years ago

@gbaz I understand the hate for adding new commands. I kinda tunnel-visioned myself on moving the env file, while writing it twice would perfectly suffice. I don't actually care about shell myself :-)

phadej commented 2 years ago

@martijnbastiaan don't confuse run and test. tests and benchmarks are "run" during the "build" process. https://github.com/haskell/cabal/blob/980a532875efc2ddbab0761d4feb95fd247866a6/cabal-install/src/Distribution/Client/ProjectBuilding.hs#L1331-L1337, using setup...

martijnbastiaan commented 2 years ago

ah oof..

haskell / cabal

RFC: Move GHC environment files into `dist-newstyle` and add `cabal shell` #7890

Background

So what's wrong?

Consumers

Tackling this

Notes