tweag / rules_haskell

Haskell rules for Bazel.
https://haskell.build
Apache License 2.0

Consider separating out Template Haskell dependencies #878

Open mboes opened 5 years ago

mboes commented 5 years ago

Is your feature request related to a problem? Please describe.

Assume we have two libraries A and B such that B depends on A. If it weren't for Template Haskell, we could start building object files in B even before the linking for A is done. But we cannot, because Template Haskell wants to be able to load A if B splices functions defined in A. This puts linking in the critical path, when it need not be.

Worse, any source level change will trigger a rebuild downstream, even if the interface files did not change, because the object files will have changed and those are a dependency for downstream.

Describe the solution you'd like

Add a new attribute called something like enable_template_haskell or a "template-haskell" features string. This would be turned off by default. The user would have to enable Template Haskell explicitly, at the cost of longer compile times and more cache invalidation.

Describe alternatives you've considered

Ideally we would list Template Haskell dependencies separately, in a new attribute called compile_only_deps or similar. If the list is empty, then we know that Template Haskell is not needed. The problem is that the completeness of such an attribute cannot be enforced by the compiler, since GHC loads all dependencies conservatively when evaluating a TH splice.

Additional context

In principle compile-time dependencies really should be kept separate. The arguments for doing so are expounded here.

guibou commented 5 years ago

If it weren't for Template Haskell, we could start building object files in B even before the linking for A is done.

Is this possible with bazel? (I'm just curious here).

mboes commented 5 years ago

Rules are not the atomic unit of the build graph. Actions are. After the analysis phase, Bazel doesn't see rules. It just sees actions. If two actions are independent of each other, they can be run in parallel.
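To make this concrete, here is a toy model in Python (not Bazel code) of the action graph for the two libraries A and B from the issue description. The action names and durations are invented for illustration, and unlimited parallelism is assumed. It only sketches why removing the edge from "link A" to "compile B" shortens the critical path.

```python
from functools import lru_cache

# Hypothetical durations (in seconds) for each action; not measurements.
DURATION = {"compile_A": 10, "link_A": 5, "compile_B": 10, "link_B": 5}


def finish_times(inputs):
    """Earliest finish time of each action, given unlimited parallelism.

    `inputs` maps an action to the actions whose outputs it consumes.
    """
    @lru_cache(maxsize=None)
    def finish(action):
        start = max((finish(dep) for dep in inputs[action]), default=0)
        return start + DURATION[action]

    return {action: finish(action) for action in inputs}


# With Template Haskell, compiling B may need to load A's linked library.
with_th = {
    "compile_A": (),
    "link_A": ("compile_A",),
    "compile_B": ("link_A",),
    "link_B": ("compile_B", "link_A"),
}

# Without it, compiling B only needs A's interface files, which are
# produced by compile_A.
without_th = {
    "compile_A": (),
    "link_A": ("compile_A",),
    "compile_B": ("compile_A",),
    "link_B": ("compile_B", "link_A"),
}

th_times = finish_times(with_th)
plain_times = finish_times(without_th)
print(th_times["link_B"], plain_times["link_B"])
```

In the second graph, link_A and compile_B have no edge between them, so a scheduler is free to run them at the same time.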

mboes commented 5 years ago

Ideally we would list Template Haskell dependencies separately, [...]. The problem is that the completeness of such an attribute cannot be enforced by the compiler

If we go ahead with the strategy described in https://github.com/tweag/rules_haskell/issues/873#issuecomment-491494783, we could solve this problem. Essentially, introduce a new rule called template_haskell_library. Then make it so that only those libraries really make it into the extra-ghci-libraries field, or even replace the .so file of non-TH libs with a dummy file that exports no symbols. If the user mistakenly omitted a dependency, then the GHCi linker will complain with a not entirely unreasonable error message:

ghc: ^^ Could not load 'pkg_B_x_closure', dependency unresolved. See top entry above.

ByteCodeLink.lookupCE
During interactive linking, GHCi couldn't find the following symbol:
  pkg_B_x_closure

Together with https://github.com/tweag/rules_haskell/issues/873#issuecomment-491494783, this would solve both problems raised in this ticket's description.

Using dummy library files in lieu of the real ones is similar to the ijar trick Bazel implements in the Java rules. See https://docs.bazel.build/versions/master/skylark/lib/JavaInfo.html#compile_jars.

mboes commented 5 years ago

Essentially, introduce a new rule called template_haskell_library. Then make it so that only those libraries really make it into the extra-ghci-libraries field, or even replace the .so file of non TH libs with a dummy file that exports no symbols.

After some sleep, I realize this is crazy. Point is, when using Template Haskell, anything can be a compile-time (or "syntax") dependency - that depends on the use site, not the definition site. So we need something like:

haskell_library(
    name = "foo",
    deps = [":bar", ":baz"],
    plugins = [":a-plugin"],
    templates = [":qux", ":quux"],
)

A change to anything in deps would invalidate the cache a lot less frequently than a change to anything in templates, since only the latter need to be loaded at compile time.
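A rough way to picture the difference is to model the cache key of the compile action for foo. The Python sketch below is a toy, not how rules_haskell computes anything: it assumes that targets in deps contribute only their interface files to the key, while targets in templates also contribute their object code. The library records and digest scheme are hypothetical.

```python
import hashlib


def digest(*parts):
    """Stable digest of a sequence of strings."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def compile_action_key(deps, templates):
    """Cache key for compiling a target (toy model).

    Assumption: targets in `deps` contribute only their interface, while
    targets in `templates` also contribute their object code.
    """
    parts = []
    for lib in deps:
        parts.append(lib["interface"])
    for lib in templates:
        parts.append(lib["interface"])
        parts.append(lib["objects"])
    return digest(*parts)


bar = {"interface": "bar-iface-v1", "objects": "bar-obj-v1"}
qux = {"interface": "qux-iface-v1", "objects": "qux-obj-v1"}

# An implementation-only change: same interface, different object code.
bar_changed = dict(bar, objects="bar-obj-v2")
qux_changed = dict(qux, objects="qux-obj-v2")

base = compile_action_key(deps=[bar], templates=[qux])
dep_edit = compile_action_key(deps=[bar_changed], templates=[qux])
template_edit = compile_action_key(deps=[bar], templates=[qux_changed])

print(base == dep_edit)       # key unchanged: cached result can be reused
print(base == template_edit)  # key changed: foo must be recompiled
```

Under these assumptions, an implementation-only edit to :bar leaves the key for foo untouched, whereas the same kind of edit to :qux forces a recompile.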

aherrmann commented 5 years ago

How does GHC handle template Haskell dependencies? Will it just load all hs-libraries and extra-(ghci-)libraries entries of all dependencies? If so, then we may need to change how we generate package configuration files to implement this. E.g. in a way similar to what was discussed here: https://github.com/tweag/rules_haskell/issues/873#issuecomment-491494783.

mboes commented 5 years ago

Right. That was the comment that was linked to in the previous comment in this thread. AFAIU Template Haskell loads everything.

aherrmann commented 2 years ago

@facundominguez IIUC https://github.com/tweag/rules_haskell/pull/1663 provides this in the context of haskell_module, correct?

facundominguez commented 2 years ago

As far as I can see, the discussion here centers on avoiding linking dependencies ahead of compiling libraries or modules that don't use TH.

The haskell_module implementation, instead, avoids linking dependencies even when the modules use TH. This is achieved by having the build actions depend on object files, rather than on shared or static libraries.

Something similar could be implemented when not using haskell_module, I think.

Here's a GHC ticket explaining how haskell_module depends on object files.

aherrmann commented 2 years ago

As far as I can see, the discussion here centers on avoiding linking dependencies ahead of compiling libraries or modules that don't use TH.

The haskell_module implementation, instead, avoids linking dependencies even when the modules use TH. This is achieved by having the build actions depend on object files, rather than on shared or static libraries.

Right, so this achieves even more than this issue asks for, correct?

facundominguez commented 2 years ago

Right, so this achieves even more than this issue asks for, correct?

That is my impression. I think users would in general prefer to load object files with TH, and only depend on the object files if TH is involved. Both of these are done by the haskell_module implementation.

Edit: To be more explicit, depending on object files is preferable because linking a library requires all of its modules to be built first, whereas depending on object files only requires building the object files that are actually depended upon.
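As a small illustration of that last point, the Python sketch below compares what has to be built before a downstream module can be compiled in each scheme. The library, its module names, and its import graph are all hypothetical, and this is not the haskell_module implementation, only a model of the set of prerequisites.

```python
# Hypothetical library "A" with three modules: A.Util imports nothing,
# A.Core imports A.Util, and A.Extra imports A.Core.
MODULE_IMPORTS = {
    "A.Util": [],
    "A.Core": ["A.Util"],
    "A.Extra": ["A.Core"],
}


def transitive_modules(roots, imports):
    """Modules reachable from `roots` through the import graph."""
    seen = set()
    stack = list(roots)
    while stack:
        module = stack.pop()
        if module not in seen:
            seen.add(module)
            stack.extend(imports[module])
    return seen


def needed_via_library(imports):
    """Linking the library first means every module must be built."""
    return set(imports)


def needed_via_objects(used, imports):
    """Depending on object files needs only the modules actually used."""
    return transitive_modules(used, imports)


# A downstream module that splices something defined in A.Core.
via_library = needed_via_library(MODULE_IMPORTS)
via_objects = needed_via_objects(["A.Core"], MODULE_IMPORTS)
print(sorted(via_library))
print(sorted(via_objects))
```

Here A.Extra is on the critical path only in the library scheme; with object-file dependencies it can be built in parallel with, or after, the downstream module.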