tweag / rules_haskell

Haskell rules for Bazel.
https://haskell.build
Apache License 2.0

User interface for fully statically linked Haskell binaries #1408

Open aherrmann opened 4 years ago

aherrmann commented 4 years ago

#1390 added support for fully statically linked Haskell binaries. As of that implementation, users need to configure the GHC toolchain for fully static linking by setting static_runtime = True and fully_static_link = True on the toolchain, and they need to set compiler_flags = ["-optl-static"] on the haskell_binary targets that they want fully statically linked. @mboes raised the question whether this flag is necessary and whether we could build on Bazel precedent and use the fully_static_link feature flag instead. I figured it's best to have that discussion on this issue tracker.
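
For concreteness, here is a rough sketch of the status quo described above. This is only a sketch: the GHC version, the attribute_path, the target name, and the @stackage//:base dependency label are placeholders, and the toolchain is registered via haskell_register_ghc_nixpkgs as in a static-haskell-nix based setup.

# WORKSPACE: mark the GHC distribution as capable of fully static linking.
haskell_register_ghc_nixpkgs(
    version = "8.8.4",  # placeholder
    attribute_path = "staticHaskell.ghc",  # placeholder
    static_runtime = True,
    fully_static_link = True,
)

# BUILD: each binary opts in explicitly by passing -optl-static to GHC.
haskell_binary(
    name = "hello",
    srcs = ["Main.hs"],
    deps = ["@stackage//:base"],  # placeholder dependency
    compiler_flags = ["-optl-static"],
)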

A related question was previously discussed when deciding on the interface of the GHC toolchain, see https://github.com/tweag/rules_haskell/pull/1390#issuecomment-658877757. As I argued in that discussion, whether a GHC distribution is capable of fully static linking is a property of that distribution, so it makes sense to have a corresponding switch on the GHC toolchain rule. The remaining question is whether and how users control fully static linking on individual targets. These are the options I see:

  1. Always link fully statically when the toolchain is configured with fully_static_link = True. I.e. always pass -optl-static to GHC (or --ghc-option=-optl-static to Cabal).
  2. Let the user configure fully static linking per target using Bazel's fully_static_link feature flag.
  3. Let the user configure fully static linking per target by passing -optl-static to GHC (or --ghc-option=-optl-static to Cabal).

Here's my view on each of these options:

  1. I think this would be the ideal option if it were feasible, but I don't think it is feasible in practice at this point. E.g. I tried this approach on minirepo but noticed that it then fails to build @ghcide-exe//ghcide. The reason is that it depends on libtinfo from ncurses, which is only available dynamically in the static-haskell-nix configuration. Trying to switch to a static libtinfo fails due to missing symbols such as __memcpy_chk. IIUC this means that libtinfo depends on glibc and is incompatible with musl. This indicates that users need per-target control over whether to link fully statically, which rules this option out.
  2. The advantage of this approach is that it follows Bazel precedent and can be configured per package or even globally on the command line. To allow users to fully statically link particular binaries generated by stack_snapshot, we'd need to add a new attribute for forwarding feature flags to the generated targets (that's something we may need at some point anyway). Note that, as pointed out by @lunaris, there are issues with the fully_static_link feature flag on cc_binary. However, these shouldn't affect a Haskell binary with that feature flag enabled.
  3. This is the status quo. To allow users to fully statically link particular binaries generated by stack_snapshot, we'd need to add a new attribute for forwarding cabalopts to the generated targets (that's also something we may need at some point anyway). The advantage of this approach is that it's very simple and straightforward. However, it doesn't allow for the easy global or per-package configuration that features does. A sketch comparing the two per-target spellings follows this list.
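
For comparison, here is roughly how options 2 and 3 would be spelled on an individual target (target names and sources are placeholders; with option 2 the feature could also be set per package via package(features = ...) or globally with --features=fully_static_link on the command line):

# Option 2: opt in via Bazel's feature mechanism.
haskell_binary(
    name = "hello_static",
    srcs = ["Main.hs"],
    features = ["fully_static_link"],
)

# Option 3 (status quo): pass the linker flag through to GHC directly.
haskell_binary(
    name = "hello_static",
    srcs = ["Main.hs"],
    compiler_flags = ["-optl-static"],
)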

I don't see a clear winner between 2 and 3, but 2 seems preferable as it follows Bazel precedent and allows for convenient global or per-package configuration.

@mboes @lunaris WDYT?

lunaris commented 4 years ago

I think option 2 sounds the most attractive -- effectively swapping -optl-static under compiler_flags for fully_static_link as an argument to haskell_binary, right? With the added bonus that if static linking with GHC changes, we can change it behind fully_static_link without breaking existing code.

Playing devil's advocate, I do wonder if there are cases where the static linking of a particular binary really is specific to that target -- e.g. suppose I could make ghcide work with some specific set of compiler_flags; it's unlikely that fully_static_link would ever be general enough to work, right? In practice this feels unlikely right now (and indeed, we could still fix up those targets with one-off compiler_flags), but I thought I'd raise it in case someone knows better.

With regards to e.g. stack_snapshot, another option is potentially modifying components to take a dict of dicts (I know, getting a bit hairy), or a dict of structs, where the structs have some limited API such as:

stack_snapshot(
    ...
    components = {
        "ghcide": struct(
            build = True, # Replacing/making more verbose the existing feature
            fully_static_link = True, # Giving us the new functionality
        ),
    },
    ...
)

I don't know whether this is possible/sensible (and indeed you might still want to pass feature flags down regardless of this), but it's a thought I had. Perhaps overly granular for the current use case.

TL;DR -- 2 sounds good, happy to help :smile:

aherrmann commented 4 years ago

effectively swapping -optl-static under compiler_flags for fully_static_link as an argument to haskell_binary, right?

Yes, for features = ["fully_static_link"] to be precise.

Playing devil's advocate, I do wonder if there are cases where the static linking of a particular binary really is specific to that target

One can probably always construct a case where dynamic linking is required. E.g. the glibc FAQ points out how it's used for configuration. Though, I'm more concerned with cases where someone wants to link their binary fully statically but doesn't particularly care whether build tools such as e.g. happy or alex are linked fully statically. happy and alex do link fully statically just fine with static-haskell-nix, but it's easy to imagine a build tool that is harder to get to link fully statically - as ghcide illustrates. The concern with always globally switching to fully static linking is that such tools would then stand in a user's way if all they want is to fully statically link their own binary.

With regards to e.g. stack_snapshot, another option is potentially modifying components to take a dict of dicts (I know, getting a bit hairy), or a dict of structs [...]

That's an interesting idea. There is precedent for something like this in maven_install, which allows listing dependencies either as plain strings representing Maven coordinates or as structs providing more detailed information. I think we could do a similar thing, but not attach this to the components attribute and instead extend the packages attribute directly, replacing other special-purpose attributes. E.g.

packages = [
    "alex",
    stack.package(
        name = "ghcide",
        lib = True,
        exe = ["ghcide"],
        features = ["fully_static_link"],
    )
]

Implementation-wise, Bazel doesn't have an attribute type for a list of structs, so the stack_snapshot macro would have to do some mangling to map this onto regular Bazel attributes of the _stack_snapshot rule. But that's an implementation detail; a rough sketch is below.
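
A rough sketch of what that mangling could look like (the stack.package constructor and the per_package_exes, per_package_features, and packages_with_lib attribute names are all hypothetical):

def _stack_package(name, lib = True, exe = [], features = []):
    # Hypothetical constructor behind `stack.package`: just a struct.
    return struct(name = name, lib = lib, exe = exe, features = features)

def stack_snapshot(name, packages = [], **kwargs):
    # Split plain package names from detailed package structs and flatten the
    # struct fields onto ordinary string list/dict attributes of the rule.
    plain = [p for p in packages if type(p) == "string"]
    detailed = [p for p in packages if type(p) != "string"]
    _stack_snapshot(
        name = name,
        packages = plain + [p.name for p in detailed],
        # Hypothetical attributes on the underlying _stack_snapshot rule:
        per_package_exes = {p.name: p.exe for p in detailed},
        per_package_features = {p.name: p.features for p in detailed},
        packages_with_lib = [p.name for p in detailed if p.lib],
        **kwargs
    )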

mboes commented 4 years ago

The packages as structs idea is tempting (and cute), but doesn't that point towards writing the same metadata in a new top-level target instead? That would feel more Bazely.

aherrmann commented 4 years ago

The packages as structs idea is tempting (and cute), but doesn't that point towards writing the same metadata in a new top-level target instead? That would feel more Bazely.

@mboes Could you expand on what that would look like? stack_snapshot is a repository rule and some of these attributes are used to generate targets, e.g. lib and exe will influence what targets are generated. I don't see how this could be done with additional Bazel targets since repository rules don't have access to providers of targets.

mboes commented 4 years ago

Repository rules don't have providers, but they can read files. So we could have a stack_package rule that just writes its attributes to disk as JSON, using to_json() on a struct. Unfortunately, Bazel doesn't have a native from_json(), but that's an issue affecting other repository rule authors too, and they all use the same workaround. See https://github.com/bazelbuild/bazel/issues/3732.
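
A minimal sketch of what such a stack_package repository rule could look like (the attribute names are hypothetical, and reading the JSON back inside stack_snapshot would still need the workaround from the linked issue):

def _stack_package_impl(repository_ctx):
    # Serialize the attributes to a JSON file that stack_snapshot can read
    # when it runs as a repository rule itself.
    repository_ctx.file(
        "package.json",
        struct(
            name = repository_ctx.name,
            lib = repository_ctx.attr.lib,
            exe = repository_ctx.attr.exe,
            features = repository_ctx.attr.features,
        ).to_json(),
    )
    repository_ctx.file("BUILD.bazel", 'exports_files(["package.json"])')

stack_package = repository_rule(
    implementation = _stack_package_impl,
    attrs = {
        "lib": attr.bool(default = True),
        "exe": attr.string_list(),
        "features": attr.string_list(),
    },
)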

aherrmann commented 4 years ago

Repository rules don't have providers, but they can read files.

That's true, but they cannot read files that are generated by build actions because of the different phases in which repository rules and regular rules are executed.

I suppose we could make it another repository rule. Here's what a WORKSPACE might then look like, based on my understanding of the idea:

stack_package(
    name = "ghcide",
    lib = True,
    exe = ["ghcide"],
    features = ["fully_static_link"],
)
stack_snapshot(
    name = "stackage",
    packages = [
        "alex",
        "@ghcide",
    ],
)

However, I see a couple of issues with this approach.

  1. We're introducing a bunch of extra external workspaces and disk I/O. That seems like a lot of overhead compared to the struct approach.
  2. How do we distinguish simple packages like "alex" from those with additional attributes like "@ghcide"? We could
    • Use a heuristic. stack_package repositories will start with an @ while plain package names will not. However, in principle users could point to a plain JSON file by filename, e.g. "ghcide.json", so there is some ambiguity in this approach.
    • Remove simple packages and have a stack_package repository for every package. I think that's too much boilerplate.
    • Have two separate attributes: packages for things like "alex", and detailed_packages (or some other name) for things like "@ghcide". That seems a bit cumbersome, but more robust than the heuristic.
  3. I don't think I've seen repository rules used like that before, so I don't think this follows common Bazel concepts or precedent.

mboes commented 4 years ago

Yes, making stack_package a repository rule was what I was thinking. I doubt we should be too worried about extra disk I/O: the number of packages that need options will be small, even in a large repository. It's true that the packages strings should be either names or labels; having them be both is messy.

But come to think of it, do we really want to push so much configuration into the WORKSPACE file? In the example you gave about ghcide, AFAIU all options can be configured from a custom snapshot.yaml file, which is what we've been recommending for complicated setups. It's just that in snapshot.yaml there's nothing more ergonomic than ghc-options: -optl-static to get a static executable, at least until https://github.com/commercialhaskell/stack/issues/3420 gets fixed. It doesn't sound like a huge deal, all the more so because the executables from Stackage are not usually the ones we want to deploy.

Either way, I'm not against structs in packages. There is the maven_install precedent after all.

aherrmann commented 4 years ago

Configuring this in snapshot.yaml would be great. Unfortunately, we don't have access to flags or GHC options defined in the stack snapshot; stack ls dependencies json doesn't expose them.

lunaris commented 4 years ago

Moreover, it feels like we'd lose the original use case -- specifying per-package build/configuration options. I don't think one can do that in stack.yaml, right? (It's been a long time since I wrote one in anger, so perhaps this support is there now, but a glance at the documentation didn't turn it up.) Personally I think more WORKSPACE rules are a bit of a faff -- I already have to set up quite a bit just to get to stack_snapshot, and it'd be nice if I could be done by that point. Also, there's an inherent static-ness to external repositories that doesn't seem warranted for this use case? Just my 2c.

aherrmann commented 4 years ago

With custom stack snapshots it's possible to define per-package flags and GHC options. The docs give the following examples:

# Override flags, can also override flags in the parent snapshot
flags:
  unordered-containers:
    debug: true

# Set GHC options for specific packages
ghc-options:
  warp:
  - -O2

Unfortunately, these are not exposed by stack ls dependencies json.

mboes commented 4 years ago

Hm, do we have a feature request upstream? As things stand, using a snapshot file is going to be pretty confusing to users: part of its content will be ignored where Stack wouldn't have ignored it.

aherrmann commented 4 years ago

Hm, do we have a feature request upstream?

I've opened https://github.com/commercialhaskell/stack/issues/5372 to that end.

As things stand, using a snapshot file is going to be pretty confusing to users: part of its content will be ignored where Stack wouldn't have ignored it.

Indeed, we have a related issue on rules_haskell: https://github.com/tweag/rules_haskell/issues/1152