tweag / rules_haskell

Haskell rules for Bazel.
https://haskell.build
Apache License 2.0

Document how `rules_haskell` caches artifacts #1293

Open joneshf opened 4 years ago

joneshf commented 4 years ago

Is your feature request related to a problem? Please describe.

I've found caching with rules_haskell to be hard to figure out. I'm using rules_haskell in a project I'm working on. I run locally on my Linux machine and (mostly due to inertia) on three CI services: AppVeyor (Windows), GitLab CI (Linux), and TravisCI (Linux and macOS). I use the GHC bindists, as setting up nix across all platforms is not really in the cards. In each environment, something seems to miss the cache. It happens in different ways whether it's locally or in one of the CI services.

Locally, it seems to intermittently check if the stack stuff is up to date. I can't figure out how it decides to check this (so can't produce output at the moment). But every so often, I'll see it checking for updates of stack stuff. It's not a big issue locally, but it means that sometimes a bazel test //... is subsecond and other times it's 10-20 seconds. Aside from this check, things seem to work locally.

On the CI services, I cannot figure out how to cache the rules_haskell stuff correctly. The project has a transitive dependency on happy, which means it has to be handled differently from other Haskell dependencies. Each CI service seems unable to cache the result of happy and has to build it every time it's run. This means that each build on CI is about five minutes, even if nothing changed.

I've tried caching the typical bazel directories: ~/.cache/bazel on Linux (and equivalents on macOS and Windows). I've tried using the --disk_cache and --repository_cache flags to set the locations explicitly. None of these things seem to make caching work on the CI services.

Describe the solution you'd like

There are two parts to this:

  1. It would be nice to document how to set up rules_haskell correctly so it doesn't intermittently check for stack updates. I imagine there's something I don't have set up properly, but I don't know what it could be.
  2. It would be nice to document how to set up rules_haskell correctly so it hits caches. Again, I imagine there's something I don't have set up properly, but I don't know what it could be.

If there's some argument or flag that has to be turned on for either of these, maybe it could be flipped to be opt-out instead of opt-in so caching worked out of the box?

aherrmann commented 4 years ago

> Locally, it seems to intermittently check if the stack stuff is up to date.

We call stack update as a local repository rule, i.e. this is run whenever Bazel re-fetches. This is to work around a race on a lock within stack. Slower refetch is an unfortunate side effect of this. However, it shouldn't invalidate the cache for Stackage dependencies that are already cached.

> On the CI services, I cannot figure out how to cache the rules_haskell stuff correctly. The project has a transitive dependency on happy, which means it has to be handled differently from other Haskell dependencies. Each CI service seems unable to cache the result of happy and has to build it every time it's run. This means that each build on CI is about five minutes, even if nothing changed.

It's hard to say in general what's wrong here. There is a known reproducibility issue with haskell_cabal_binary and haskell_cabal_library, but this should not affect caching within a CI pipeline, only across environments (e.g. a remote cache shared between CI and devs). Are you aware of any changes between CI runs within one CI pipeline? E.g. different usernames, different working directories, different PATH, etc.? A good way to debug this is to compare execution logs of two runs that should be identical, following the steps described here.
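As a rough illustration of the log comparison suggested above, here is a minimal Python sketch. It assumes each run was recorded with Bazel's `--execution_log_json_file` flag and that the resulting file is a stream of back-to-back JSON objects, each with an `actualOutputs` list whose entries carry a `path` and a `digest.hash`. Those field names are an assumption about the log layout and may differ between Bazel versions; the helper names are made up for this example.

```python
import json


def parse_concatenated_json(text):
    """Parse a stream of back-to-back JSON objects into a list of dicts."""
    decoder = json.JSONDecoder()
    records, pos = [], 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        records.append(obj)
    return records


def output_digests(records):
    """Map each output path to its digest hash (assumed field names)."""
    digests = {}
    for record in records:
        for out in record.get("actualOutputs", []):
            digests[out["path"]] = out.get("digest", {}).get("hash")
    return digests


def differing_outputs(log_a, log_b):
    """Return sorted paths whose digest differs or that exist in only one log."""
    a, b = output_digests(log_a), output_digests(log_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

To use it, read the two log files, pass their contents through `parse_concatenated_json`, and hand both lists to `differing_outputs`. The first paths it reports are usually the place to start looking, since a changed output early in the build invalidates everything downstream of it.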

> I've tried caching the typical bazel directories: ~/.cache/bazel on Linux (and equivalents on macOS and Windows). I've tried using the --disk_cache and --repository_cache flags to set the locations explicitly. None of these things seem to make caching work on the CI services.

This suggests that something in the environment is changing and leaking into the cache keys. Temporary working or installation directories can have that effect, particularly if there are build steps that set use_default_shell_env = True or repository rules that depend on PATH. In bindist mode, that is the case for the POSIX and Python toolchains used by rules_haskell. Comparing execution logs as described above should help pinpoint this. Additionally, Bazel provides a way to debug non-hermetic workspace rules.
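To make the "leaking into the cache keys" point concrete, here is a toy Python model of a cache key. It is not Bazel's actual key computation, only a sketch under the assumption that a key is a hash over the command line, the action's environment variables, and the input digests. The function name and all values used with it are hypothetical.

```python
import hashlib


def toy_action_key(command, env, input_digests):
    """Hash command, environment, and inputs into one key (toy model only)."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    # Sort so the key does not depend on dict ordering.
    for name in sorted(env):
        h.update(f"{name}={env[name]}".encode() + b"\0")
    for path in sorted(input_digests):
        h.update(f"{path}:{input_digests[path]}".encode() + b"\0")
    return h.hexdigest()
```

In this model, two calls with the same command and inputs but a different `PATH` value in `env` (for example, one that contains a per-job temporary directory) produce different keys, so the second run cannot reuse what the first one cached even though no source changed.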

> I use the GHC bindists, as setting up nix across all platforms is not really in the cards.

It's possible to configure rules_haskell to work with Nix on Linux and macOS while using the bindist on Windows. The rules_haskell repository itself does that. Nix makes it much easier to achieve reproducible builds, and the Bazel that comes with Nix includes some patches to improve reproducibility. It may well be worth the effort if the Linux and macOS use case allows for it.

joneshf commented 4 years ago

> We call stack update as a local repository rule, i.e. this is run whenever Bazel re-fetches. This is to work around a race on a lock within stack. Slower refetch is an unfortunate side effect of this. However, it shouldn't invalidate the cache for Stackage dependencies that are already cached.

Yeah, it seems to hit the cache. Just didn't know for sure if it was checking because I did something wrong. Glad to know that's how it's supposed to work!

Thanks for the suggestions on where to go. Will look into them when I get some time to diag.

aherrmann commented 2 years ago

A caching section in the use-cases documentation would be a good place for this.