tweag / rules_haskell

Haskell rules for Bazel.
https://haskell.build
Apache License 2.0

Concurrency issues with `stack_snapshot()` #1167

Closed jcpetruzza closed 4 years ago

jcpetruzza commented 4 years ago

Describe the bug

We are seeing build failures on CI, with an error message like the following:

Command failed: /home/daniel/.cache/bazel/_bazel_daniel/526b0fb4c773181d605f6e65995536aa/external/haskell_stack/bin/stack --resolver /home/daniel/repos/habito-2/bazel/repo/stackage-snapshot.yaml ls dependencies --global-hints --separator=-
/home/daniel/.stack/pantry/global-hints-cache.yaml.tmp: renameFile:renamePath:rename: does not exist (No such file or directory)

It turns out this happens when two CI jobs run simultaneously on the same machine. It is likely caused by `stack ls dependencies` not protecting certain operations with a lock.

Even though there is probably a bug in stack here, the fact that all Bazel workspaces depend on artifacts under ~/.stack makes isolation, and arguably even hermeticity, harder to enforce. Would it make sense to run stack with --stack-root pointing to a directory controlled by the `stack_snapshot()` rule?

To Reproduce

I managed to reproduce the error locally by doing the following:

  1. rm -Rf ~/.stack/
  2. In two different copies of the Bazel workspace, simultaneously run bazel clean --expunge; bazel build //...

I imagine that a relatively large number of stackage dependencies may be needed for the race condition to manifest.
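
The two steps above can be scripted roughly as follows. This is only a sketch: the helper name `run_in_parallel` and the workspace paths in the usage comment are hypothetical, and nothing here guarantees that the race actually triggers on a given run.

```shell
# Run the same command concurrently in each of the given directories.
# Returns the number of runs that failed (0 if all succeeded).
run_in_parallel() {
  cmd="$1"; shift
  pids=""
  for dir in "$@"; do
    # Each run gets its own subshell so the working directories stay separate.
    ( cd "$dir" && sh -c "$cmd" ) &
    pids="$pids $!"
  done
  failures=0
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done
  return "$failures"
}

# Usage (hypothetical workspace copies; step 1 wipes the shared Stack root first):
#   rm -Rf ~/.stack/
#   run_in_parallel 'bazel clean --expunge; bazel build //...' ~/ws-copy-a ~/ws-copy-b
```

A non-zero return value only says that at least one build failed; the build output still has to be checked for the `global-hints-cache.yaml.tmp` rename error to confirm it is this issue.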

Expected behavior

Concurrent builds don't interfere with each other.

Environment

Additional context

At the moment we don't have a workaround for this; ideas are welcome!

thufschmitt commented 4 years ago

As a workaround, you can probably set `STACK_ROOT` to override the location of `~/.stack`. I think this should be forwarded to the repository rule that calls `stack`.

jcpetruzza commented 4 years ago

@regnat Setting STACK_ROOT does seem to work, thanks! I was somehow expecting it to be ignored...
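
For reference, the `STACK_ROOT` workaround could look roughly like the sketch below in a CI job. It assumes that each job runs from its own checkout directory and that a per-checkout directory under the temporary directory is an acceptable place for Stack's state; the naming scheme is made up for illustration.

```shell
# Assumption: the current directory is this job's own copy of the workspace.
workspace_dir="$(pwd)"

# Derive a per-workspace Stack root (absolute path) so that concurrent jobs
# no longer share ~/.stack. The checksum just gives each checkout path a
# distinct, filesystem-safe suffix.
suffix="$(printf '%s' "$workspace_dir" | cksum | cut -d' ' -f1)"
STACK_ROOT="${TMPDIR:-/tmp}/stack-root-$suffix"
export STACK_ROOT
mkdir -p "$STACK_ROOT"

# Then build as usual; per the discussion above, the stack invocation made by
# the repository rule appears to pick up STACK_ROOT from the environment:
#   bazel build //...
```

The trade-off is that each workspace gets its own Stack cache, so downloads are no longer shared between jobs.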