tweag / rules_haskell

Haskell rules for Bazel.
https://haskell.build
Apache License 2.0

haskell_library may cause extra rebuilds when ABI is not changed #1883

Open vkpgwt opened 1 year ago

vkpgwt commented 1 year ago

Describe the bug

Modifying Haskell files used in a haskell_library causes a rebuild of dependent libraries, even though the library's API/ABI has not changed.

To Reproduce

  1. Run git clone https://github.com/tweag/rules_haskell.git
  2. Go to rules_haskell/tutorial directory.
  3. Put the following files into the current directory (shown below): A.hs, B.hs, and BUILD.bazel.
  4. Run bazel build :mylib2
  5. Update code in A.hs: change definition aConst = 1 to aConst = 2.
  6. Run bazel build :mylib2

Expected behavior

I expect that only :mylib1 is rebuilt. Instead, both libraries are rebuilt: we see warnings from both Haskell modules. As far as I can tell, step 5 does not change the ABI of mylib1.

Environment

Additional context

If we compare MD5 sums of the build output files, we see that no files related to mylib2 have changed content. All modified files belong to mylib1.

If we build the attached libraries using stack, no modules of mylib2 are rebuilt (although the configure stage still runs). I think rules_haskell should behave in a similar way.

If we don't introduce implementation changes in A.hs, but only whitespace changes, only mylib1 is rebuilt. This is good, but not an interesting case, since no build products have their content changed.

My project suffers heavily from extra rebuilds by bazel, although most source code modifications relate to library implementations and do not change their ABI. Unfortunately, using haskell_module or gazelle is not an option for me.

Attached files are below.

A.hs:

{-# OPTIONS_GHC -Wmissing-signatures #-}
module A where
aConst = 1

B.hs:

{-# OPTIONS_GHC -Wmissing-signatures #-}
module B where
import A (aConst)
bConst = aConst + 1

BUILD.bazel:

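# Load the rules used below (assumes the workspace exposes rules_haskell as @rules_haskell).
load(
    "@rules_haskell//haskell:defs.bzl",
    "haskell_library",
    "haskell_toolchain_library",
)

haskell_toolchain_library(name = "base")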

haskell_library(
    name = "mylib1",
    srcs = ["A.hs"],
    visibility = ["//visibility:public"],
    deps = [":base"],
)

haskell_library(
    name = "mylib2",
    srcs = ["B.hs"],
    visibility = ["//visibility:public"],
    deps = [":base", ":mylib1"],
)

aherrmann commented 1 year ago

> Unfortunately, using haskell_module or gazelle is not an option for me.

What's preventing you from adopting these? As described in the corresponding blog post haskell_module enables recompilation avoidance.
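For illustration, here is a rough sketch of how the two libraries from the report might be declared with haskell_module. It is not taken from the thread: the experimental load path and the modules, narrowed_deps, and cross_library_deps attributes are written down as assumptions about the experimental rules_haskell API and should be checked against the current documentation before use.

```starlark
load(
    "@rules_haskell//haskell:defs.bzl",
    "haskell_library",
    "haskell_toolchain_library",
)

# Assumption: haskell_module lives under the experimental package.
load("@rules_haskell//haskell/experimental:defs.bzl", "haskell_module")

haskell_toolchain_library(name = "base")

# One target per source file, so each module is compiled by its own action.
haskell_module(
    name = "A",
    src = "A.hs",
)

haskell_library(
    name = "mylib1",
    modules = [":A"],
    visibility = ["//visibility:public"],
    deps = [":base"],
)

haskell_module(
    name = "B",
    src = "B.hs",
    # Assumption: modules imported from another library are listed here.
    cross_library_deps = [":A"],
)

haskell_library(
    name = "mylib2",
    modules = [":B"],
    # Assumption: libraries that provide cross_library_deps go in narrowed_deps.
    narrowed_deps = [":mylib1"],
    visibility = ["//visibility:public"],
    deps = [":base"],
)
```

With per-module targets, Bazel sees the interface of A as an input to the action that compiles B, which is what lets it skip B when that interface is unchanged.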

vkpgwt commented 1 year ago

@aherrmann there are some points in the documentation of gazelle_haskell_modules that look like obstacles to adoption:

  1. gazelle_haskell_modules changes BUILD files, which are stored in git. Should we commit the changes?

    • Suppose we do. Keeping frequently modified auto-generated text in git results in annoying merge conflicts, cluttered diffs, and compilation errors caused by stale configuration that someone forgot to commit. Our project is large and actively developed by many people, so these troubles look unavoidable.

    • If we don't commit the changes, we have to remove the updates made by gazelle from BUILD files before every git commit. That is a distracting and error-prone activity, which will probably result in data being committed by mistake and, again, in merge conflicts and build failures.

    This would be solved if gazelle could generate additional BUILD files instead of updating existing ones. We would add these auto-generated files to .gitignore. Is it possible?

  2. It requires the developer to invoke bazel run //:gazelle_haskell_modules manually after every change to module imports.

    • Is it fast enough to invoke it frequently in a large project (~200 BUILD files, ~700 .hs files)?

    • Running this command manually looks inconvenient, since module imports are modified very often. Developers may forget to run gazelle, which leads to a higher rate of annoying build failures caused by stale configuration. It would be easier to invoke gazelle before every ordinary bazel command, but that is cumbersome: the resulting command bazel run //:gazelle_haskell_modules && bazel build MYPACKAGE is rather long. I'd prefer to make //:gazelle_haskell_modules a dependency of all packages in our project, so that a developer wouldn't have to type it manually. Is it possible? I guess it's not.

  3. We use custom bazel rules and macros, built on top of haskell_library, in order to keep common defaults (Haskell extensions, ghc options) and add auto-generated hlint tests, REPL targets etc. Is it possible to let gazelle_haskell_modules know about these rules, so that it could read and update them? It would be extremely inconvenient to migrate back to plain haskell_library.

Point 3 seems to be an absolute obstacle. The other points are less critical, but they hurt the developer experience badly. This is why I'm thinking about fixing haskell_library: is it possible to fix it in a way similar to haskell_module (using ABI hashes etc.)?

aherrmann commented 1 year ago

Re 1. Gazelle is indeed intended to be used in a way where you check in the generated BUILD files. Gazelle is designed to update the files in place, leave alone the parts that don't concern it, and respect manual edits when indicated, e.g. using # keep comments. The noise should be mostly the same noise you would get if you were to define haskell_module targets manually.
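As a small sketch of the # keep mechanism: Gazelle preserves a list value, an attribute, or a whole rule that carries a # keep comment. The target below reuses the names from the report. Which attributes gazelle_haskell_modules would otherwise rewrite is an assumption here, not something stated in the thread.

```starlark
haskell_library(
    name = "mylib2",
    srcs = ["B.hs"],
    visibility = ["//visibility:public"],
    deps = [
        ":base",
        ":mylib1",  # keep
    ],
)
```

Placing the comment on a single list entry keeps that entry across regenerations while still letting Gazelle manage the rest of the rule.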

Re 2. On automation, there is autogazelle that does something like you describe. In terms of performance, from past experience it completed in about 3-6 seconds on a project with near a thousand modules.

Re 3. Gazelle has a built-in directive called map_kind to support this use case. So, this shouldn't be a problem.
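A minimal sketch of such a directive, written as a comment in a BUILD file; it applies to that package and the packages below it. The wrapper name my_haskell_library and the load path //tools:haskell.bzl are hypothetical stand-ins for the project's own macro and its .bzl file.

```starlark
# Ask Gazelle to treat my_haskell_library (hypothetical macro) as haskell_library,
# and to load it from //tools:haskell.bzl (hypothetical path) when generating rules.
# gazelle:map_kind haskell_library my_haskell_library //tools:haskell.bzl
```

The directive takes three arguments: the kind Gazelle knows about, the kind to emit instead, and the file the replacement kind is loaded from.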