Closed ghorn closed 2 years ago
@mboes this is the module path question I was asking earlier today
Good question. It appears that there is no consensus among rule authors about how to handle imports. While Python and C++ rules are consistent with each other on this point, the native Java rules are not. Like the native Java rules, rules_haskell makes imports package relative, rather than workspace relative. Whereas we would have followed the C++/Python precedent if it was unique and if technically feasible, it appears that this precedent is not unique (Java and rules_go do it differently). There were possibly technical concerns too, like the deterministic mapping from source file names to module names required in previous versions, as well as the -i
flags, which I believe we no longer need to pass (cc @aherrmann and @judah).
Folder hierarchies are very deep in all Java projects (src/main/java/com/example/...
), and deeper still in Bazel Java projects. This apparently isn't much of a problem. If it bugs you in the Haskell case though, there is a workaround. A little known fact about GHC is that it doesn't care what you name your source files and where you put them. So you need not put module FooCorp.Bar.Baz
in //bar:FooCorp/Bar/Baz.hs
. You could put it in //bar:Baz.hs
by convention and it would work just as well (or indeed //bar:Quux.hs
if you wanted to pull a prank). Of course, the flip side is that sneaky teammates could make file naming even more inconsistent at your company if they really tried to, and GHC wouldn't complain.
We could do a better job documenting this.
Our team at Google has discussed this topic quite a bit recently for our internal rules (an old version of which is open-sourced at google/cabal2bazel) which we're using in a large monorepo with many languages.
We recently adopted the following convention: module names should be the "longest capitalized suffix" of the filename. For example, "my/project/Foo/Bar.hs" would have the module name "Foo.Bar". (More precisely: the longest suffix of path components starting with a capitalized letter.) It works well when you have a cohesive Haskell codebase living in a small number of subdirectories of a larger project ("//my/project" ). It also reflects how open-source Haskell projects are often organized.
This approach gives us back a deterministic mapping from filepaths to module names, which helps simplify some of our internal tooling. We also made our build rules enforce this convention, but it can still be overridden for third-party packages. (It's also not required for "Main" modules.)
Going one step further, we experimented with a more radical idea using the PackageImports extension: to import "my/project/Foo/Bar.hs", write:
import "//my/project" Foo.Bar
Where, as before, the split between the package name prefix and the hierarchical modules comes from the longest-capitalized-suffix rule. (Internally, we've already patched GHC so we can avoid mangling labels in package names and improve error messages. I have it on my radar to make a GHC proposal/patch for that.)
However, we ended up dropping the PackageImports approach for now. It would be a significant churn to our codebase, and our code is concentrated enough that the "longest-suffix" change gets us most of the way there already.
The PackageImports
idea is very cool, although it exposes build system structure to source code. Out of curiosity, how did you get it so that even the slashes of a label name are allowed in the package identifier?
@mboes This is our patch: https://gist.github.com/judah/9f0e657cd9d057fc463e37834c712251
I hacked the parsing logic so that strings beginning with "//"
are parsed like labels, and anything else is treated like a regular Haskell package name.
I'm not sure that patch itself is ready for merging upstream since it changes Cabal (the-library), not just GHC. I needed to change Cabal too because ghc-pkg register
uses it to parse the *.conf
format.
An advantage of not enforcing a particular mapping between paths in the workspace and module names is compatibility with existing Cabal packages. There are a number of packages that follow conflicting conventions or are inconsistent within themselves. In the past, this compatibility was a requirement, when we were using Hazel to build Hackage packages. Now that we have haskell_cabal_library|binary
it is still a useful feature, either to port a Cabal package to Bazel (haskell_library|binary
), or to work around issues with Cabal, see e.g. https://github.com/tweag/rules_haskell/issues/1154.
As @mboes points out GHC itself does not enforce a strict mapping between file paths and module names. I tested the proposed module paths relative to the root and this is already possible. (@ghorn please let us know if you've encountered any issues with this strategy.) I'd be hesitant to enforce additional restrictions. Already in this thread we see multiple different conventions being used. I'm concerned that enforcing a particular convention in rules_haskell will inflict large cost on many of our users and I don't see a strong benefit outweighing the cost.
Can we document this behavior? I’m not sure I get how/why the absolute module names with a build file in the hs file directory works.
A little known fact about GHC is that it doesn't care what you name your source files and where you put them. So you need not put module
FooCorp.Bar.Baz
in//bar:FooCorp/Bar/Baz.hs
. You could put it in//bar:Baz.hs
by convention and it would work just as well (or indeed//bar:Quux.hs
if you wanted to pull a prank).
I always thought GHC enforces the module structure to correspond with the directory structure exactly?
@profpatsch the directory structure is immaterial to GHC.
https://github.com/tweag/rules_haskell/pull/1281 has been merged. This implementation allows module paths to relative to the root or the package or some other prefix, see the PR description for details.
We use C++, Python, and Haskell in our bazel project. C++ and Python imports are relative to the project root (where the WORKSPACE file lives). Haskell module imports are relative to the library package (where the relevant BUILD.bazel file lives) plus a global prefix for the project name. I think that haskell should be consistent with python and C++, or should have a good reason why not and have guidance on best practices for dealing with this inconsistency.
Here's a minimal example to clarify. When I have a project with structure like:
If you use these libraries as dependencies, in Python you would import with the absolute path (optionally prefaced with
my_project
):In C++ you would include with the absolute path:
In Haskell you import relative to the package (where
BUILD.bazel
lives relative to the haskell modules):We then have to add more directories so that modules are namespaced consistently. For example if we want to import the file in the example above as
import My_project.Some_haskell_library.Baz
, then we have to put the file inmy_project/some_haskell_library/My_project/Some_haskell_library/Baz.hs
. In our company we end up having directories and module imports be either being very deep and redundant, or short and inconsistent.I'm looking for guidance and discussion here. I'm not sure what the right solution is. It's probably worth looking to see how other compiled languages with modules handle this problem. Java is a first class language with native rules, it would be worth looking to see how they handle it.
If this should be
WONTFIX
then providing some guidance on best practices would be helpful.