Per-directory target definitions for Go (aka: `1:1:1`)

stuhood commented 2 years ago

Pants has recommended per-directory targets (aka 1:1:1) for a long time, although in the past it had a fairly different motivation: https://v1.pantsbuild.org/build_files.html#target-granularity

The most accurate motivation in our current world of per-file behavior is "keeping metadata close in the filesystem to the things it applies to". This applies universally across languages, and results in the shortest possible definitions for metadata when you do need to apply it, because you can take advantage of the location of the BUILD file to avoid needing to spell out long file paths in BUILD files.

Although go currently does not support:

files/resources,
package-specific parameters to tests (timeouts, etc)
lint or fmt skips
Additional arguments needed to build particular C / header files

... all of these will be package (i.e. directory) specific.

Without 1:1:1, all of this metadata would likely end up as overrides on a go.mod target at the root of your repository. And because of the potential distance to the files that it is being applied to, it's likely to be quite verbose: see examples on #13187. Centralizing the definition of an entire directory tree would also increase the chance of merge conflicts, and make it more difficult to apply ownership/permissions for code review (which is almost always directory/file based).

Additionally, the distance between the metadata definition and the relevant file/directory also has an interesting effect on address/specs syntax. Both the # syntax (which is necessary and a good idea for thirdparty definitions, to signal that they do not live in the filesystem's namespace) and the "file address" syntax (${file}:${relative_target}), have annoying quirks when metadata is not near a file/directory:

For the # syntax, the trouble is the hash sign itself, as well as the issue that if something is declared at the root, it must have a name. This leads to addresses like //:mod#./some/subdir or src/go:mod#./subdir... the former of which is more verbose than necessary, and the latter of which isn't copy-pasteable/grokkable as an actual file/directory. They also don't correspond at all to the Specs syntax that you would use to match them with a glob (although exact copy-pastes should work).
For the file-address syntax (with support for directories: see #13245), metadata living a long distance from the affected path makes the target component of the address longer: so for a go_mod at the root of the repository, an address might look like: some/subdir/:../../mod (i.e. it gets longer the further away from the metadata you are, although absolute target syntax is possible). But when metadata lives in the directory, an address would just be the directory name: src/go/subdir/. And there is good alignment with Specs, since the portion of the spec that someone is likely to type by hand (and/or tab-complete) matches the address.

Finally: per-directory target definitions are also the most likely to be compatible with any sort of "hierarchical overrides" design that we might be able to come up with in the future. For example: if we can come up with a design for merging of unambiguous metadata based on filesystem distance (i.e.: metadata in a parent directory is overridden by metadata in a child directory), then actually having a public go_package target (rather than having it be private and generated by a go_mod target) would be necessary... since declaring a go_mod deeper in the filesystem would be semantically strange (no go.mod file).

In short: while I think that target generators are an incredibly powerful feature which unify generating targets for files within a directory with generating targets for thirdparty definitions, I think that we might have gone overboard to use them to generate targets living in other directories.

Eric-Arellano commented 2 years ago

I'm somewhat coming around to this. Earlier, one of my counter-responses was I Just Don't Think there will be much metadata you have to set in BUILD files. For example, we can infer dependencies on resources via the embed directive!

What started to change my mind (before you even wrote this) was thinking about timeout for tests. We can't infer that, it must be set manually. It's still very little metadata, but the "metadata should live close to where it defines" is persuasive to me and it's a big reason I like BUILD files in the first place.

Otherwise, in a single repo with a go.mod at the root, why even have the BUILD file at all and not set the overrides in pants.toml? That would be roughly the same effect. And I think that's a really bad idea to be setting per-directory metadata in pants.toml, which implies it also is probably not the best idea to be doing it in <root>/BUILD.

--

Btw, if we go with 1:1:1 and stop generating go_first_party_package targets, then https://github.com/pantsbuild/pants/pull/13245 becomes irrelevant. We presumably need to update Terraform to also go back to 1:1:1, so we will have no instances of a target generating directory-based targets. Only generating file-based targets.

--

Another counterpoint that I two weeks ago was going to make until I realized it's not a very solid reason...We are relying on the assumption that first-party targets are created for a couple of things, including quickly finding the owning go.mod target for go_first_party_package. Before, we had to do a directory scan, whereas now we only look at the Address. We also use it to immediately find the import_path, which is the key for dep inference.

What do you think about storing calculated values like that as a required field on the go_first_party_package and having tailor include the field? For now, those fields are import_path, subpath, and the owning go_mod.

We don't have to decide what to do with each of those 3 fields...but what do you think about generally the idea of storing required fields like this on the target and relying on ./pants tailor to set it + trusting the user to not bork it?

stuhood commented 2 years ago

Another counterpoint that I two weeks ago was going to make until I realized it's not a very solid reason...We are relying on the assumption that first-party targets are created for a couple of things, including quickly finding the owning go.mod target for go_first_party_package. Before, we had to do a directory scan, whereas now we only look at the Address. We also use it to immediately find the import_path, which is the key for dep inference.

What do you think about storing calculated values like that as a required field on the go_first_party_package and having tailor include the field? For now, those fields are import_path, subpath, and the owning go_mod.

Is the import_path essentially what we would otherwise get by stripping source_roots? Could it be computed from the directory structure if we defaulted to making go.mod a source root marker file? Or is that only the subpath?

The owning go_mod seems like something that should be done with dependency inference, á la __init__.py file lookups.

Eric-Arellano commented 2 years ago

Is the import_path essentially what we would otherwise get by stripping source_roots?

We need to read the module directive from the owning go.mod and combine it with path relative to the go.mod. So no to source roots.

Or is that only the subpath?

Yes to source roots. But also we can simply compute this if we have the owning go.mod's path.

The owning go_mod seems like something that should be done with dependency inference, á la init.py file lookups.

That's how we used to do it, but target generation allowed us to simplify.

Would we prefer to store the owning go_mod on the target as a required field, or use inference? I think I might prefer the required field because it reduces the risk of manually creating a target without an owning go.mod? The error will be apparent as soon as you write the target definition.

stuhood commented 2 years ago

Would we prefer to store the owning go_mod on the target as a required field, or use inference? I think I might prefer the required field because it reduces the risk of manually creating a target without an owning go.mod? The error will be apparent as soon as you write the target definition.

Maybe, but it seems very easily inferrable. It seems like the rough equivalent in Python land would be generating a target for an __init__.py file and explicitly adding dependencies on them to targets they impacted. It's clear from the directory structure what should happen there, so we infer it instead.

Eric-Arellano commented 2 years ago

It seems like the rough equivalent in Python land would be generating a target for an init.py file and explicitly adding dependencies on them to targets they impacted.

I think the difference is that Python does not require an owning __init__.py, whereas we do require an owning go.mod. Further, it's a Pants requirement that you have a go.mod -- fwict you could do some things like run gofmt without a go.mod if you weren't using Pants.

Not sure if that actually implies anything particular, though.

Eric-Arellano commented 2 years ago

I continue to get more on board with this. While writing the example-golang repository, I'm adding little comments to the BUILD files explaining what each target does. It will be much more natural to explain timeout= directly on a go_first_party_package target than explaining how go_mod generates targets and then how to set overrides.

If we expect most users to not know / not care what a target is, having to explain "target generators" probably will go over peoples's heads. Now, with file-based languages like Python, that's an unfortunate reality because it's simply Too Verbose to have one explicit target per file. But with Go, one target per-directory seems more acceptable.

Any thoughts @tdyas @benjyw @asherf?

tdyas commented 2 years ago

If we go with 1:1:1, let's rename go_first_party_package back to go_package. "first party" is redundant since go_third_party_package is generated and generally invisible to users.

benjyw commented 2 years ago

I think that if we need a BUILD file, it makes sense for it to be next to the code it describes, for the reasons Stu lists.

The future I want to get to is where you don't need a BUILD file at all if it was going to be 100% boilerplate (e.g., just a python_sources(). The "generator high up the tree" mechanism was for me a way to get to that in practice, but if it's not a good fit then we don't have to, and we can hold out for a future mechanism that does away with the necessity of BUILD files.

In other words, I'm fine with all this because it doesn't inhibit the future I hope for.

Eric-Arellano commented 2 years ago

Okay. I'm convinced enough to start making this change in Pants 2.9 next week

I'm not sure I agree with Tom's recommendation to rename go_first_party_package to go_package. Because we expect tailor to write the BUILD files, I don't think the extra 11 chars is that big of a deal. Personally I find the extra clarity worth it. I find go_package and go_third_party_package not as intuitive. Any opinion @benjyw @stuhood?

tdyas commented 2 years ago

I'm not sure I agree with Tom's recommendation to rename go_first_party_package to go_package. Because we expect tailor to write the BUILD files, I don't think the extra 11 chars is that big of a deal. Personally I find the extra clarity worth it. I find go_package and go_third_party_package not as intuitive. Any opinion @benjyw @stuhood?

What is the perceived ambiguity with go_package?

Eric-Arellano commented 2 years ago

It doesn't make clear it's talking about your own first-party code. Package is a generic term in Go that applies to third-party code in the first place.

Is the motivation to rename to go_package boilerplate reduction? Or something else too?

tdyas commented 2 years ago

It doesn't make clear it's talking about your own first-party code. Package is a generic term in Go that applies to third-party code in the first place.

Is the motivation to rename to go_package boilerplate reduction? Or something else too?

The main goal is to not introduce unnecessary description that can confuse users. In my view, Pants seems to have a core UX principle that targets in a BUILD file generally refer to sources in that directory. At the very least, that is what I believe we want to teach users with our UX. Thus, Pants users should understand targets generally are referring to code in the repo (which is by definition first-party code).

We don't seem to find a need to state that fact elsewhere; for example. we don't call python_sources python_first_party_sources, so why start making that distinction with Go? Given go_third_party_package needs to be auto-generated by the go_mod target, users will not be instantiating that target type ever and so naming the first-party target go_first_party_package is unnecessary.

As Pants developers, we have the privileged view to know that go_third_party_package exists as an implementation detail. I submit that users will be unaware of it (or at most tacitly aware of it) since they do not have our view. Thus, it will not be part of their normal experience with Pants and there is no need to imply it exists by naming something "first party."

Looking back, I believe that I originally consented to the go_package -> go_first_party_package name change because the targets would be auto-generated and it would not have to be part of users' day-to-day experience (although I approved the PR without comment). Since we are reintroducing it as an explicit target, I still have these concerns about the "first party" description being unnecessary.

pantsbuild / pants

Per-directory target definitions for Go (aka: `1:1:1`) #13488