Closed stuhood closed 2 years ago
I'm somewhat coming around to this. Earlier, one of my counter-responses was I Just Don't Think there will be much metadata you have to set in BUILD files. For example, we can infer dependencies on resources via the embed directive!
What started to change my mind (before you even wrote this) was thinking about timeout
for tests. We can't infer that, it must be set manually. It's still very little metadata, but the "metadata should live close to where it defines" is persuasive to me and it's a big reason I like BUILD files in the first place.
Otherwise, in a single repo with a go.mod
at the root, why even have the BUILD file at all and not set the overrides in pants.toml
? That would be roughly the same effect. And I think that's a really bad idea to be setting per-directory metadata in pants.toml
, which implies it also is probably not the best idea to be doing it in <root>/BUILD
.
--
Btw, if we go with 1:1:1 and stop generating go_first_party_package
targets, then https://github.com/pantsbuild/pants/pull/13245 becomes irrelevant. We presumably need to update Terraform to also go back to 1:1:1, so we will have no instances of a target generating directory-based targets. Only generating file-based targets.
--
Another counterpoint that I two weeks ago was going to make until I realized it's not a very solid reason...We are relying on the assumption that first-party targets are created for a couple of things, including quickly finding the owning go.mod
target for go_first_party_package
. Before, we had to do a directory scan, whereas now we only look at the Address. We also use it to immediately find the import_path
, which is the key for dep inference.
What do you think about storing calculated values like that as a required field on the go_first_party_package
and having tailor
include the field? For now, those fields are import_path
, subpath
, and the owning go_mod
.
We don't have to decide what to do with each of those 3 fields...but what do you think about generally the idea of storing required fields like this on the target and relying on ./pants tailor
to set it + trusting the user to not bork it?
Another counterpoint that I two weeks ago was going to make until I realized it's not a very solid reason...We are relying on the assumption that first-party targets are created for a couple of things, including quickly finding the owning
go.mod
target forgo_first_party_package
. Before, we had to do a directory scan, whereas now we only look at the Address. We also use it to immediately find theimport_path
, which is the key for dep inference.What do you think about storing calculated values like that as a required field on the
go_first_party_package
and havingtailor
include the field? For now, those fields areimport_path
,subpath
, and the owninggo_mod
.
Is the import_path
essentially what we would otherwise get by stripping source_roots? Could it be computed from the directory structure if we defaulted to making go.mod
a source root marker file? Or is that only the subpath
?
The owning go_mod
seems like something that should be done with dependency inference, á la __init__.py
file lookups.
Is the import_path essentially what we would otherwise get by stripping source_roots?
We need to read the module
directive from the owning go.mod
and combine it with path relative to the go.mod. So no to source roots.
Or is that only the subpath?
Yes to source roots. But also we can simply compute this if we have the owning go.mod's path.
The owning go_mod seems like something that should be done with dependency inference, á la init.py file lookups.
That's how we used to do it, but target generation allowed us to simplify.
Would we prefer to store the owning go_mod on the target as a required field, or use inference? I think I might prefer the required field because it reduces the risk of manually creating a target without an owning go.mod
? The error will be apparent as soon as you write the target definition.
Would we prefer to store the owning go_mod on the target as a required field, or use inference? I think I might prefer the required field because it reduces the risk of manually creating a target without an owning
go.mod
? The error will be apparent as soon as you write the target definition.
Maybe, but it seems very easily inferrable. It seems like the rough equivalent in Python land would be generating a target for an __init__.py
file and explicitly adding dependencies on them to targets they impacted. It's clear from the directory structure what should happen there, so we infer it instead.
It seems like the rough equivalent in Python land would be generating a target for an init.py file and explicitly adding dependencies on them to targets they impacted.
I think the difference is that Python does not require an owning __init__.py
, whereas we do require an owning go.mod
. Further, it's a Pants requirement that you have a go.mod
-- fwict you could do some things like run gofmt
without a go.mod
if you weren't using Pants.
Not sure if that actually implies anything particular, though.
I continue to get more on board with this. While writing the example-golang
repository, I'm adding little comments to the BUILD files explaining what each target does. It will be much more natural to explain timeout=
directly on a go_first_party_package
target than explaining how go_mod
generates targets and then how to set overrides
.
If we expect most users to not know / not care what a target is, having to explain "target generators" probably will go over peoples's heads. Now, with file-based languages like Python, that's an unfortunate reality because it's simply Too Verbose to have one explicit target per file. But with Go, one target per-directory seems more acceptable.
Any thoughts @tdyas @benjyw @asherf?
If we go with 1:1:1, let's rename go_first_party_package
back to go_package
. "first party" is redundant since go_third_party_package
is generated and generally invisible to users.
I think that if we need a BUILD file, it makes sense for it to be next to the code it describes, for the reasons Stu lists.
The future I want to get to is where you don't need a BUILD file at all if it was going to be 100% boilerplate (e.g., just a python_sources()
. The "generator high up the tree" mechanism was for me a way to get to that in practice, but if it's not a good fit then we don't have to, and we can hold out for a future mechanism that does away with the necessity of BUILD files.
In other words, I'm fine with all this because it doesn't inhibit the future I hope for.
Okay. I'm convinced enough to start making this change in Pants 2.9 next week
I'm not sure I agree with Tom's recommendation to rename go_first_party_package to go_package. Because we expect tailor to write the BUILD files, I don't think the extra 11 chars is that big of a deal. Personally I find the extra clarity worth it. I find go_package and go_third_party_package not as intuitive. Any opinion @benjyw @stuhood?
I'm not sure I agree with Tom's recommendation to rename go_first_party_package to go_package. Because we expect tailor to write the BUILD files, I don't think the extra 11 chars is that big of a deal. Personally I find the extra clarity worth it. I find go_package and go_third_party_package not as intuitive. Any opinion @benjyw @stuhood?
What is the perceived ambiguity with go_package
?
It doesn't make clear it's talking about your own first-party code. Package is a generic term in Go that applies to third-party code in the first place.
Is the motivation to rename to go_package boilerplate reduction? Or something else too?
It doesn't make clear it's talking about your own first-party code. Package is a generic term in Go that applies to third-party code in the first place.
Is the motivation to rename to go_package boilerplate reduction? Or something else too?
The main goal is to not introduce unnecessary description that can confuse users. In my view, Pants seems to have a core UX principle that targets in a BUILD file generally refer to sources in that directory. At the very least, that is what I believe we want to teach users with our UX. Thus, Pants users should understand targets generally are referring to code in the repo (which is by definition first-party code).
We don't seem to find a need to state that fact elsewhere; for example. we don't call python_sources
python_first_party_sources
, so why start making that distinction with Go? Given go_third_party_package
needs to be auto-generated by the go_mod
target, users will not be instantiating that target type ever and so naming the first-party target go_first_party_package
is unnecessary.
As Pants developers, we have the privileged view to know that go_third_party_package
exists as an implementation detail. I submit that users will be unaware of it (or at most tacitly aware of it) since they do not have our view. Thus, it will not be part of their normal experience with Pants and there is no need to imply it exists by naming something "first party."
Looking back, I believe that I originally consented to the go_package
-> go_first_party_package
name change because the targets would be auto-generated and it would not have to be part of users' day-to-day experience (although I approved the PR without comment). Since we are reintroducing it as an explicit target, I still have these concerns about the "first party" description being unnecessary.
Pants has recommended per-directory targets (aka
1:1:1
) for a long time, although in the past it had a fairly different motivation: https://v1.pantsbuild.org/build_files.html#target-granularityThe most accurate motivation in our current world of per-file behavior is "keeping metadata close in the filesystem to the things it applies to". This applies universally across languages, and results in the shortest possible definitions for metadata when you do need to apply it, because you can take advantage of the location of the
BUILD
file to avoid needing to spell out long file paths inBUILD
files.Although
go
currently does not support:files
/resources
,lint
orfmt
skips... all of these will be package (i.e. directory) specific.
Without
1:1:1
, all of this metadata would likely end up asoverrides
on ago.mod
target at the root of your repository. And because of the potential distance to the files that it is being applied to, it's likely to be quite verbose: see examples on #13187. Centralizing the definition of an entire directory tree would also increase the chance of merge conflicts, and make it more difficult to apply ownership/permissions for code review (which is almost always directory/file based).Additionally, the distance between the metadata definition and the relevant file/directory also has an interesting effect on address/specs syntax. Both the
#
syntax (which is necessary and a good idea for thirdparty definitions, to signal that they do not live in the filesystem's namespace) and the "file address" syntax (${file}:${relative_target}
), have annoying quirks when metadata is not near a file/directory:#
syntax, the trouble is the hash sign itself, as well as the issue that if something is declared at the root, it must have a name. This leads to addresses like//:mod#./some/subdir
orsrc/go:mod#./subdir
... the former of which is more verbose than necessary, and the latter of which isn't copy-pasteable/grokkable as an actual file/directory. They also don't correspond at all to theSpecs
syntax that you would use to match them with a glob (although exact copy-pastes should work).go_mod
at the root of the repository, an address might look like:some/subdir/:../../mod
(i.e. it gets longer the further away from the metadata you are, although absolute target syntax is possible). But when metadata lives in the directory, an address would just be the directory name:src/go/subdir/
. And there is good alignment withSpecs
, since the portion of the spec that someone is likely to type by hand (and/or tab-complete) matches the address.Finally: per-directory target definitions are also the most likely to be compatible with any sort of "hierarchical overrides" design that we might be able to come up with in the future. For example: if we can come up with a design for merging of unambiguous metadata based on filesystem distance (i.e.: metadata in a parent directory is overridden by metadata in a child directory), then actually having a public
go_package
target (rather than having it be private and generated by ago_mod
target) would be necessary... since declaring ago_mod
deeper in the filesystem would be semantically strange (nogo.mod
file).In short: while I think that target generators are an incredibly powerful feature which unify generating targets for files within a directory with generating targets for thirdparty definitions, I think that we might have gone overboard to use them to generate targets living in other directories.