pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.31k stars 636 forks source link

Add a global option to handle having recursive globs in target generators sources field #20450

Open AlexTereshenkov opened 9 months ago

AlexTereshenkov commented 9 months ago

Is your feature request related to a problem? Please describe. Pants supports providing sources either as individual files or as patterns; one can use * for globs over just the current working directory, ** for recursive globs over everything below (at any level the current working directory), with prefix ! for ignores.

The best practice, however, is to follow the 1:1:1 principle: metadata about your code should live near the code itself. For this reason, I think it may be helpful to enforce this via a global option so that users will be able to forbid using recursive globs in sources fields (or elsewhere where the files ownership is declared) similarly to how we handle globs in BUILD files that do not match files on disk (ignore/warn/fail).

Describe the solution you'd like Having this directory with source code, e.g.

$ tree foo             
foo
├── bar
│   ├── __init__.py
│   ├── mod3.py
│   └── mod4.py
├── baz
│   ├── __init__.py
│   ├── mod1.py
│   └── mod2.py
├── BUILD
└── __init__.py

3 directories, 8 files

with

$ cat foo/BUILD 
python_sources(sources=["**"])

running

$ pants list foo::
//src/python/pants/foo/__init__.py:../../../../all-__init__.py-files
//src/python/pants/foo/bar/__init__.py:../../../../../all-__init__.py-files
//src/python/pants/foo/baz/__init__.py:../../../../../all-__init__.py-files
src/python/pants/foo:foo
src/python/pants/foo/BUILD
src/python/pants/foo/__init__.py
src/python/pants/foo/bar/__init__.py:../foo
src/python/pants/foo/bar/mod3.py:../foo
src/python/pants/foo/bar/mod4.py:../foo
src/python/pants/foo/baz/__init__.py:../foo
src/python/pants/foo/baz/mod1.py:../foo
src/python/pants/foo/baz/mod2.py:../foo

yields all the files in the directory tree, recursively. With the global option to control whether using recursive globs is permitted, one should be able to raise an error or warn about it.

Describe alternatives you've considered Currently, users who would like to control whether recursive globs are used would need to use some kind of semantic grep / AST tools to identify recursive globs usage which may be a lot of work.

Additional context I can see a use case where there's a directory with lots of inner directories containing some non-code resources and one wouldn't want to create individual BUILD files inside them, particularly, if the directories are added/removed all the time. See https://github.com/pantsbuild/pants/issues/18217 for illustration. So one may want to allow using a recursive pattern on some targets, but not all of them (e.g. allow on resources, but forbid on python_sources). A global option won't work in this case, but adding a new field to a base target that deals with file ownership may be unjustified?

AlexTereshenkov commented 9 months ago

If we do proceed with a global option, I've done a bit of research, where I think we shall make changes:

Diff ```diff diff --git a/src/python/pants/engine/internals/graph.py b/src/python/pants/engine/internals/graph.py index 82287c1d93..b43afe0611 100644 --- a/src/python/pants/engine/internals/graph.py +++ b/src/python/pants/engine/internals/graph.py @@ -1066,6 +1066,9 @@ def extract_unmatched_build_file_globs( return UnmatchedBuildFileGlobs(global_options.unmatched_build_file_globs) +# TODO: create a rule to extract a global option to warn/fail/ignore when a recursive glob is found + + class AmbiguousCodegenImplementationsException(Exception): """Exception for when there are multiple codegen implementations and it is ambiguous which to use.""" diff --git a/src/python/pants/engine/target.py b/src/python/pants/engine/target.py index c4de1b598f..8c1f479651 100644 --- a/src/python/pants/engine/target.py +++ b/src/python/pants/engine/target.py @@ -2134,6 +2134,7 @@ class SourcesField(AsyncFieldMixin, Field): if not self.globs: return PathGlobs([]) + # TODO: check if self.globs have recursive globs and act accordingly # SingleSourceField has str as default type. default_globs = ( [self.default] if self.default and isinstance(self.default, str) else self.default diff --git a/src/python/pants/option/global_options.py b/src/python/pants/option/global_options.py index dc13ae09e9..c0c9d8e063 100644 --- a/src/python/pants/option/global_options.py +++ b/src/python/pants/option/global_options.py @@ -220,6 +220,8 @@ class UnmatchedCliGlobs(_GlobMatchErrorBehaviorOptionBase): class OwnersNotFoundBehavior(_GlobMatchErrorBehaviorOptionBase): """What to do when a file argument cannot be mapped to an owning target.""" +# TODO: create a class to represent behavior + @enum.unique class RemoteCacheWarningsBehavior(Enum): ```
benjyw commented 9 months ago

Hmmmmm, TBH I think we should be moving away from 1:1:1 recommendations and more towards getting rid of targets altogether as much as possible. The thing that would enable this is composition of target metadata up the filesystem. So there is no longer any concept of a target owning a source file. Instead the source file is the main concept, and zero, one or more targets just optionally add metadata to it (I'm talking about source targets like python_sources, binary targets like pex_binary are a different thing).

In this world you can attach metadata however you want - local globs, higher-level recursive globs, whatever makes sense in your repo, for the specific metadata in question, and 1:1:1 is no longer a useful recommendation. And of course in this world a source file can be referenced by zero source targets if there is no interesting metadata to add to it.

So I'm slightly leery of doing things that move us away from this world.

AlexTereshenkov commented 9 months ago

Oh okay, this is not something I have explored yet. Feel free to close the ticket!

For anyone who is not ready for this world and/or operates in the repo with plenty of targets, the suggestion would be to use some tooling capable of finding recursive globs patterns, either in automated or semi-automated manner.