Open huonw opened 1 year ago
cc @chrisjrn who is already waist deep in shell command stuff.
Oh this is interesting.
If I were designing this, I could imagine being able to hook into existing non-destructive goals (like check
or lint
) where there's no confusion around ordering (since they're side-effect free so can be run concurrently), and potentially having a new goal for running custom destructive/side-effecting goals -- so you could run ./pants experimental-custom-thingy path/to/side_effecting:target
, which would avoid most of the questions that you raise.
I agree that hooking into fmt
etc would be best in the long run, but creating a custom experimental goal would make this achievable very quickly
For the short term, I feel that a good-enough approach to side-effects exists may already via experimental_run_shell_command
, run like ./pants run path/to/side_effecting:target
. For instance, in some experiments, I have a 'fmt'/'fix'-like target that consists of effectively (this is off the top of my head, so may not be 100% accurate, but I think the concept is about right):
# actually do all the interesting work to change files (cacheable, etc)
experimental_shell_command(
name="make_the_changes", outputs=["generated.txt"], command="...", ...
)
# write the side-effect back (i.e. replacement for fmt/fix)
experimental_run_shell_command(
name="write_the_changes", command="cp {chroot}/path/to/generated.txt path/to/whatever.txt", dependencies=[":make_the_changes"], ...
)
# test that things match (i.e. replacement for the lint part of a fmt/fix goal)
experimental_test_shell_command(
name="check_the_changes", command="diff path/to/generated.txt path/to/whatever.txt", dependencies=[":make_the_changes", "./whatever.txt"], ...
)
That said, this isn't perfect, because every run
seems to have happen as its own ./pants
invocation? And there doesn't seem to be a way to orchestrate around this from within pants itself:
local = "tailor update-build-files fmt lint check ::"
for the quick local checks, and it'd be nice to be able to run the side-effecting shell commands as part of this in an integrated wayrun
to it fails, e.g. running ./pants fmt :: run path/to:write_the_changes
seems to be equivalent to ./pants fmt run ::
and that understandably fails (TooManyTargetsException: The `run` goal only works with one valid target, but was given multiple valid targets:
)./pants
Running multiple targets would definitely be an interesting use case, and you're right, we don't have that right now. 🤔
After some discussion, and given we now have test
support through experimental_test_shell_command
, it probably makes sense to support check
with similar behaviour; this would cover all use cases that would be handled by check
or lint
(there's not really a good semantic difference between check
and lint
imho).
Tools with side-effects are still interesting. I'm less happy with the idea of hacking something together with run
now, since we lose sandboxing, and with it the ability to discard or roll-back the results of the goal: this is important since it means you can run fmt
as a dry-run at CI time and report on whether the commit is fmt
-clean entirely from within Pants. That's much harder if you're using run
and not sandboxing things.
I think we'd want to implement mutating behaviour with a different goal, because the semantics are different to fmt
: if we have a target (extremely strawman name) that defines a mutating command
experimental_mutating_command(
name="mutator",
targets=["//src/js/package_to_format:js_sources",],
)
Then the target spec to call mutator
would be ./pants fmt path/to/mutator:mutator
, however, src/js/package_to_format/*.js
would be the files that are formatted. i.e. the goal would mutate targets that are not the ones specified in the target spec. This is significantly different to fmt
's behaviour, and I think that warrants a separate goal with separate documentation.
Finally, having a separate goal would make it possible to string together multiple experimental_mutating_command
targets at the command line (or have them caught by ::
).
Thoughts?
Makes sense for check (and lint)!
I'm not sure I entirely understand the proposal for fmt
. Would it be something like ./pants goal-name --arg-name=path/to/mutator:mutator1 --arg-name=path/to/mutator:mutator1 ::
(placeholder names to avoid getting distracted by bikeshedding 😅 ) to run mutator1
and mutator2
on all targets? If so, that seems pretty reasonable!
Questions (in decreasing order of priority, discussion of the priority in parens):
./pants fmt goal-name --arg-name=path/to/mutator:mutator1 --arg-name=path/to/mutator:mutator2 ::
would be equivalent to ./pants fmt ::
then mutator1
then mutator2
? (Having the ability to do this would alleviate a lot of the downsides of being a separate goal to fmt
, so that we can set up a pants alias for ease of use locally.)targets
might require actual target specifications, which means it is effectively an allowlist? Would there be a way to run on any *.yml
or *.py
file anywhere in the tree? (Without this, I'd be nervous about adopting this feature, as we'd almost certainly forget to keep the allowlist up to date as code is added/moved.)./pants lint ::
currently runs all the linters and formatters (in dry-run mode) in parallel, would the dry run mode of the command from 1 do something similar, or would it be more sequential (e.g. run linters and then these new style)? (Just an optimisation, not critical.)A probably-wild idea (more along the lines of your "in the long run" reference above, rather than a first step) would be getting these into backend_packages
, e.g. something like the following, where pants automatically understands that it needs to resolve the mutator1
"backend" as a target within the repo:
backend_packages = [
"pants.backend.python",
"pants.backend.python.lint.black",
"//path/to/mutator:mutator1", # runs after black, but before isort
"pants.backend.python.lint.isort",
]
That seems like it may require some careful circular dependency resolution: pants needs the backends in order to resolve the targets, and needs to resolve targets in order to load the backends. 😅
(This potentially generalises how pythonpath = ["%(buildroot)s/pants-plugins"]
can be used to load pants-plugins/path/to/plugin
as a path.to.plugin
backend: it can instead be just listed as a //pants-plugins/path/to/plugin:plugin
backend directly.)
Without really re-thinking things, there would be no way to specify which targets the mutator is to act upon, except within the target definition itself.
If you need more info, I'll dump that when I'm back at my desk.
(The crux of it is that ::
would supply literally every file type to the custom mutator. The way to make ::
behave sanely is to make a plugin with a custom source type, at which point it's just as easy to make a fmt
plugin.)
Yeah fmt
/fix
seems a bit more interesting, but I'd really love if we saw the reporting goals (lint/check/etc...) in an experimental target. I have some "lints" I really want to do in-repo but plumbing a full plugin is overkill (e.g. "make sure all test files use the _test.py
suffix, and not test_
). We can then come back and figure out fix/fmt later.
I suspect this feature would take off like hotcakes. In fact ignoring the complexity of ICs and whatnot, imagine flake8
implemented as a pex_binary
and experimental_lint_command
(with a dependency
on the relevant configs).
:exploding_head:
@huonw I have been playing around with possibilities relating to the scoped problem of:
cfn-lint
or mdformat
the idea is there is someone trying to convert to pants and they currently are running, say mdformat
from the command line, and they'd like to replicate that behavior on pants without having to understand enough about pants to adapt https://www.pantsbuild.org/docs/plugins-lint-goal to their needs and create a custom plugin.
The demo repo in https://github.com/gauthamnair/example-pants-byo-tool shows an API to "Bring Your Own Tool". The user makes a tiny plugin with this configuration:
confs = [
ByoLinter(
options_scope='byo_shellcheck',
name="Shellcheck",
help="A shell linter based on your installed shellcheck",
command="shellcheck",
file_extensions=[".sh"],
),
ByoLinter(
options_scope='byo_markdownlint',
name="MarkdownLint",
help="A markdown linter based on your installed markdown lint.",
command="markdownlint",
file_extensions=[".md"],
)
]
And the linters become available:
$ pants lint ::
...
✓ byo_markdownlint succeeded.
✓ byo_shellcheck succeeded.
✓ flake8 succeeded.
The underlying library is dynamically creating the different types and rules required for a proper implementation of LintRequest based on the user's configuration. (the lib is at https://github.com/pantsbuild/pants/pull/19954)
It is a crude demo and much would be needed to be done to button it up, too many things to list here even, but the hypotheses driving the approach is:
markdownlint docs/
. If there is a lot of incidental complexity in activating a new tool for a goal, then we can add more tooling, but keep the responsibility on the pants/plugin/pants.toml side of the fence.
The demo adds the capability as an abbreviated plugin, but it could be configured entirely within pants.toml
by creating some new bootstrap option and threading them to load_backend
during extension loading. I plan to play with that to see what could happen there.
Having it in pants.toml would reduce the barrier to entry to probably around what a new user would find welcoming.
With such a plugin-generation strategy, we should be able to handle fix
and fmt
without much difficulty. We would implement one for some custom tool, look at the part that looks like boilerplate, and abstract it away.
The demo in example-pants-byo-tool is fleshed out now more and includes the following kind of API:
The user defines an in-repo plugin with a register.py containing:
confs = [
ByoTool(
goal="lint",
options_scope='byo_markdownlint',
name="MarkdownLint",
help="A markdown linter based on your installed markdown lint.",
executable=SystemBinaryExecutable("markdownlint", tools=["node"]),
file_glob_include=["**/*.md"],
file_glob_exclude=["README.md"],
),
ByoTool(
goal="lint",
options_scope='byo_flake8',
name="byo_Flake8",
help="Flake8 linter using the flake8 you have specified in a resolve",
executable=PythonToolExecutable(
main=ConsoleScript("flake8"),
requirements=["flake8>=5.0.4,<7"],
resolve="byo_flake8",
),
file_glob_include=["**/*.py"],
file_glob_exclude=["pants-plugins/**"],
),
ByoTool(
goal="fmt",
options_scope='byo_black',
name="byo_Black",
help="Black formatter using the black you have specified in a resolve",
executable=PythonToolExecutable(
main=ConsoleScript("black"),
requirements=[
"black>=22.6.0,<24",
'typing-extensions>=3.10.0.0; python_version < "3.10"',
],
resolve="byo_black",
),
file_glob_include=["**/*.py"],
file_glob_exclude=["pants-plugins/**"],
)
]
And markdownlint, flake8 and black become activated for use. Properties:
PythonTool
uses. But requires the user to make the dependencies available in a resolve.skip
and only
work as usual to select which tools you want to run. Help appears.It will be straightforward to add the following features as well:
ConfigFilesRequest
we usually use.It is aspirational to be able to define this configuration within pants.toml
or some other config file so the user does not even have to know about the involved python types.
Under the hood we take the user-provided configuration and generate all the types and rules needed to wire up a plugin. It uses the regular mechanisms to do so, and involves no change to the core. It is basically code-generation.
The implementation in https://github.com/pantsbuild/pants/pull/19954 is a proof-of-concept.
For reference, there is a slack thread in which we've been discussing possibilities: https://pantsbuild.slack.com/archives/C0D7TNJHL/p1696006330169339
There is also a google slides doc (should be open for read to anyone with the link) incorporating the discussion in https://github.com/pantsbuild/pants/pull/16412 which tackles the same issue of making linters easier to express.
As a small but concrete example, we have several pre-commit
hooks: https://pypi.org/project/pre-commit-hooks/ Maybe of them are small single file things like check-shebang-scripts-are-executable
or trailing-whitespace
. We also have some similarly sized custom hooks. I'd like to run them through pants:
Pants can deal with getting the python executable to run them.
But since they are all so small it would be absurd to make a plugin for each one.
We could define runnable targets as usual in BUILD files:
# build-support/BUILD
python_requirement(name="black_req", requirements=["black==23.1.0"])
# a runnable
pex_binary(
name="black_bin",
script="black",
dependencies=[":black_req"],
)
These are runnables in the same sense as those compatible with adhoc_tool.
Then we add configuration to pants.toml
that refers back to this runnable and instructs how to use it for lint/fmt/fix.
# pants.toml
[GLOBAL]
backend_packages.add = ["...", "pants.backends.byotool"]
[byotool]
conf = """{
'byo_black': {
'goal': "fmt",
'runnable': 'build-support:black_bin' # any runnable supported by adhoc_tool
'name': "Black",
'help': "Black formatter …",
'file_glob_include': ["**/*.py"],
'file_glob_exclude': ["pants-plugins/**"],
'config_files': {
'discover': {
'globs': "**/pyproject.toml"
}
}
}
}"""
The user would then use byo_black
in the same way as they currently use the black
backend linter.
pants.toml
is still responsible for defining what lint
means, repo-wide, and for driving the options
system, but at run-time it looks up a target to build the relevant executable.
adhoc_tool
established a means to generically run "runnable" targets in a sandbox. Those definitions and the mechanism should be reusable here.
There is precedent for referring to targets from pants.toml. For example to specify in-repo first-party plugins for flake8.
One big problem with the ByoTool proposal is that it introduces yet another way to express a runnable in pants. Currently we have two methods:
In plugin code by subclassing PythonToolBase
, JvmToolBase
, SystemBinary
, TemplatedExternalTool
etc... These data structures are quite complex and involve concerns like the options system. They are meant to be used by pants contributors and power users
In target definitions. These are things like pex_binary
, python_source
, system_binary
, jvm_artifact
. They are intended for use by the developers whose code lives in a pants-enabled repo.
Many of the above are logically similar. For example the pex_binary
target for a third party lib can accept much of the same type of data as a PythonToolBase
subsystem.
The original ByoTool proposal introduces a 3rd way: a simplified data-only (no options) specification of executables for developers to use but on the plugin side. It would be much better to lean on the existing ability to specify runnables in targets.
In the proposal, is there a reason the conf = """{
is a string as opposed to the config expanding out as "regular toml"?
I think my ideal world these definitions would be python-y things like BUILD files, but I'm not sure if there is prior art for things other than targets being in BUILD-like files and I share the hesitation to rope this -- hot cakes! -- proposal to nebulously larger changes.
Here is a working prototype that should be good for testing usability:
in the BUILD files define runnable targets and then define a new code_quality_tool
target that references it and adds other things needed to use it.
file(name="flake8_conf", source=".flake8")
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"],
resolve="byo_flake8"
)
code_quality_tool(
name="flake8_linter",
runnable=":flake8", # a runnable in the sense of adhoc_tool
runnable_dependencies=[],
execution_dependencies=[":flake8_conf"],
args=["--max-line-length", "100"],
file_glob_include=["**/*.py"],
file_glob_exclude=["pants-plugins/**"],
)
We refer to config files with regular targets in the same style as adhoc_tool
Here's an example of using a system binary that itself depends on node:
system_binary(
name="node",
binary_name="node",
fingerprint=r'.*',
fingerprint_args=["--version"],
)
system_binary(
name="markdownlint",
binary_name="markdownlint",
fingerprint=r'.*',
fingerprint_args=['--version'],
fingerprint_dependencies=[':node']
)
code_quality_tool(
name="markdownlint_linter",
runnable=':markdownlint',
runnable_dependencies=[":node"],
file_glob_include=["**/*.md"],
file_glob_exclude=["README.md"],
)
The code quality tools have to be registered with a goal, and have their options scopes named. For fmt/fix, they also need to be inserted in the correct order.
[GLOBAL]
pants_version = "2.17.0"
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')",
"pants.backend.experimental.code_quality_tool(goal='lint', target='//:markdownlint_linter', scope='markdownlint', name='Markdown Lint')",
"pants.backend.experimental.code_quality_tool(goal='fmt', target='//:black_formatter', scope='black_tool', name='Black')",
]
The proposal extends the backend loading mechanism to be able to use the same module to define several sets of rules based on inputs.
No further configuration is required.
Outwardly the installed code quality tools behave very similarly to regular ones. See https://github.com/gauthamnair/example-pants-byo-tool/tree/main#readme for what the command line input/output look like.
I'm absolutely loving it so far. This is honestly better than I'd imagined.
Only nit I have:
The proposal extends the backend loading mechanism to be able to use the same module to define several sets of rules based on inputs.
Changing backend_packages.add
certainly restarts the daemon. So we likely would want to put this in an option that wouldn't restart the daemon. If ordering is important (it really only is for fix/fmt, in which case ideally ordering shouldnt matter because that's a red flag) we could probably figure out a different strawman.
So I think it's best to just assume ordering doesn't matter between backend_packages
and code_quality_tool
s.
I'll also echo that this is exciting!
file_glob_include=["**/*.py"],
What's the intuition I should have for how this interacts with caching? If one .py
file changes, is the tool run over just that file or the whole globs?
What's the intuition I should have for how this interacts with caching? If one .py file changes, is the tool run over just that file or the whole globs?
The correct thing (and I suspect the behavior as implemented in the branch) is the same as if you edited a file with any of the existing plugins: it invalidates the batch that contains the file. So you re-run the batch.
@cburroughs The file_glob_include
only describes what files are eligible to be the subject of the code quality tool. The actual runs are done by the usual batching mechanism. You can also subselect what you run on in the same way as with existing linters and formatters. Caching should work on the batches.
@thejcannon I am a bit puzzled about the daemon-restarting bit. Normally, if we add a new linter or formatter we have to restart the daemon because it introduces all kinds of new rules, etc. We add to pants.toml and that restarts the daemon.
In the proposal above, I think we are in no worse of a situation. Adding a new linter is a one-time change to the pants.toml and will result in one restart of the daemon, which seems necessary to install the new underlying types and rules.
Changing things about the configuration of the code quality tool, like which config files it depends on, or which args should always be passed, does not require restarting the daemon since that is all in the target.
@cburroughs about your earlier comment, it did turn out to be possible to move more stuff into the BUILD
file, leaving the bare minimum in pants.toml
to define the options subsystem and rule-graph rules for each tool. These things have to be done during bootstrapping and to my understanding cannot involve querying for targets and their contents.
Fortunately there was a way to dodge the ugly conf = """{
that you were wary about.
@thejcannon I am a bit puzzled about the daemon-restarting bit. Normally, if we add a new linter or formatter we have to restart the daemon because it introduces all kinds of new rules, etc. We add to pants.toml and that restarts the daemon.
Fair enough.
So, the other awkwardness with this approach is that it would be essentially impossible to have some "higher" config remove this backend (e.g. backend_packages.remove =
) because their string is... very involved. So another reason we might want to shuffle those values into its own (like Dict) option.
+1 definitely interested in other ways to do it. There were some mechanical conveniences in doing it in backend_packages
only, but its when it also seemed conceptually sound that it became attractive to introduce the unusual syntax:
pants.backend.experimental.code_quality_tool
is a kind of backend code-generator. It stamps out what one normally uses as backends as long as it is given some templated arguments. Taking the analogy for what we would do in regular programming, we could name these in a preliminary step, and then load those as backend packages.
Something like
[GLOBAL]
generated_backend_packages = {
flake8_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')"
markdown_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:markdownlint_linter', scope='markdownlint', name='Markdown Lint')"
black_formatter = "pants.backend.experimental.code_quality_tool(goal='fmt', target='//:black_formatter', scope='black_tool', name='Black')"
}
backend_packages.add = [
"pants.backend.python",
"flake8_linter",
"markdown_linter",
"black_formatter",
]
Thank you for all the experimentation! So good!
Just brainstorming: what if the BUILD
target incorporated things like which goal and the name, and the pants.toml
line was just the address path/to:black
? With pants understanding "if it has a colon don't try to import it as Python syntax", and instead does the magic to read it from a BUILD file and deduce the goal etc from that. This would simplify the text in backend packages to allow easier addition/removal.
@huonw I am worried about
magic to read it from a BUILD file
From what I can tell, there is no precedent for reading into BUILD files, interpreting targets, etc. during options bootstrapping. For example, one usually has to load all backends to discover all target_types and build aliases before our usual method to parse BUILD files can work. Similarly, is there precedent for the rule graph's structure to depend on particular target definitions in BUILD files?
There is precedent for subsystems having references to targets that will be resolved at runtime.
Maybe another angle on this matter:
LintResult
or a FmtResult
? and what is the return type?). yea, as @gauthamnair says, there's no interaction with BUILD files during options bootstrapping and loading backends. I think we'd need to simplify the bootstrapping process before considering going into a two-phase bootstrapping scenario where we can consume BUILD file targets for bootstrapping purposes. Environments does this already (the two-phase thing) but not for bootstrapping options, but, environments :)
Makes sense, thanks!
Friends, the positive reception is suggesting we keep pushing. I am going to split the wip PR into two different PRs:
Introduce library-level functionality for CodeQualityTool. After this PR, it will be possible for a user to define code_quality_tool
targets and write a very thin in-repo plugin to activate them via the lib. This is the core functionality of the proposal.
A later PR will be about how one could use such functionality without having to write an in-repo plugin but through pants.toml
alone. This aspect is where most of the discussion in the last few messages has been because it will involve either new pants.toml
syntax or possibly some change in the bootstrapping mechanism.
@gauthamnair any reason this is "scoped" to code quality, rather than just a generic any kind of tool?
@kaos, code quality was some generic term that could encompass lint, fix, and fmt. I could not think of a better one (actually it was chat GPT's idea, no kidding). The idea is there is a target that defines a tool that can be used for one of these purposes. The tool cannot 'deploy' or 'publish' or anything like that.
Thanks for digging in on this @gauthamnair . Having higher level "porcelain" like this to simplify common plugin tasks is a great idea. Starting with fix/fmt/lint makes sense to me for scope. We can later add more porcelain, such as for code generation, or deploy.
https://github.com/pantsbuild/pants/pull/20135 has a CI-passing version of the library part of this functionality. Please take a look.
https://github.com/pantsbuild/pants/pull/20135 Added the core machinery but only exposes it to users by writing a small in-repo plugin. Code quality tool is not yet announced and we are pondering if we want an "easier" way before announcing it.
Here are several proposals summarized from the rest of this thread. As an example we'll add flake8 ourselves rather than via the pants plugin.
This is already supported and a reference for all other proposals
BUILD:
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"]
)
code_quality_tool(
name="flake8_tool",
runnable=":flake8",
execution_dependencies=[":flake8_conf"],
file_glob_include=["**/*.py"],
file_glob_exclude=["dont_lint_me/**"],
args=["--indent-size=2"],
)
pants.toml
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
"pants.backend.python",
# code_quality_tool's target and rules are defined in the adhoc backend
"pants.backend.experimental.adhoc",
# thin in-repo plugin activates a particular target as a linter
"flake8_linter_plugin",
]
and an in-repo plugin pants-plugins/flake8_linter_plugin/register.py
from pants.backend.adhoc.code_quality_tool import CodeQualityToolRuleBuilder
def rules():
cfg = CodeQualityToolRuleBuilder(
goal="lint", target="//:flake8_tool", name="Flake8", scope="flake8_tool"
)
return cfg.rules()
This is already supported and does not require introducing any new concepts, bits of syntax or overloading of syntax
https://github.com/pantsbuild/pants/issues/17729#issuecomment-1783449267
This is the first of several proposals that involve introducing new semantics to backend_packages
. If the target had a bit more info, it seems we could refer to the target directly and the subsystem loader could be adjusted to activate it.
BUILD:
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"]
)
code_quality_tool(
name="flake8_tool",
runnable=":flake8",
execution_dependencies=[":flake8_conf"],
file_glob_include=["**/*.py"],
file_glob_exclude=["dont_lint_me/**"],
args=["--indent-size=2"],
# These next two fields used to be in the in-repo plugin
scope="flake8_tool",
subsystem_name="Flake8",
goal="lint",
)
pants.toml
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
"pants.backend.python",
# needed to activate code_quality_tool
"pants.backend.experimental.adhoc",
# reference to the code_quality_tool
"//:flake8_tool",
]
The ergonomics or ease seems very clean on this. The complication is reshaping the boundary between the role of targets and the role of subsystems and bootstrapping in a pants project both in concepts and in implementation. As Andreas points out this was done in a way for environments, but loading targets during bootstrap has gotchas
https://github.com/pantsbuild/pants/issues/17729#issuecomment-1781916499
Proposes that a backend can be templated and stamped out by supplying the template arguments all within the pants.toml
. A templating backend is like a regular backend except it takes keyword arguments in the entrypoints in register.py
.
BUILD (same as base case):
python_requirement(
name="flake8",
requirements=["flake8==5.0.4"]
)
code_quality_tool(
name="flake8_tool",
runnable=":flake8",
execution_dependencies=[":flake8_conf"],
file_glob_include=["**/*.py"],
file_glob_exclude=["dont_lint_me/**"],
args=["--indent-size=2"],
)
pants.toml:
[GLOBAL]
pythonpath = ["%(buildroot)s/pants-plugins"]
backend_packages.add = [
"pants.backend.python",
# code_quality_tool's target and rules are defined in the adhoc backend
"pants.backend.experimental.adhoc",
# stamps out a backend that uses the particular target as a linter
"pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_tool', scope='flake8', name='Flake8')",
]
There could be other forms of syntax. The particular variant above has a POC implementation here . A pro of this approach is it maintains the separation of concerns between options/bootstrapping and target-based configuration, and as such has a straightforward implementation.
A con is the introduction of templating as a concept, and the oddness of introducing something like a function call all within ""
in what is otherwise a list of packages. Josh has also raised the concern that these complicated strings may be difficult to remove when a backend_packages.remove
is used.
A variant of the proposal is to have separate section where stamping out of such templated backends gets done and then refer to those in backend_packages:
[GLOBAL]
generated_backend_packages = {
flake8_linter = "pants.backend.experimental.code_quality_tool(goal='lint', target='//:flake8_linter', scope='flake8', name='Flake8')"
}
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
"flake8_linter",
]
Of these I like the templated alternative the most, with a slight adjustment.
Instead of using a single string value for the backend and it's arguments, I'd consider exploading it onto separate option values:
[GLOBAL]
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
# using a static prefix to avoid conflict with any other plugin names now and into the future.
"generated:flake8_linter",
]
[GLOBAL.generated_backend_packages.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"
Although the current options system don't have good support for this deeply nested option structures, it does allow arbitrary values, so the goal
, target
, scope
etc are "free form" and backend specific so no validation done on them by pants itself. They are merely passed on as kwargs to the specific backend. Only the backend_package
is required and checked by pants and excluded from the kwargs used for the backend init call.
Looking at this, trying to put on the user goggles the generated_backend_packages
seems a bit awkward and "implementation detail leak-y".. how about just calling it "backend"?:
[GLOBAL]
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
# using a static prefix to avoid conflict with any other plugin names now and into the future.
"backend:flake8_linter",
]
[GLOBAL.backend.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"
This is a nice idea @kaos , riffing on it further. There is a backend_package
and there are also backends
. Previously they have been 1-1. Every backend_package
was a python package satisfying the register.py
with no-arg entrypoints protocol.
Now we still have python packages with relevant code but some packages do not automatically generate a backend because they need further parameters. They can also generate several backends.
So we could have something like this:
[GLOBAL]
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
# maybe this line below is not needed
"pants.backend.experimental.code_quality_tool_template",
]
[GLOBAL.backends.flake8_linter]
backend_package = "pants.backend.experimental.code_quality_tool_template"
goal = "lint"
target = "//:flake8_linter"
scope = "flake8"
name = "Flake8"
New linters would not add new lines to backend_packages
, and backend_packages
would still be packages. There are several options for how the system would know code_quality_tool_template
is a templating backend package rather than the usual kind. We could either add new methods to the register.py
protocol or we could add a new entry point like register_templated.py
.
Ah, yes. I think we could leave templated backends out of backend_packages
altogether, and simply have a enabled = <bool>
option instead, in the GLOBAL.backends.<name>
section.
The separate [GLOBAL.backend]
sections seems very nice, nice idea @kaos!
I believe order of backends can matter (e.g. if a formatters/fixers need to be invoked in a particular order), so I wonder if omitting them completely from the backend_packages
list isn't going to work?
Huon, you are right. order matters, it has to be referenced. I'll take a crack at seeing what the implementation could look like.
I have to confess that I'm starting to hear the voice saying "When in doubt, leave it out" about trying to do something more ergonomic than the thin in-repo plugin. Questioning whether it is really a blocker before go-to-market for code_quality_tool
.
Hope to report back with some kind of POC impl of the [GLOBAL.backend]
based approach soon.
@kaos @huonw Here is a POC impl. https://github.com/pantsbuild/pants/pull/20270 . I am really liking it actually. I thought I would be more afraid of it when it is alive, but turns out it is comforting instead. Here is the toml from the example repo that shows it in action:
[GLOBAL]
pants_version = "2.17.0"
backend_packages.add = [
"pants.backend.python",
"pants.backend.experimental.adhoc",
"flake8",
"markdownlint",
"blackfmt",
]
[GLOBAL.templated_backends.flake8]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:flake8_linter"
name = "Flake8"
[GLOBAL.templated_backends.markdownlint]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "lint"
target = "//:markdownlint_linter"
name = "Markdown Lint"
[GLOBAL.templated_backends.blackfmt]
template = "pants.backend.experimental.adhoc.code_quality_tool_backend_template"
goal = "fmt"
target = "//:black_formatter"
name = "Black"
I believe order of backends can matter (e.g. if a formatters/fixers need to be invoked in a particular order), so I wonder if omitting them completely from the
backend_packages
list isn't going to work?
I wish us to rid this constraint. That is, aim for having the backends provide the needed information to resolve order instead of relying on the user provided configuration is in proper order. This is a sneaky gotcha.
Isn't most solved by simply applying fixers first then formatters and lastly linters? If order of fixers/formatters matter that would be awkward but hopefully we could find a solution to that as well if needed.
@gauthamnair really nice!
For me, I'd prefer not to surface the "templating" part too much, keeping the UX for what I feel would be cleaner, I'll leave some notes for that on your draft PR.
Also, I like the way the POC interacts with backend_packages
so the extra sections are merely a way to provide kwargs
dynamically to any backend.
Is your feature request related to a problem? Please describe.
Shell is a universal glue, that allows augmenting pants adhoc with a lower barrier to entry (and, for simple tasks, lower maintenance burden) than writing a custom plugin. It'd be nifty to generalise the current codegen
experimental_shell_command
, runexperimental_run_shell_command
and recently testexperimental_test_shell_command
(https://github.com/pantsbuild/pants/pull/17640) to allow using a shell script for 'anything'.For example, in a polyglot repo, pants may support some of the languages/tools natively, but not others, but it'd still be nice to have basic commands like
./pants lint ::
and./pants fmt ::
work across the whole system. Additionally, it would allow faster experiments and migration of an existing repo into pants, if glue scripts (and/or makefiles, or whatever) can be reused, and switch to having./pants
to more.(We 'tripped' over both of these in our migration into pants, but I personally have liked pants enough to suffer through split tooling, and convince the rest of the team too, as well 😅 )
Describe the solution you'd like
Something way to have a shell command hook into other goals. For instance:
dependencies
would be the input files (for file-based commands).outputs
field could be reused forfmt
andfix
to write back to the codebase.Some questions that seem relevant (although some can probably be deferred/considered potential future enhancements?):
fmt
/fix
subsystems (e.g. ifmy-formatter
touches python files, does it run before or after black)?fmt
andfix
? How does that work?a.txt
and another outputsb.txt
, the final output has both)? What happens if two invocations generate the same path (with different or identical content)?**/*.py
(to be able to run a custom linter on all Python files in descendent directories, without writing out the path to their individual targets)?Describe alternatives you've considered
We are:
Additional context N/A