Closed: Eric-Arellano closed this 2 years ago.
(I'm not sure what the benefit of doing this would be?)
The benefit would be that partitioning logic works without having to know about all the axes it has to partition on.
Alternatively, you'd have to partition according to the cross-product of all parameters, which can have a cardinality much greater than the actual set of used configurations. Especially since in many cases there may be correlations between the parameters (e.g., the interpreter constraint for Python2 is strongly correlated with the resolve for Python2).
In the engine experiment, @jsirois had syntax for selecting configurations from dependees: address@key1=value1,key2=value2. That could also be used here to uniquely identify one configuration of a target for the purposes of running multiple configurations in parallel, i.e.:
✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py2 succeeded.
✓ src/python/pants/backend/scala/goals/tailor_test.py:tests@interpreter=py3 succeeded.
Where the various configurations are actually defined is an open question for me though, and is very related to #13767.
Just throwing this out there....
Thinking about what it could look like, taking inspiration from what pytest.mark.parametrize does for test functions.
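For reference, this is roughly what pytest.mark.parametrize looks like for a test function (a minimal sketch, assuming pytest is installed; the test name and shell values are illustrative):

```python
import pytest

# One test function, expanded by pytest into one test per parameter value,
# reported as test_script_runs[bash] and test_script_runs[zsh].
@pytest.mark.parametrize("shell", ["bash", "zsh"])
def test_script_runs(shell):
    assert shell in ("bash", "zsh")
```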
shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))
# above generates the equiv of =>
shunit2_test(name="tests#shell-1", source="tests.sh", shell="bash")
shunit2_test(name="tests#shell-2", source="tests.sh", shell="zsh")
That is, a new parametrize construct that turns any target into a target generator for the same, but with a certain field replaced for each instance. And for the python_test case, it could look like:
python_test(name="tests", source="tests.py", interpreter_constraints=parametrize(["==2.7.*"], ["==3.6.*"]))
# ==>
python_test(name="tests#interpreter_constraints-1", source="tests.py", interpreter_constraints=["==2.7.*"])
python_test(name="tests#interpreter_constraints-2", source="tests.py", interpreter_constraints=["==3.6.*"])
Not sure how to support parametrizing on multiple fields, or what happens if the target itself is already a target generator, etc.
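To make the multi-field question concrete, here is a toy sketch (parametrize here is a stand-in, not an actual Pants API; the "#field-N" naming follows the examples above) that expands the cross-product of parametrized fields:

```python
from itertools import product

class parametrize(tuple):
    """Marker wrapping the values a field should expand over (illustrative only)."""
    def __new__(cls, *values):
        return super().__new__(cls, values)

def expand(name, **fields):
    """Expand parametrize()-marked fields into one concrete target per
    combination, taking the cross-product across all parametrized fields."""
    fixed = {k: v for k, v in fields.items() if not isinstance(v, parametrize)}
    varying = {k: v for k, v in fields.items() if isinstance(v, parametrize)}
    if not varying:
        return {name: fields}
    keys = list(varying)
    generated = {}
    for combo in product(*(varying[k] for k in keys)):
        chosen = dict(zip(keys, combo))
        # Name each instance with the "#field-N" scheme from the examples above.
        suffix = ",".join(f"{k}-{varying[k].index(v) + 1}" for k, v in chosen.items())
        generated[f"{name}#{suffix}"] = {**fixed, **chosen}
    return generated

# shell=parametrize("bash", "zsh") generates tests#shell-1 and tests#shell-2.
targets = expand("tests", source="tests.sh", shell=parametrize("bash", "zsh"))
```

With two parametrized fields, this yields one generated target per combination, which is exactly where a cross-product can outgrow the set of configurations actually used.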
This matters less for monorepo, internal, company code which usually isn't published. But most published libraries are tested over multiple versions of Python (including multiple versions of Python 3). At my last company we wanted to test across all major supported py3 versions. So this would help simplify that.
However, v2 macros do solve the duplication issue, and although macros are the Pants stepchild, this raises no further issues other than "yuck, macros" :wink: (and all that entails). Users get to control the name, which is nice.
I like @stuhood's proposal of a different naming scheme for this, as well as @kaos's for parameterization. Although just to throw out this wild syntax:
@parameterize("shell", ["bash", "zsh"])
shunit2_test(name="tests", source="tests.sh")
Additionally, throwing out there that the naming scheme could mirror pytest's (using square brackets): rc/python/pants/backend/scala/goals/tailor_test.py:tests[interpreter=py2]
Would these parameters include implicit parameters like the execution and target platforms?
I've been playing around in my mind with the concept of a "build context" to make the platform explicit in build rules. An initial use case would be cross-compilation of Linux Go binaries from macOS for use in Docker images (which I saw on Slack a user had tried to do).
I'd love your feedback on how much a "build context" concept overlaps with the parameters idea in this issue.
A few thoughts:
1. Should there be a BuildContext type in the standard rules, which is then a field in every rule type that does builds? BuildContext at the very least would have execution_platform and target_platform fields.
2. Could the engine support scoped overrides of singletons so that a rule could just call for the current BuildContext as a parameter if it differed from some outer BuildContext? An example use case is a go_binary that has an explicit (yet to be created) target_platform field to set an explicit target platform to build for. (This could actually be a form of multi-type dispatch for Get selectors. E.g., await Get(BuiltPackage, PackageFieldSet, some_field_set, build_context=BuildContext(...)).)
3. Is there a design doc for this issue?
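The BuildContext in the first point could be sketched as follows (a hypothetical type following the description above, not an existing Pants API; the platform strings are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BuildContext:
    """Hypothetical context object carrying the platforms for a build; frozen
    so it could serve as an engine parameter / cache-key component."""
    execution_platform: str  # where the build tools run, e.g. "macos_x86_64"
    target_platform: str     # what the output must run on, e.g. "linux_x86_64"

    @property
    def is_cross_compile(self) -> bool:
        return self.execution_platform != self.target_platform

# The cross-compilation case mentioned above: a Linux Go binary built on macOS.
ctx = BuildContext(execution_platform="macos_x86_64", target_platform="linux_x86_64")
```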
I love @tdyas's idea of a BuildContext to gather together relevant information about what to build for.
Currently there are disconnected fields and options hinting at what is required in terms of interpreter constraints (for Python) and platforms.
I think it could make sense to be able to provide these details as a form of "build input", where you can say: I want to build for "IC pypy>=3.8 and platform x" and have that propagate to all considered targets, where applicable.

Akin to named resolves, this could be done with named build contexts, so you can set up ahead of time the set of combinations you want to support, then select one or more of them to build for. Also, a target could specify compatible build contexts, so it doesn't build/use something that is known not to work.
Say, a docker_image could perhaps provide something like compatible_build_platform="linux-x86-64-cp-38-cp38" or compatible_named_build_context="build-context-3.8". Then, when building the image, if no build context is explicitly selected, it could select one that is compatible with what is going to be required for the upstream targets it depends on.
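That selection could be sketched roughly like this (the registry, names, and context values are hypothetical, not a real Pants API):

```python
# Hypothetical registry of named build contexts, defined ahead of time.
# The "build-context-3.8" name follows the comment above.
NAMED_BUILD_CONTEXTS = {
    "build-context-3.8": {"interpreter": "CPython==3.8.*",
                          "platform": "linux-x86-64-cp-38-cp38"},
    "build-context-3.9": {"interpreter": "CPython==3.9.*",
                          "platform": "linux-x86-64-cp-39-cp39"},
}

def select_build_context(compatible_names):
    """Pick the first named context a target declares itself compatible with."""
    for name in compatible_names:
        if name in NAMED_BUILD_CONTEXTS:
            return name
    raise ValueError(f"No compatible build context among {compatible_names!r}")
```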
2. Could the engine support scoped overrides of singletons so that a rule could just call for the current BuildContext as a parameter if it differed from some outer BuildContext?
If it will be overridden at some points in the graph, then it shouldn't be a singleton: rather, a Param specified at the root of the graph and overridden in subgraphs. But yea: it's effectively equivalent, and that is what has been proposed for the Platform: see the Multi-Platform Speculation design doc.
But to your first question: "no": I don't think that you need a BuildContext "$deity object" that holds all of the context fields. The engine allows for injection of as many or as few parameters as are actually needed in some scope (by design, precisely for this case), and so if you use one less param than some other subgraph (say: you don't consume the Platform), then you can have a smaller cache key and be platform independent, even though your callers weren't.
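A toy model of that cache-key point (not the actual engine implementation; the rule and param names are made up):

```python
def cache_key(rule_name, available_params, consumed):
    """Toy model: only the params a rule actually consumes enter its cache key,
    so a rule that doesn't consume Platform stays platform-independent even
    when its callers aren't."""
    used = {name: available_params[name] for name in consumed}
    return (rule_name, tuple(sorted(used.items())))

# Two invocations that differ only in a param this rule never consumes:
linux = cache_key("compile", {"Platform": "linux_x86_64", "Resolve": "jvm-default"}, ["Resolve"])
macos = cache_key("compile", {"Platform": "macos_x86_64", "Resolve": "jvm-default"}, ["Resolve"])
# Same key, despite the differing Platform param in scope.
```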
I think that the implementation sketch on Multi-Platform Speculation is still up to date, and I put a fair amount of thought into it at the time. Having multiple Params that make up your "build context" in your subgraph requires one (hopefully small) engine feature, which would reduce/remove the need for multi-field Request objects (or something like a BuildContext): #7490.
So two things to consider here: contexts would be set locally, as Params which are then used or not used by callees. Global/named contexts probably won't make sense here.

Another use for parameters: specifying which codegen backend to use: see (4) at https://github.com/pantsbuild/pants/issues/14041
Prior art which is similar to any sort of local parameterized decorator or syntax: Bazel's select macro.
But very relevant is that parameterization or select are local-only solutions (see "You can improve on this solution..." here, which motivates Bazel's toolchains): I'm not sure how they could easily be applied via #13767.
"Toolchains" to define the bulk of the arguments to a target seem closer to what we want, possibly with a local-only "multiplex across all of these toolchains" syntax (à la parameterization) exclusive to the named/addressed toolchain. But toolchains are quite a bit like a "$deity object" (BuildContext, as @tdyas described it)... exploding the toolchain out into a bunch of component Fields while constructing the multiplexed targets could solve that...?
Here is a draft design for this feature: https://docs.google.com/document/d/1mzZWnXiE6OkgMH_Dm7WeA3ck9veusQmr_c6GWPb_tsQ/edit ... use a pants-devel@ member email for comment access.
Feedback welcome!
Some preliminary docs were added here: https://www.pantsbuild.org/v2.11/docs/targets#parametrizing-targets ... edits very welcome!
I'm out for a few days, but when I get back I'll add a list of open items here from #14408 that block closing this ticket.
Can't help but notice that my original idea would actually be valid now (cool 😁 )
shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))
A few issues encountered in https://github.com/pantsbuild/example-jvm/pull/12, which probably block resolving this:
- A dependency between a deploy_jar and a parametrized scala_sources target results in dependencies on all parametrized instances.
- Using parametrize with a deploy_jar doesn't work, since the dependencies field is explicit and needs to be different per parametrization.

Followup work would be:
- jvm_artifact targets for each resolve need to be parametrized. "Configuration" would resolve this, but #12969 would also remove the need by templating the version into the artifact.

Have updated the blockers in the above comment. We'll want to close this issue for 2.11.x, since it's important for multiple-resolves.
2.11.x: should be able to start it tomorrow. Let's switch to tracking that one.

Thanks to everyone for the feedback and design assistance!
Status quo
It's useful to run against the same target but with slightly different config. For example, run the same python_test with Python 2 and Python 3 to validate it works with both. This feature is a major benefit of Tox, for example - we know it has value.

In particular, it often makes sense to have permutations here:
- interpreter_constraints for python_test targets, to check that the code is valid for all interpreters
- compatible_resolves for java_source, to check that each resolve compiles for the target
- shell for shunit2_test, to check that the script works with multiple shells (Zsh, Bash, etc)

To do that, you currently have to define two targets:
Then, you can tell Pants to either run :tests_py2 or :tests_py3, or use globs like a file spec to run over both.

A major downside of this approach is duplication. Another major downside is that it results in ambiguity with dependency inference, because two targets export the same file.
First-class parametrization
Instead, @benjyw @stuhood and I discussed a syntax like this:
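A sketch of that syntax, following the parametrize examples earlier in the thread:

```python
shunit2_test(name="tests", source="tests.sh", shell=parametrize("bash", "zsh"))
```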
Which would result in running the same tests one time per shell. Or for interpreter constraints, one test run per "named" interpreter constraint (https://github.com/pantsbuild/pants/issues/12652).

This would require reworking the Target API and Rules API interaction, which is currently very premised on one target -> one "build". For example, FieldSet is uniquely identified by its target's Address, and our generic core/goals/test.py is not capable of creating two PytestFieldSets for the same Target.

"Configurations"?
Each target type may have multiple parametrizable fields, e.g. for java_test: compatible_resolves, scala_versions, java_versions. One idea is to have users create a single config that sets all of these values into a single addressable(?) entity. Then the target simply points to which config to use.

(I'm not sure what the benefit of doing this would be?)