pantsbuild / pants

The Pants Build System
https://www.pantsbuild.org
Apache License 2.0
3.32k stars 636 forks source link

Resolves should automatically include transitive targets, and not restrict 1st party source code #21194

Open jasondamour opened 3 months ago

jasondamour commented 3 months ago

Problem

Migrating a monorepo with libraries and services with conflicting version constraints to pants requires so much effort compared to the intuitive pattern in poetry/pip.

For example, imagine the following repository:

flowchart LR
    X(3rd-party-lib)
    subgraph Libraries/
        libA
        libB
    end
    subgraph Services/
        service1
        service2
    end
    libA -.-> libB
    libB -.->|>=1| X
    service1 --> libA
    service2 --> libB
    service2 -.->|==2| X

Sample repo: https://github.com/jasondamour/pants-vs-poetry

This repository is valid in poetry. We can run poetry lock for each service:

_All the same concepts should apply to pythonrequirements from requirements.txt and pep-621 compliant pyproject.toml files, I'm just using poetry as an example

For the equivalent in Pants, I need to:

  1. In pants.toml, define a resolve per service and library
  2. Parametrize targets to add ALL libraries to ALL resolves a. Pants struggles to infer dependencies when resolves are enabled, so it also requires a lot of manual dependencies

Solution

Resolves should (optionally) behave more like poetry per-module Virtual Environments.

For example:

  1. Define resolves:
    # pants.toml
    [python.resolves]
    service1 = "Services/service1/pants.lock"
    service2 = "Services/service2/pants.lock"
  2. Set Services/service1/BUILD: poetry_requirements(resolve="service1") to add direct targets of this directory to @resolve=service1 (and repeat for service2)
  3. Declare a dependency (same as vanilla poetry) services/service1/pyproject.toml: libA = {path = "../../Libraries/libA", develop = true}
  4. pants generate-lockfile --resolve=service1 should create a lockfile which pins libA, libB, and 3rd-party-lib
  5. pants test Libraries/libA:: should run without a resolve or lockfile

Describe alternatives you've considered

We may just go back to directly using poetry for all dependencies, and exporting lockfiles for services from poetry for pants to consume in tests. I think that will give us exactly what I'm describing.

tdyas commented 3 months ago

How does Poetry behave if you use poetry lock to generate a poetry.lock lockfile and then try to use two different major versions of a dependency?

jasondamour commented 3 months ago

How does Poetry behave if you use poetry lock to generate a poetry.lock lockfile and then try to use two different major versions of a dependency?

Can you clarify the scenario you're asking about?

jasondamour commented 3 months ago

I've created a sample repo here: https://github.com/jasondamour/pants-vs-poetry

benjyw commented 3 months ago

So in the Poetry equivalent, which lockfile is used when you run tests on, say, LibA?

benjyw commented 3 months ago

When, say, packaging the services it's clear to me what you expect. I'm less sure about what the expected behavior is when you act on the libraries.

benjyw commented 3 months ago

For context, this is very helpful for https://github.com/pantsbuild/pants/discussions/20897

jasondamour commented 3 months ago

So in the Poetry equivalent, which lockfile is used when you run tests on, say, LibA?

It depends what project/directory you're acting from:

➜  poetry run --directory Libraries/libB python Libraries/libB/libb/main.py
flask version: 3.0.3

➜  poetry run --directory Services/service2/ python Libraries/libB/libb/main.py
flask version: 2.3.3

In pants-speak, I guess I would describe this as: using the resolve from the directly invoked target (top of the stacktrace), rather than the resolve of the transitive target?

benjyw commented 3 months ago

I see. And if there are multiple directly invoked targets?

$ pants test Libraries/::

for example.

benjyw commented 3 months ago

In the new python backend I plan to "partition by config", so if there were separate resolves it would be obvious which one pertains.

jasondamour commented 3 months ago

IMO the above statement should expand to something like this:

$ pants test :: # this "expands" to:
pants test Libraries/libA/liba/main_test.py # this test (and all transitive deps) run with @resolve=libA
pants test Libraries/libB/libb/main_test.py # this test (and all transitive deps) run with @resolve=libB
pants test Services/service1/service1/main_test.py # this test (and all transitive deps) run with @resolve=service1
pants test Services/service2/service2/main_test.py # this test (and all transitive deps) run with @resolve=service2

but that would also require all python_sources to be added to all resolves, or resolves shouldn't include sources

In poetry's case specifically, it also feels like a resolve should be 1:1 with source roots. Where all python_requirements within a given source root are added to the same resolve

benjyw commented 3 months ago

Yeah, that makes sense and is roughly along the lines I was thinking of for the new backend. I need to think about whether there is some way to emulate this in the existing backend, I'll play around in your repo a bit.

jasondamour commented 3 months ago

Sounds good, I really appreciate it.

benjyw commented 3 months ago

Hmm, I haven't come up with much. Maybe set enable_resolves = false when running tests, and letting Pants yolo it without a lockfile, and then use the service lockfiles when packaging? It's pretty lame, but you might get some mileage.

TBH this is not a use case that the original backend considered. It was focused more on keeping requirements coherent across a large "JVM-style" tightly-bound monorepo, rather than a collection of standalone projects that are loosely bound (e.g., consume each other as requirements at runtime) and happen to be in the same repo.

I am working on the new backend precisely to address this looser case, to make Pants less nannyish and let you do what you want rather than trying to keep you away from sharp edges.

Sorry I don't have better news except "wait for the new backend"... :(

Again, you may be able to do something by having multiple config files and switching between them depending on what you're doing. Lame, but might tide you over.

benjyw commented 3 months ago

Meanwhile your feedback/participation in https://github.com/pantsbuild/pants/discussions/20897 would be really useful. Linking to this issue and the repo from there would provide a real-world use case to kick the tires on.

jasondamour commented 3 months ago

A new python backend (or pants v3) would be awesome, but is there any option achievable with the current backend? If there was an option where resolves would not be applied to source targets (any source could be used by any target), then I think that would do the job for us. Then "lockfiles" would actually work how they do in every other dependency management tool.

I'm willing to look into contributing this, if you could give some pointers

jasondamour commented 3 months ago

@benjyw Based on this comment, it seems like there might be viable path forward. Pants could completely support/document the concept of None resolve, which is a special resolve (named None or something else like *), which does not generate a lockfile and can be used by any other resolve

I ran pants from source with the default resolve set to None directly in setup.py (to workaround the issues of declaring None in toml), and pants still treated None like a string in a lot of places. I'll keep playing with this

jasondamour commented 3 months ago

Hm, "nullable resolves" alone wouldn't address the issue of inferring a different import owner based on which target is directly invoked.

So to get something working in the current python backend, one approach could be:

  1. A new [python-infer].ambiguity_resolution called by_resolve a. This looks for a dependency defined in the same resolve as the top-level target, then falls back onto by_source_root b. This might have very poor memoization performance, as the graph could be different for each top-level target
  2. [python].default_resolve defaults to None a. [python].enable_resolves defaults to true (could even be deprecated+removed, since the new behavior of defaulting targets to the 'None' resolve is equivalent to running without resolves)

Here's what this would look like in my sample repo:

# pants.toml
[python-infer]
ambiguity_resolution = "by_resolve"

[python.resolves]
service1 = "Services/service1/pants.lock"
service2 = "Services/service2/pants.lock"

Then all targets in Libraries/libA and Libraries/libB default to resolve=None. In each Services/service1/BUILD and Services/service2/BUILD, set the respective resolve, i.e.:

_defaults__(all=dict(resolve="service1))

Then, for pants test Services:: we should get the resulting dependency graphs:

  1. First, tests in Service1 where Flask is not defined in the service1 resolve
    Services/service1/service1/main_test.py@resolve=service1 # This is the "top-level resolve" for this graph
    -> Services/service1/service1/main.py@resolve=service1  
    -> Libraries/libB/libb/main.py[@resolve=None] 
    -> Libraries/libb:poetry#Flask
  2. Then, tests in Service2 where Flask is defined directly in the service2 resolve
    Services/service2/service2/main_test.py@resolve=service2 # this is the "top-level resolve"
    -> Services/service2/service2/main.py@resolve=service2
    -> Libraries/libB/libb/main.py[@resolve=None]
    -> Services/service2:poetry#Flask@resolve=service2 # since flask exists in the top-level resolve
benjyw commented 3 months ago

A new ambiguity resolution method sounds pretty sensible for this, and if you can get that to work we'd welcome the PR. Let us know how it goes as you continue hacking on this.

jasondamour commented 3 months ago

I threw my POC into this PR, but I ran into some questions (like you mentioned) that I'm not able to answer:

IMO this would require a new abstraction, like a "Pants Subproject" , where: