pdm-project / pdm

A modern Python package and dependency manager supporting the latest PEP standards
https://pdm-project.org
MIT License

Workspace support #1505

Open frostming opened 1 year ago

frostming commented 1 year ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

noirbizarre commented 1 year ago

That would be awesome. Do you already have something in mind? Same approach as Cargo?

frostming commented 1 year ago

Yes, it would be similar to Cargo:

For example, the content of pyproject.toml:

[tool.pdm.workspace]
packages = ["packages/*"]

File structure:

.
├── pyproject.toml
└── packages
   ├── foo
   │   └── pyproject.toml
   └── bar
       └── pyproject.toml

  - pdm add click: adds click to the dependencies of both foo and bar.
  - pdm run test: runs the test script (if it exists) for both foo and bar.
  - pdm add --include foo: foo only.
  - pdm add --exclude foo: all but foo.
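The proposed --include/--exclude behaviour amounts to a filter over the workspace members. A minimal sketch, assuming members are identified by name and that the two flags combine as "start from the included set (or everything), then drop the excluded ones"; select_members is a hypothetical helper, not PDM code.

```python
from collections.abc import Iterable


def select_members(
    members: Iterable[str],
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> list[str]:
    """Return the workspace members a command should run on.

    Hypothetical semantics mirroring the proposal:
    - no flags: every member
    - include: only the listed members
    - exclude: every selected member except the listed ones
    """
    members = list(members)
    include_set, exclude_set = set(include), set(exclude)
    # Reject names that are not workspace members instead of silently ignoring them.
    unknown = (include_set | exclude_set) - set(members)
    if unknown:
        raise ValueError(f"not workspace members: {sorted(unknown)}")
    selected = [m for m in members if not include_set or m in include_set]
    return [m for m in selected if m not in exclude_set]


if __name__ == "__main__":
    members = ["foo", "bar"]
    print(select_members(members))                   # ['foo', 'bar']
    print(select_members(members, include=["foo"]))  # ['foo']
    print(select_members(members, exclude=["foo"]))  # ['bar']
```

Rejecting unknown names is a design choice of the sketch: a typo in --include would otherwise quietly turn the command into a no-op.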

sanmai-NL commented 1 year ago

What would be the best workaround for now to emulate this feature at the least cost, @frostming?

frostming commented 1 year ago

> What would be the best workaround for now to emulate this feature at the least cost, @frostming?

Using editable installs. Here is a simple example.

parent pyproject.toml:

[tool.pdm.dev-dependencies]
workspace = [
  "-e file:///${PROJECT_ROOT}/packages/foo#egg=foo",
  "-e file:///${PROJECT_ROOT}/packages/bar#egg=bar",
]

Metadata of foo: packages/foo/pyproject.toml:

[project]
name = "foo"
version = "0.1.0"

[build-system]
requires = ["pdm-pep517"]
build-backend = "pdm.pep517.api"

Metadata of bar: packages/bar/pyproject.toml:

[project]
name = "bar"
version = "0.1.0"
dependencies = ["foo"]  # specify dependency of other packages inside your workspace as named requirement

[build-system]
requires = ["pdm-pep517"]
build-backend = "pdm.pep517.api"

Then, in the parent project, run pdm install. This generates a pdm.lock and installs foo and bar in editable mode into the environment. Since they are in editable mode, any modification to the foo or bar packages takes effect without reinstallation.

This should work NOW; the proposal in this issue will be a wrapper around the above with a friendlier UI.
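Keeping the workspace group in sync by hand gets tedious as packages are added. As a minimal sketch, assuming every member is a direct subdirectory of packages/ containing a pyproject.toml and that each directory name matches its project name (as with foo and bar above), a small script could generate the entries. workspace_entries is a hypothetical helper, not part of PDM.

```python
from pathlib import Path


def workspace_entries(root: Path, packages_dir: str = "packages") -> list[str]:
    """Build editable requirement lines for every member under packages_dir.

    Assumes directory name == project name, as with foo/ and bar/ above.
    ${PROJECT_ROOT} is emitted literally so PDM can expand it itself.
    """
    entries = []
    for member in sorted((root / packages_dir).iterdir()):
        # Only directories that carry their own pyproject.toml count as members.
        if (member / "pyproject.toml").is_file():
            entries.append(
                f"-e file:///${{PROJECT_ROOT}}/{packages_dir}/{member.name}"
                f"#egg={member.name}"
            )
    return entries


if __name__ == "__main__":
    # Print TOML-ready lines to paste into [tool.pdm.dev-dependencies].workspace
    for line in workspace_entries(Path.cwd()):
        print(f'  "{line}",')
```

The doubled braces in the f-string are what keep ${PROJECT_ROOT} literal in the output.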

lqhuang commented 1 year ago

Background: major in Python, sometimes Scala (sbt) and Rust (cargo)

I also want to propose a file layout based on my own preferences. Here is an example of my ideal PDM workspace:

.
├── docs
├── package-01
│   ├── docs
│   ├── pyproject.toml
│   ├── src
│   │   └── mynamespace
│   │       └── foo
│   └── tests
├── package-02
│   ├── docs
│   ├── pyproject.toml
│   ├── src
│   │   └── mynamespace
│   │       └── bar
│   └── tests
├── package-03
│   ├── docs
│   ├── mynamespace
│   │   └── baz
│   ├── pyproject.toml
│   └── tests
├── package-04
│   ├── docs
│   ├── pyproject.toml
│   ├── src
│   │   └── package_04
│   └── tests
├── package-05
│   ├── docs
│   ├── package_05
│   ├── pyproject.toml
│   └── tests
├── pyproject.toml
└── tests

And let me add some notes:

  1. In my case, I want to use the mono-repo pattern for Python because I maintain several packages that share the same top-level namespace (for enterprise internal tools, this could also be an org name), but they are currently developed and released in different repos. So in my proposal, package-01, package-02 and package-03 share the same namespace following PEP 420, except that package-03 doesn't use the src layout.
  2. package-04 and package-05 represent the standard layout without a shared namespace, with and without src respectively.
  3. Each package has its own tests and docs directories and its own pyproject.toml.

Other thoughts:

  1. I place all packages in the top-level root because I'm worried about overly deep nesting when the src + namespace layout is applied. I think that's fine for a pure Python mono-repo, but projects developed in multiple languages will have more complex structures. So the location could be configurable, like:

    [tool.pdm.workspace] # pyproject.toml in root dir
    packages = ["./*"]  # search by glob pattern
    # Alternatives (a TOML table can't define the same key twice):
    # packages = ["packages/*"]  # nodejs style
    # packages = ["modules/*"]  # jvm style
    # packages = ["src/*"]  # rust/c/c++ style
    # packages = ["packages/foo", "packages/bar", "packages/baz"]  # manually specified
  2. (Optional) Every sub-package in the mono-repo could be developed or run on its own, as if it were an individual project, without the top-level tool. This would affect how the schema of each sub-package's pyproject.toml is designed. Of course, this probably isn't necessary unless someone wants to run simple development routines on a single sub-package.

  3. Sub-packages could either share the same version defined in the top-level pyproject.toml or use their own independently controlled versions. For example, some packages may be released at the same time, while some auxiliary utilities are released on different schedules.

  4. The top-level root still has docs and tests directories, serving as the general entry point for the documentation website and for integration tests across all sub-packages.
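The configurable packages patterns from point 1 could be resolved roughly as follows. This is a sketch under a few assumptions: a directory counts as a member only if it contains its own pyproject.toml (so ./* skips docs/ and tests/), explicit paths and glob patterns can be mixed, and overlapping patterns are de-duplicated in first-seen order. resolve_members is a hypothetical name.

```python
from pathlib import Path


def resolve_members(root: Path, patterns: list[str]) -> list[Path]:
    """Expand workspace patterns into member directories."""
    root = root.resolve()
    found: dict[Path, None] = {}  # dict keys act as an insertion-ordered set
    for pattern in patterns:
        # Treat "./*" the same as "*" so the leading "./" is purely cosmetic.
        pattern = pattern.removeprefix("./")
        if any(ch in pattern for ch in "*?["):
            candidates = sorted(root.glob(pattern))
        else:
            candidates = [root / pattern]  # manually specified path
        for path in candidates:
            path = path.resolve()
            # Skip the workspace root itself and anything without pyproject.toml.
            if path != root and (path / "pyproject.toml").is_file():
                found.setdefault(path, None)
    return list(found)


if __name__ == "__main__":
    for member in resolve_members(Path.cwd(), ["./*", "packages/*"]):
        print(member)
```

Requiring a pyproject.toml is what lets the broad ./* pattern coexist with top-level docs and tests directories.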

I created a repo (which could be transferred to the PDM org) to illustrate the current proposals for a PDM workspace. Maybe we could write some specifications there, or use the repo for unit tests and examples in the future.

Feedback is welcome and appreciated. I'm willing to help develop the workspace feature because I could use it in my projects right now. My concern is that I'm not yet an expert in Python packaging, so I may need some mentoring.

Finally, thanks for @frostming's efforts!

lqhuang commented 1 year ago

The next problem I'm struggling with is how to make linting tools (pre-commit hooks, mypy, ruff, etc.) work in a mono-repo.

carderne commented 11 months ago

Just sharing my experience of trying to use pdm for a monorepo setup in case it's useful to anyone. I used the pdm-example-monorepo as a starting point.

Basically, you can probably technically make it work, but you'll need to write a bunch of custom scripts to make the DX tolerable.

But it doesn't seem far off, just some CLI sugar! I guess monas is your (frostming's) experiment to solve this?

jacksonwb commented 10 months ago

How does building wheels of packages with path dependencies work in this context?

frostming commented 10 months ago

> How does building wheels of packages with path dependencies work in this context?

I would prefer to specify dependency versions in sub-packages rather than path dependencies; those dependencies, if they are part of the workspace, will be installed from their local paths:

Something like:

[project]
dependencies = [
    "foo==${workspace_version}"
]

And when being built, the version variable will be replaced with the real version in the current workspace.

However, this is not valid PEP 621 metadata, which doesn't allow ${...} in the version part. So we would either have to break the standard or use our own table to specify metadata, like Poetry does. Still brainstorming.
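Whichever table ends up holding it, the build-time substitution step itself is small. A sketch, assuming the ${workspace_version} placeholder from the example above and a single workspace-wide version; Python's string.Template happens to use the same ${name} syntax. render_dependencies is hypothetical.

```python
from string import Template


def render_dependencies(deps: list[str], workspace_version: str) -> list[str]:
    """Replace ${workspace_version} in dependency strings at build time.

    Template.substitute() raises KeyError for any other ${...} placeholder,
    so typos fail loudly instead of leaking into the built metadata.
    """
    return [
        Template(dep).substitute(workspace_version=workspace_version)
        for dep in deps
    ]


if __name__ == "__main__":
    print(render_dependencies(["foo==${workspace_version}", "click>=8"], "0.1.0"))
```

One caveat of reusing string.Template: a bare "$" elsewhere in a requirement would also be parsed as a placeholder, which is harmless for normal requirement strings but worth knowing.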

sanmai-NL commented 10 months ago

Imitating other popular tools may be a wise way to converge on a standard over time.

DavidVujic commented 8 months ago

I think this could be related to what I just wrote in a PDM discussions thread about monorepos: https://github.com/pdm-project/pdm/discussions/1861#discussioncomment-8229683

flyingleafe commented 7 months ago

@frostming any progress with the above proposal? Is there any way to help with this coming to fruition?

The workaround using editable installs is not working well. pdm install in the root monorepo directory does not detect changes in the subpackages' pyproject.toml files - when I add a new dependency to the file manually, it does not get installed. I have to remove and add the editable package in the monorepo config each time. Scripting workarounds which make it work are possible, but are very ugly.

frostming commented 7 months ago

> I have to remove and add the editable package in the monorepo config each time.

No, just run pdm update to pick them up.

sanmai-NL commented 7 months ago

A fruitful initial design could be to accept multiple project root directories on the CLI and complete the operation in parallel. A later workspace definition-based feature could reuse a lot of this early work by running a subprocess under the hood.
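A minimal sketch of that first step, assuming the command is simply re-run with each project root as its working directory (so pdm install, pdm lock, etc. work unchanged). run_in_projects is a hypothetical helper; output is captured per project so parallel runs don't interleave on the terminal.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_projects(
    argv: list[str], roots: list[Path], max_workers: int = 4
) -> dict[Path, subprocess.CompletedProcess[str]]:
    """Run the same command in every project root concurrently."""

    def run_one(root: Path) -> tuple[Path, subprocess.CompletedProcess[str]]:
        # Each project gets its own subprocess, cwd set to its root.
        proc = subprocess.run(argv, cwd=root, capture_output=True, text=True, check=False)
        return root, proc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, roots))


if __name__ == "__main__":
    # e.g. python run_all.py packages/foo packages/bar
    results = run_in_projects(["pdm", "install"], [Path(p) for p in sys.argv[1:]])
    for root, proc in results.items():
        print(f"--- {root} (exit {proc.returncode})")
        print(proc.stdout, proc.stderr, sep="")
```

Threads are enough here because the real work happens in the child processes; a later workspace feature could call the same routine with roots discovered from configuration.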

flyingleafe commented 7 months ago

@frostming Okay, pdm update works for my issue, thanks! Could you advise on how to also pick up the dev dependencies of the sub-packages, defined in the [tool.pdm.dev-dependencies] section of each sub-package's pyproject.toml?

alexcochran commented 5 months ago

> @frostming any progress with the above proposal? Is there any way to help with this coming to fruition?

I would also be happy to help with any development.

frostming commented 5 months ago

@alexcochran Many things are still undetermined, such as how to reuse or refer to the parent project's config in subprojects, which will inevitably bring some new fields to the [project] table. I am considering whether at least part of it can be standardized, for example through PEP 735: Dependency Groups.
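For reference, PEP 735 describes a top-level [dependency-groups] table whose entries are either requirement strings or {include-group = "..."} tables pointing at another group, which is one building block for sharing config between a parent and its subprojects. A minimal sketch of expanding such a group, operating on the plain dict you would get from parsing pyproject.toml (e.g. with tomllib); resolve_group is hypothetical and not PDM's implementation.

```python
import re


def _normalize(name: str) -> str:
    # Group names are compared after PEP 503-style normalization.
    return re.sub(r"[-_.]+", "-", name).lower()


def resolve_group(groups: dict, name: str) -> list[str]:
    """Expand one [dependency-groups] entry, following include-group tables."""
    table: dict[str, list] = {}
    for key, value in groups.items():
        norm = _normalize(key)
        if norm in table:
            raise ValueError(f"duplicate group name after normalization: {key}")
        table[norm] = value

    def expand(group: str, stack: tuple[str, ...]) -> list[str]:
        group = _normalize(group)
        if group in stack:
            raise ValueError(f"cyclic include-group: {' -> '.join(stack + (group,))}")
        if group not in table:
            raise LookupError(f"unknown dependency group: {group}")
        result: list[str] = []
        for item in table[group]:
            if isinstance(item, str):
                result.append(item)  # plain requirement string
            elif isinstance(item, dict) and set(item) == {"include-group"}:
                result.extend(expand(item["include-group"], stack + (group,)))
            else:
                raise ValueError(f"invalid item in group {group}: {item!r}")
        return result

    return expand(name, ())


if __name__ == "__main__":
    groups = {"test": ["pytest", "coverage"], "all": [{"include-group": "test"}, "ruff"]}
    print(resolve_group(groups, "all"))
```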

alexcochran commented 5 months ago

@frostming I think that's a great direction. On the Node side, I think pnpm is a great model to reference. Its workspace design makes monorepo setup pretty trivial, and you can specify production and development dependencies for each project in its own package.json, very similar to how you have things working already.

DavidVujic commented 5 months ago

There are several ways of organizing a monorepo (of course), and I suggest having a look at how the Polylith Architecture solves these kinds of problems. I understand the need for PDM features and a Cargo-like way of doing this. Polylith takes a different approach: it focuses on sharing code between projects and on a "single project"-like developer experience. This means linting and all other development tooling are the same across all code in the monorepo. You can already use this architecture with PDM today.

I am the developer of the tooling support for this in Python, and there is a PDM-specific hook available. Really nice work with this way of interacting with PDM, by the way 👏 ⭐ The hook system has made adding tooling like this really simple.

frostming commented 5 months ago

@DavidVujic IIUC, does it look like https://github.com/GreyElaina/Mina ?

BTW, polylith is a wonderful project, good job!

DavidVujic commented 5 months ago

> @DavidVujic IIUC, does it look like https://github.com/GreyElaina/Mina ?
>
> BTW, polylith is a wonderful project, good job!

Thank you!

I haven't seen the Mina repo before, but will have a look to learn if there are similarities.

damymetzke commented 5 months ago

I have some interest in this feature. At work I'm currently on a monorepo that uses Poetry; if workspaces are added to PDM, I will immediately start migrating. I'd like to support its development wherever I can.

For my use case, it's especially important that the feature works well with package registries. This is because I'm using a monorepo to manage multiple packages which need to be uploaded to an index. Dependencies would be specified as usual, using regular version requirements. In development it should prefer workspace packages if the version is valid, but when packaged and uploaded it should resolve versions using the index. I believe this behavior matches with pnpm workspaces, which has been mentioned before.
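The resolution rule described here (use the workspace copy when its version satisfies the requirement, otherwise fall back to the index) can be sketched as below. To stay self-contained, it only understands comma-separated comparison clauses on plain numeric versions, with no extras, markers, or pre-releases; a real implementation would rely on a full specifier library such as packaging. choose_source is a hypothetical name.

```python
import operator
import re

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, "<=": operator.le,
    ">": operator.gt, "<": operator.lt,
}
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")
_CLAUSE = re.compile(r"^(==|!=|>=|<=|>|<)\s*(\d+(?:\.\d+)*)$")


def _normalize(name: str) -> str:
    return re.sub(r"[-_.]+", "-", name).lower()


def _version(text: str) -> tuple[int, ...]:
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:  # so 1.2 and 1.2.0 compare equal
        parts.pop()
    return tuple(parts)


def satisfies(version: str, spec: str) -> bool:
    for clause in filter(None, (c.strip() for c in spec.split(","))):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"unsupported specifier clause: {clause!r}")
        op, target = match.groups()
        if not _OPS[op](_version(version), _version(target)):
            return False
    return True  # an empty spec accepts any version


def choose_source(requirement: str, workspace: dict[str, str]) -> str:
    """Return "workspace" if a local member satisfies the requirement, else "index"."""
    match = _REQ.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, spec = _normalize(match.group(1)), match.group(2)
    local = {_normalize(k): v for k, v in workspace.items()}
    if name in local and satisfies(local[name], spec):
        return "workspace"
    return "index"


if __name__ == "__main__":
    ws = {"foo": "1.2.0"}
    print(choose_source("foo>=1.0,<2", ws), choose_source("foo>=2", ws))
```

The requirement string itself never changes, which matches the idea that the published package keeps ordinary version requirements and only development resolution prefers local code.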

I also want to clarify what is expected to happen when you install from a project rather than the workspace root. I would expect it to detect the workspace and function accordingly.

Finally, I want to clarify how non-pdm backends are treated. I think it makes sense to support alternative PEP 517 compliant backends when they are specified in the pyproject.toml file. Not only does this make migration easier, it may be required in some cases, for example when you want to link to Rust code using maturin. I feel like this would be a common enough occurrence in practice that it warrants explicit support.

joaomcarlos commented 1 month ago

I see little reason to set up a project as suggested before:

.
├── docs
├── package-01
│   ├── docs
│   ├── pyproject.toml
│   ├── src
│   │   └── mynamespace
│   │       └── foo
│   └── tests
├── package-02
│   ├── docs
│   ├── pyproject.toml
│   ├── src
│   │   └── mynamespace
│   │       └── bar
│   └── tests
├── package-03
│   ├── docs
│   ├── mynamespace
│   │   └── baz
│   ├── pyproject.toml
│   └── tests
├── package-04
│   ├── docs
│   ├── pyproject.toml
│   ├── src
│   │   └── package_04
│   └── tests
├── package-05
│   ├── docs
│   ├── package_05
│   ├── pyproject.toml
│   └── tests
├── pyproject.toml
└── tests

After all, if each package has only one user, the extra complexity seems unnecessary.

However, if you are using a microservice architecture, it starts to make a little more sense:

Basically, there are 2 projects, an API and some sort of Users service (sorry for the odd example names; the project I am working on has very specific names that don't really fit here, and I am not particularly inspired today), that share two libraries: a common functionality library and a database access layer (for this simple example, let's say both microservices share a database; it doesn't matter why).

This would yield 3 deployable images: one image that runs migrations (it runs once but can be run many times, since it's idempotent) and 2 services (an API that serves HTTP and a backend service that is responsible for Users).

My current main project at work follows this structure and contains 9 services (in 2 main languages) with 2 packages.

Each service/package produces its own pdm.lock from its pyproject.toml, for "internal use" within its own Docker image (and .venv). Each of those pyproject.toml files imports common things from the main pyproject.toml (such as linter settings and other details).

We are currently working around PDM's limitations by being creative with Makefiles. For example, make pdm-sync calls make pdm-sync on each project's Makefile with $(MAKE) -C packages/common pdm-sync (we have a line for each project); the same goes for lock. Other commands are deliberately not shared, so they can't be accidentally misused (calling update on all projects, for example).


my-app/
├─ docs/
├─ packages/
│  ├─ common/
│  │  ├─ common/
│  │  │  ├─ logger.py
│  │  │  ├─ exceptions
│  │  ├─ tests/
│  │  ├─ pyproject.toml
│  │  ├─ pdm.lock
│  │  ├─ Dockerfile.test (CI tests)
│  ├─ orm/
│  │  ├─ orm/
│  │  │  ├─ models/
│  │  │  │  ├─ users.py
│  │  ├─ tests/
│  │  │  ├─ test_users.py
│  │  ├─ Dockerfile (migrations runner)
│  │  ├─ pyproject.toml
│  │  ├─ pdm.lock
│  │  ├─ migrations/
│  │  ├─ Dockerfile.test (CI tests)
├─ projects/
│  ├─ api/
│  │  ├─ api/
│  │  │  ├─ service.py
│  │  ├─ tests/
│  │  │  ├─ test_service.py
│  │  ├─ Dockerfile (api runner)
│  │  ├─ Dockerfile.test (CI tests)
│  │  ├─ pyproject.toml
│  │  ├─ pdm.lock
│  ├─ users/
│  │  ├─ users/
│  │  │  ├─ profile/
│  │  │  │  ├─ user_image_downloader.py
│  │  │  ├─ crud/
│  │  │  │  ├─ create_user_dto.py
│  │  │  │  ├─ create_user.py
│  │  │  ├─ service.py
│  │  ├─ tests/
│  │  │  ├─ test_service.py
│  │  ├─ Dockerfile (api runner)
│  │  ├─ Dockerfile.test (CI tests)
│  │  ├─ pyproject.toml
│  │  ├─ pdm.lock
│  ├─ github_sync_interface/
│  │  ├─ github_sync_interface/
│  │  │  ├─ service.py
│  │  ├─ tests/
│  │  │  ├─ test_service.py
│  │  ├─ Dockerfile (api runner)
│  │  ├─ Dockerfile.test (CI tests)
│  │  ├─ pyproject.toml
│  │  ├─ pdm.lock
├─ .gitignore
├─ pyproject.toml
├─ README.md

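The Makefile fan-out described above could also be expressed as a small Python script. This sketch hard-codes the project directories from the tree above and, like the Makefile, only allows sync and lock, so update can't be run across everything by accident. fan_out, ALLOWED and PROJECT_DIRS are hypothetical names; projects run sequentially and stop at the first failure, as make does.

```python
import subprocess
import sys
from pathlib import Path

# Only these commands may be fanned out; "update" is deliberately absent
# so it can't be run across every project by accident.
ALLOWED = {"sync", "lock"}

# Hypothetical list mirroring the tree above (one Makefile line per project).
PROJECT_DIRS = [
    "packages/common",
    "packages/orm",
    "projects/api",
    "projects/users",
    "projects/github_sync_interface",
]


def fan_out(command: str, root: Path, dry_run: bool = False) -> list[tuple[Path, list[str]]]:
    """Run `pdm <command>` in every project, stopping at the first failure."""
    if command not in ALLOWED:
        raise ValueError(f"refusing to run 'pdm {command}' across all projects")
    plan = [(root / rel, ["pdm", command]) for rel in PROJECT_DIRS]
    if not dry_run:
        for cwd, argv in plan:
            # check=True raises CalledProcessError, like make stopping on error.
            subprocess.run(argv, cwd=cwd, check=True)
    return plan


if __name__ == "__main__":
    fan_out(sys.argv[1], Path.cwd())
```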
DavidVujic commented 1 month ago

@joaomcarlos I think it is possible to take the setup with sharing code even further than extracting parts of it into common code. I'll bet that the different service.py contain code (functions, maybe) that could be reused. 😄

This is what the Polylith architecture is about (I have mentioned it earlier in this thread). A nice side effect is that lock files become less important when the code you share comes in smaller pieces (smaller than what is usually put in a library). PDM works really well with this kind of setup too.

damymetzke commented 1 month ago

I have some thoughts about more complex use cases. @joaomcarlos, your comment suggests to me that you think complex interdependencies (like microservices) are needed to justify using workspaces; feel free to correct me if I've misinterpreted that. The Polylith architecture @DavidVujic mentioned looks very interesting, although to me it only makes sense if you need to avoid code duplication, which means it also needs complex interdependencies to justify it.

I think there are a lot of simpler cases that we should consider as well. I can highlight 2 ideas that I find important, with some examples. The first is cases where there are no interdependencies at all. For example, I introduced a monorepo at my job to collect a lot of smaller projects, and at the time we had no libraries at all. My motivation was to get everything in one place so I could more easily write tools that work on all projects. To be fair, you don't really need workspaces for this at all. But this use case could benefit from reusing a single virtual environment and from being able to update all projects with one command.

Another use case is one I rely on a lot in the early development of projects. Here I usually create 2 or 3 packages. For example, I have a workspace called my-project with 2 projects, my-project and my-project-cli, where I use the CLI to manually test functionality. In this case there is an interdependency, but it is extremely predictable and unlikely to cause issues. I've also done this with some novel setups, like using maturin and pyo3 to allow development in Rust.

Relatively intricate solutions always run more risk of becoming obsolete over time. It's important to consider these use cases, but ideally you'd provide simple basic components to achieve them. This also makes it more flexible.

Given that, I find it difficult to pin down what would be expected from a workspace implementation with respect to microservices and Polylith. I'm fairly sure that workspace implementations like Cargo's already support microservices fine. In fact, the only benefit I can imagine is reduced disk usage from sharing the virtual environment and lock file. And Polylith is such a specialized solution that, in my opinion, a plugin is much more appropriate for it.

I would say the first logical step is to add support for the workspace configuration and for running commands over all projects at once, specifically to update dependencies and run scripts like tests. I'm not quite sure what the next step should be, though. I think it would be more useful to gather concrete feature ideas than high-level solutions to specific use cases.

joaomcarlos commented 4 weeks ago

Hi,

Thank you both for your comments, @DavidVujic and @damymetzke !

I think the best solution would be to let the structure be defined as configuration instead. That would solve everyone's problems, I guess.

In the case I presented, I don't really have complex interdependencies, @DavidVujic (the only interdependency would perhaps be orm depending on common), so each project is completely separate from the others and only reuses packages (literally as if they were on PyPI). There aren't really components that you define and reuse everywhere, the way a front-end dashboard would (reusing fully developed, isolated components). These packages serve to share common functionality, not complete components, if that makes sense, at least in my case. If I were doing that, I think Polylith might be more appropriate; you are right. The ORM package is nearly an edge case, though, since one could start to consider it a component in itself if more business logic were added to it. In my case, I can completely separate responsibilities into each microservice, so the ORM barely needs any logic beyond what is absolutely needed to maintain the data store (database), namely the relational mapping and migrations. That extra logic consists of small validations (the start date can't be later than the end date, etc.), relationship enforcement (when saving nested data, here's a property to reach the root), meaning-clarification code (for example, task.has_ran() is a property that checks self.ran_at is not None) and small helper properties (like total on a nested structure meaning the sum of the totals of its components).
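The kind of thin model logic described here could look like the following sketch. Task and Component are hypothetical plain dataclasses standing in for ORM models, and has_ran is exposed as a read-only property; the point is only how small validations, meaning-clarifying properties and aggregate helpers sit on the model without turning it into a full component.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Component:  # hypothetical nested item
    total: float


@dataclass
class Task:  # hypothetical stand-in for an ORM model
    start: datetime
    end: datetime
    ran_at: datetime | None = None
    components: list[Component] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Small validation: the start date can't be later than the end date.
        if self.start > self.end:
            raise ValueError("start must not be later than end")

    @property
    def has_ran(self) -> bool:
        # Meaning clarification: "has ran" just means ran_at is set.
        return self.ran_at is not None

    @property
    def total(self) -> float:
        # Helper: the total of a nested structure is the sum of its parts.
        return sum((c.total for c in self.components), 0.0)
```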

@damymetzke in my case, the project is a monorepo mainly for the purpose of commits being atomic along services. Not because the code is inter-dependent per se, but rather the business process around it needs that features be merged, reviewed and then deployed as atomic units for the purposes of UAT testing.

Keeping track of this when each microservice was its own repository was quite challenging, although GitLab allowed us to define issues and merge requests with dependencies.

Another issue was that changes to packages needed microservices to be updated even if code in them hadn't changed. Not because it wouldn't run but to reduce the amount of manual diff that was required (although proper tests mitigated most of it).

With a monorepo and multi-service testing with one command, we guarantee that all devs have proper visibility into their changes. And let's be honest, renaming a helper function (contextually, not with mass replace) and having 9+ projects instantly update, all in one single commit, is pure badass.