python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Support subprojects in a poetry project #2270

Open abn opened 4 years ago

abn commented 4 years ago

Background & Rationale

This request is inspired by RPM Package Manager’s capability to build subpackages from the same Spec File.

Here, I want to propose and discuss how a version of this capability can be replicated within poetry to allow for a simplified user experience for a python project maintainer, especially when maintaining namespace packages and/or multi-project source trees. While strict project separation is a good thing in most cases, it might not always be the most pragmatic option for package maintainers.

For our purposes here, we can refer to each of these packages as a subproject. All subprojects are managed under a single poetry project. This means that there is only a single pyproject.toml file and a shared project root directory with either a shared source tree or independent source trees (subdirectories) for each subproject.

Description

Let us consider the scenario of multiple namespace packages being maintained in a single repository with the following structure.

    namespace-project/
    └── src
        └── namespace
            └── package
                ├── one
                │   └── __init__.py
                ├── three
                │   └── __init__.py
                └── two
                    └── __init__.py

Note that this will still apply even if different source directories exist within the root directory for each subproject.

Here the intention could be that we want to distribute 3 packages, namely namespace-package-one, namespace-package-two and namespace-package-three.

For the purpose of this example, let us assume that namespace-package-three depends on namespace-package-one. The pyproject.toml file could look something like this.

New sections are annotated with comments detailing them and expected behaviour.

[build-system]
requires = ["poetry-core>=1.0"]
build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "namespace-package"
version = "1.0.0-alpha.0"
description = ""
authors = [
    "Bender Rodriguez <bender@planetexpress.com>"
]
license = "MIT"
readme = "README.md"
repository = "https://git.planetexpress.com/bender/python-namespace-package"
keywords = []
classifiers = [
    "Intended Audience :: Developers",
    "Operating System :: OS Independent",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3 :: Only",
    "Programming Language :: Python :: 3.8",
]

# this section remains as is, but now specifies shared dependencies
[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]
pre-commit = "^2.1"
flake8 = "^3.7"
black = "^19.10b0"
pytest = "^5.2"

# the following are package-specific sections
[tool.poetry.packages.one]
name = "namespace-package-one"  # this is optional as the name would be derived from <project.name>-<package name from section>
description = ""  # this will override the description from the project for this package
readme = "README.one.md"  # this will override the readme from the project for this package
packages = [  # this is mandatory for sub-packages
    # any package not included in a sub-package is added to the base package (ie. "namespace-package")
    # if the "packages" property is not explicitly configured in the base
    { include = "namespace.package.one", from = "src" }
]

[tool.poetry.packages.one.dependencies]
ujson = "^1.35"

[tool.poetry.packages.one.dev-dependencies]
pytest-mock = "^2.0"

[tool.poetry.packages.two]
packages = [ 
    { include = "namespace.package.two", from = "src" }
]

[tool.poetry.packages.two.dependencies]
psycopg2 = "^2.8.4"

[tool.poetry.packages.two.dev-dependencies]
pytest-postgresql = "^2.3.0"

[tool.poetry.packages.three]
requires = [ # this enables us to specify the relationships between sub-packages
    "one" # this could also be namespace-package-one
]
packages = [ 
    { include = "namespace.package.three", from = "src" }
]

[tool.poetry.packages.three.dependencies]
aiohttp = "^3.5"

[tool.poetry.packages.three.dev-dependencies]
beautifulsoup4 = "^4.8"
aioresponses = "^0.6"
pytest-asyncio = "^0.10"
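
To make the comments in the configuration above concrete, here is a minimal Python sketch of how one subpackage's metadata might be resolved. None of this exists in Poetry: the `resolve_subpackage` helper and the plain-dict view of the config are hypothetical, and requiring a sibling subpackage at the shared project version is an assumption made only for illustration.

```python
# Hypothetical sketch: resolve one subpackage's metadata from the proposed layout.
# The dict mirrors a trimmed-down version of the pyproject.toml above.

PROJECT = {
    "name": "namespace-package",
    "version": "1.0.0-alpha.0",
    "description": "",
    "dependencies": {"python": "^3.8"},  # shared by every subpackage
    "packages": {
        "one": {
            "name": "namespace-package-one",
            "dependencies": {"ujson": "^1.35"},
        },
        "three": {
            "requires": ["one"],
            "dependencies": {"aiohttp": "^3.5"},
        },
    },
}


def resolve_subpackage(project: dict, key: str) -> dict:
    """Return the effective name, description and dependencies of a subpackage."""
    section = project["packages"][key]
    # The name is optional and falls back to <project.name>-<section key>.
    name = section.get("name", f"{project['name']}-{key}")
    # Shared dependencies come first; package-specific ones are added on top.
    dependencies = {**project["dependencies"], **section.get("dependencies", {})}
    # Assumption: sibling subpackages are required at the shared project version.
    for sibling in section.get("requires", []):
        sibling_name = resolve_subpackage(project, sibling)["name"]
        dependencies[sibling_name] = project["version"]
    return {
        "name": name,
        "description": section.get("description", project["description"]),
        "dependencies": dependencies,
    }


three = resolve_subpackage(PROJECT, "three")
print(three["name"])
print(three["dependencies"])
```

In this sketch "three" has no explicit name, so it picks up the derived name, and it ends up with the shared python constraint, its own aiohttp dependency, and a dependency on the "one" subpackage.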

Under this scenario, the following might be what the cli commands look like. Current behaviour will remain unaltered as these are additive changes.

$ poetry add --package one <dependency>
.. <similar to current add output>

$ poetry packages list
namespace-package-one
namespace-package-two
namespace-package-three

$ poetry build
<builds all three packages>

$ poetry build --package one
<builds only namespace-package-one>

$ poetry publish --dry-run
...
Publishing namespace-package-one (1.0.0-alpha.0) to PyPI
  - Uploading namespace-package-one-1.0.0-alpha.0.tar.gz
  - Uploading namespace-package-one-1.0.0-alpha.0-py3-none-any.whl

Publishing namespace-package-two (1.0.0-alpha.0) to PyPI
  - Uploading namespace-package-two-1.0.0-alpha.0.tar.gz
  - Uploading namespace-package-two-1.0.0-alpha.0-py3-none-any.whl

Publishing namespace-package-three (1.0.0-alpha.0) to PyPI
  - Uploading namespace-package-three-1.0.0-alpha.0.tar.gz
  - Uploading namespace-package-three-1.0.0-alpha.0-py3-none-any.whl

Variations

The above is an initial thought of how it might work. That said, there are variations to this that should be discussed.

  1. Does a per-package dev-dependency section make sense? This only really makes sense if we want to allow for developing a single package at a time. However, this will become tricky in cases like here where "three" depends on "one". This will mean that when developing "three", dev dependencies for "one" should also be installed. If isolation is required, then multiple virtual environments will be required, which might be overkill for the majority of use cases for this feature.

  2. Will all packages be installed under PEP-0517? Is it even possible to install only a specific package when being installed under PEP-0517? One possible solution might be to make use of "extras" here as a way of specifying which package, if any, to install, but default to all.
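
The first question can be illustrated with a small sketch. Assuming the hypothetical `requires` key from the proposal, developing "three" in a single environment would need the dev dependencies of every subpackage it transitively requires. The function name and the trimmed data layout below are illustrative only.

```python
# Hypothetical sketch: collect dev dependencies across the `requires` graph.

PACKAGES = {
    "one": {"requires": [], "dev-dependencies": {"pytest-mock": "^2.0"}},
    "two": {"requires": [], "dev-dependencies": {"pytest-postgresql": "^2.3.0"}},
    "three": {
        "requires": ["one"],
        "dev-dependencies": {"aioresponses": "^0.6", "pytest-asyncio": "^0.10"},
    },
}


def dev_dependencies_for(packages: dict, key: str, _seen=None) -> dict:
    """Merge the dev dependencies of `key` and of everything it requires."""
    seen = set() if _seen is None else _seen
    if key in seen:  # guard against cycles in `requires`
        return {}
    seen.add(key)
    merged = {}
    for sibling in packages[key]["requires"]:
        merged.update(dev_dependencies_for(packages, sibling, seen))
    # The package's own dev dependencies win over inherited ones.
    merged.update(packages[key]["dev-dependencies"])
    return merged


print(sorted(dev_dependencies_for(PACKAGES, "three")))
```

Here "three" pulls in pytest-mock from "one" but nothing from "two", which is the behaviour a shared development environment would have to reproduce.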

Extensions

  1. Optional Project Package: As an extension to this, one might also want to optionally distribute a namespace-only package namespace-package (let's call this the "project package" for now) that installs the core dependencies and also allows for "extras" as we do today, without requiring the distribution of the entire source tree with the binary distribution.

This means that if someone does pip install namespace-package, the maintainer might expect the following to be installed:

  1. The namespace namespace.package.
  2. Packages namespace-package-one and namespace-package-three, which are required for the "default" install.

An end-user can also install the remaining package, like so - pip install namespace-package[two] which simply will install a dependency namespace-package-two.

This behaviour might not be desired in all cases, and can be considered opt-in.
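
As a rough illustration of this extension, the sketch below generates the requirement strings that the project package's metadata might carry. The `default` and `optional` groupings and the helper name are assumptions for this example; the `extra == "..."` marker is the standard way requirement strings express extras.

```python
# Hypothetical sketch: requirement strings for the optional "project package".

def project_package_requirements(default: list, optional: dict) -> list:
    """Build requirement strings; optional subpackages sit behind extras."""
    requirements = list(default)
    for extra, package in sorted(optional.items()):
        requirements.append(f'{package}; extra == "{extra}"')
    return requirements


requirements = project_package_requirements(
    default=["namespace-package-one", "namespace-package-three"],
    optional={"two": "namespace-package-two"},
)
for line in requirements:
    print(line)
```

With metadata like this, a plain install would bring in "one" and "three", while namespace-package[two] would add namespace-package-two.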

kapilt commented 4 years ago

I recently went through converting over a mono repo with several packages over to poetry, and thought it might be useful to share what we did, and pain points and bug work arounds. Although also recognizing this proposal would hopefully make it all obsolete :-) Still this might provide some utility to those who want to do mono repos prior to native support in poetry.

first a few context/caveats, we don't use namespace packages vs a common prefix, and our fs layout is little different. that's non material to the techniques used, but perhaps relevant to the proposal.

main_pkg
tools/
   pkg_1
   pkg_2
   pkg_3
   ...  

at the moment all the packages under tools have dependencies on the main package declared as a path based dev-dependency.

[tool.poetry.dev-dependencies]
# setup in tree as a dev dependency
c7n = {path = "../..", develop = true}

attempting to resolve it as a normal dependency caused a few issues with poetry build (issues #2046, partial fix #2047, also reported/pr by others).

so using as a dev dependency worked but also meant not using poetry directly as a build/publish tool to work around those issues and still needed the injection of the main_pkg as a regular project dep when publishing. we ended up using poetry metadata/api to generate setup/requirements for that purpose, converting dev dependencies to regular dependencies in the process. https://github.com/cloud-custodian/cloud-custodian/blob/master/tools/dev/poetrypkg.py#L121

unrelated to multi-project, but related to the generation workaround, we ran into another issue in that the masonry sdist builder didn't really support markdown readmes (pr #1994)

to keep the ergonomics simple around the multiple commands needed to update versions or release, we added makefile targets as a frontend:

pkg-update:
    poetry update
    for pkg in $(PKG_SET); do cd $$pkg && poetry update && cd ../..; done

One interesting consequence of source directory dependencies in poetry is that they break any attempt to distribute/publish a package, even if they are dev deps. i.e. per the pyproject.toml spec and the build-system PEP, poetry will be invoked during install. The invocation/installation of poetry as a build system is transparently handled by pip. Simple resolution/parsing of the pyproject.toml dev dependencies will cause a poetry failure for a source distribution install, as installation of an sdist is actually a wheel compilation.

As a result of this publishing limitation, we only publish wheels instead of sdists, which avoids the build system entirely, as a wheel is an extractable installation format.

we're also maintaining compatibility with tox/setuptools ecosystem for compatibility with developer workflows, there's a few more details on what we did here https://cloudcustodian.io/docs/developer/packaging.html

abn commented 4 years ago

@kapilt thank you for writing that up. It is extremely useful and insightful.

dazza-codes commented 4 years ago

This proposal is valuable. As it is, poetry supports optional dependencies, but not optional packages

The use of optional packages for a namespace project is really useful. :+1: for including the optional-package as part of this proposal.

djerraballi commented 4 years ago

shared dependencies are very useful, but might make sense to inherit some of the logic from Maven regarding the shared block:

  1. Allow definitions of dependencies and versions in a shared block
  2. Only pull them into the package if that dependency name is explicitly used in the package's dependencies. In this way we can define standard versions for certain dependencies across all packages, but not require all packages to install those packages at those versions. (Can be overridden in the package dependency block.)

while it does complicate things the benefits are:

  1. No unneeded dependencies in modules of a multi-module project.
  2. When multiple but not all packages have the same dependency, we can define the version once, but still explicitly pull the dep.
  3. Enabling overrides for versions for certain modules can be very useful and get people out of some hairy situations.
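
A minimal sketch of the resolution rule described above, assuming a hypothetical shared block that only defines versions and per-package declarations that opt in by name. Representing "use the shared version" as `None` is an invention for this example, not Poetry or Maven syntax.

```python
# Hypothetical sketch: Maven-style managed versions for subpackages.

SHARED_VERSIONS = {"numpy": "^1.24", "requests": "^2.31", "ujson": "^1.35"}


def resolve_dependencies(shared: dict, declared: dict) -> dict:
    """Resolve a package's dependencies against the shared version block.

    `declared` maps a dependency name to a constraint, or to None when the
    package wants whatever version the shared block defines.
    """
    resolved = {}
    for name, constraint in declared.items():
        if constraint is not None:
            resolved[name] = constraint  # explicit override in the package
        elif name in shared:
            resolved[name] = shared[name]  # version defined once, used here
        else:
            raise KeyError(f"{name} has no version in the shared block")
    return resolved


# Only the names the package lists are pulled in; ujson is left out entirely.
print(resolve_dependencies(SHARED_VERSIONS, {"numpy": None, "requests": "^2.28"}))
```

The package gets numpy at the shared version, overrides requests, and never installs ujson because it did not ask for it.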

xinbinhuang commented 3 years ago

This proposal is really valuable! I wonder what's the latest status of this? Is this currently being worked on? I would love to devote some time to speed up the process if possible.

patrickelectric commented 3 years ago

are we there yet ?

johnwalz97 commented 3 years ago

Any updates? Really see some value for this!!!

mrlucasrib commented 3 years ago

Unfortunately there is nothing new about this yet, but I found a monorepo manager called Bazel, which is widely used and supports many languages. If your goal is to work only with Python, Pants Build might be an easier solution.

cognifloyd commented 3 years ago

Maybe we could have pyproject.toml in each of the subprojects. Then, add a poetry plugin that coordinates updating dependencies between the top-level pyproject.toml and the children. That might mean adding a setting in each of the pyproject.toml to say whether they are a parent or a child.

woile commented 3 years ago

Could the new dependency groups be leveraged in a way to achieve this proposal? I have some intuition it could, but not sure how

klDen commented 2 years ago

Maybe we could have pyproject.toml in each of the subprojects. Then, add a poetry plugin that coordinates updating dependencies between the top-level pyproject.toml and the children. That might mean adding a setting in each of the pyproject.toml to say whether they are a parent or a child.

This is what Maven does. Here's an example on how Maven provides the capability to share modules between projects: https://www.baeldung.com/maven-multi-module

NixBiks commented 2 years ago

Is it currently possible to have something like what yarn workspaces does?

So having pyproject.toml in each package but they can share dependencies from a root virtual environment.

I've been trying with a single pyproject.toml and using packages, optional and extras but it becomes very handheld - would be easier to define dependencies in the subpackages in their own pyproject.toml (I hope it makes sense)

fredrikaverpil commented 2 years ago

Have you tried to add a path dependency to another project/folder, which contains its own pyproject.toml?

So for example;

[tool.poetry.dependencies]
subproject = {path = "subproject", develop = true}

NixBiks commented 2 years ago

Have you tried to add a path dependency to another project/folder, which contains its own pyproject.toml?

So for example;

[tool.poetry.dependencies]
subproject = {path = "subproject", develop = true}

Yes that'll install my root environment but the subproject still wants its own virtual environment so I end up with a virtual environment for each subproject plus one for the root. I want a single virtual environment to be used by all projects (and keeping a pyproject.toml for each project) - that is how yarn workspaces works AFAIK.

fredrikaverpil commented 2 years ago

@mr-bjerre I know it's not what you are asking for, but you could try symlinking the two virtual environments together.

If this is a requirement, I would probably go for another solution altogether. For example use pip-tools to manage deps (can use multiple input files, one from each project) and twine or flit to publish.

AdamJel commented 2 years ago

As discussed on Discord, this would be of a huge help to our team. Any progress/time-estimate on implementation? Thank you.

Secrus commented 2 years ago

As discussed on Discord, this would be of a huge help to our team. Any progress/time-estimate on implementation? Thank you.

Right now the team is focused on getting 1.2 released. This could be something to ship as the next "big" feature (like groups and plugins in 1.2). However, right now there is no estimate of when this is gonna be added. This is also something that might be added as a 3rd party plugin after 1.2 is released.

ljnsn commented 2 years ago

Have you tried to add a path dependency to another project/folder, which contains its own pyproject.toml? So for example;

[tool.poetry.dependencies]
subproject = {path = "subproject", develop = true}

Yes that'll install my root environment but the subproject still wants its own virtual environment so I end up with a virtual environment for each subproject plus one for the root. I want a single virtual environment to be used by all projects (and keeping a pyproject.toml for each project) - that is how yarn workspaces works AFAIK.

You can do that by creating a local config (poetry.toml) for each of your sub-packages with virtualenvs.create set to false. You then mark the sub-packages as dependencies as suggested by @fredrikaverpil and include them as packages.
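
A small sketch of the first step, assuming a hypothetical layout where each sub-package lives in its own directory under the repository root. It writes the same local poetry.toml that running `poetry config virtualenvs.create false --local` inside each sub-package would produce; the helper name and directory names are made up for the example.

```python
# Sketch: write a local poetry.toml that disables virtualenv creation
# into each sub-package directory. Directory names are hypothetical.
import tempfile
from pathlib import Path

LOCAL_CONFIG = "[virtualenvs]\ncreate = false\n"


def disable_virtualenvs(root: Path, subprojects: list) -> list:
    """Create <root>/<subproject>/poetry.toml for every listed sub-package."""
    written = []
    for name in subprojects:
        config_path = root / name / "poetry.toml"
        config_path.parent.mkdir(parents=True, exist_ok=True)
        config_path.write_text(LOCAL_CONFIG)
        written.append(config_path)
    return written


# Demonstrate against a throwaway directory instead of a real repository.
with tempfile.TemporaryDirectory() as tmp:
    for path in disable_virtualenvs(Path(tmp), ["subproject"]):
        print(path.name)
        print(path.read_text())
```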

AbdealiLoKo commented 1 year ago

Is there any way for me to also install my dev-dependencies using this central pyproject.toml ?

In the sense that I get that subproject = {path = "subproject", develop = true} installs my package, but I also want the dev-dependencies of subproject to be installed.

Note: I am talking about tool.poetry.group.dev.dependencies not tool.poetry.extras

neersighted commented 1 year ago

This is not a feature that currently exists. We likely will not support leaking dev-dependencies over path relationships; the design discussed in this issue is using a super-pyproject.toml instead of linking individual projects together.

wizpresso-steve-cy-fan commented 1 year ago

We also wanted to have such a feature because it will help us pin the right version for the right libraries. This is because some versions of the dependencies may not work on some versions (cough cough numpy).

So, although we can duplicate all the dependencies in multiple projects and let it be, this could create a subtle portability hell with regards to interoperability on different versions of the same library. Without having a master project that defined all the dependencies, this will be quite difficult to manage to say the least. We have over 100 packages and we can't afford to manually inspect each other either.

It seems like Cargo did it pretty well by pinning the version on subproject members to depend on one master cargo.

gerbenoostra commented 1 year ago

I've written a blogpost and demo repo where I demonstrate how poetry can (quite easily) be used in a mono repo with subpackages. Perhaps the utility scripts in it can help you. Blogpost: https://gerben-oostra.medium.com/python-poetry-mono-repo-without-limitations-dd63b47dc6b8 Repo: https://gitlab.com/gerbenoostra/poetry-monorepo/

adriangb commented 1 year ago

Hey folks, I've written up a proposal for monorepo support using path dependencies and dependency groups, all existing features of Poetry: https://github.com/python-poetry/poetry/issues/6850. There's an example repo at https://github.com/adriangb/python-monorepo/tree/main/poetry with more details.

The pattern is quite functional already, I've been using it in production for several months now. The only things I think are missing are:

I'd like to understand which use cases this does or doesn't cover, and have folks who have tried this or similar things poke holes in the proposal to make sure it's solid.

adriangb commented 1 year ago

If you want https://github.com/python-poetry/poetry/issues/2270#issuecomment-1445417107 to happen (or have objections) please chime in on the linked issue. I see a total of 18 👍 or equivalent but sadly only one of you has chimed in on #6850

kapilt commented 1 year ago

fwiw, just an update to my previous comment, https://github.com/python-poetry/poetry/issues/2270#issuecomment-615809216 to support both mono repo and frozen wheels (version spec switch to ==version), I went ahead and moved to a poetry plugin (freeze) that also handles resolving path dev dependencies. it operates effectively as a post build tool / pre publish tool directly against the wheel. it's pretty early (ie. functional, but no tests, cli options) but I'm hoping to get those fleshed out so we can use it for prod releases against a mono repo this month. https://github.com/cloud-custodian/poetry-plugin-freeze

MateoSaezMata commented 1 year ago

Anything new on this? https://github.com/python-poetry/poetry/issues/2270#issuecomment-1445417107 Seems to nearly solve the problem despite the distribution (packaging) issue

luketych commented 8 months ago

Anything new on this? #2270 (comment) Seems to nearly solve the problem despite the distribution (packaging) issue

Not sure. Just coming across all of this for the first time. So I am looking forward to it!

davidroeca commented 8 months ago

I've been testing out some monorepo approaches and started with @adriangb's approach here. DX is the main issue -- challenges with this approach surround poetry run from a subproject - as mentioned here.

poetry.lock and .venv seem to be the main pain points here -- and the workarounds mentioned involve keeping poetry.lock in sync (or .gitignored). Custom scripts have been implemented in a variety of solutions including @gerbenoostra's here to accommodate the extra lockfiles.

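It would be nice to:

  • Support poetry run (or poetry subproject run if defined as a plugin) within a subproject, without requiring a lockfile or venv in the subproject
  • Support poetry add (or poetry subproject add) within a subproject, without requiring a lockfile or venv within the subproject -- this command would update the parent lockfile
  • Support poetry remove (similar to add)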

Ideally, poetry or the plugin could find the root lockfile/pyproject.toml, or there could be some way that the developer specifies it. This would lead to a similar experience to cargo, yarn, and npm.
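
One way the root lookup could work is sketched below. This is not taken from Poetry or any plugin: the `find_root_project` helper is hypothetical, and treating the outermost ancestor that contains a poetry.lock as the monorepo root is an assumption (a developer-specified root would replace this search).

```python
# Hypothetical sketch: locate the monorepo root from inside a subproject.
import tempfile
from pathlib import Path
from typing import Optional


def find_root_project(start: Path, marker: str = "poetry.lock") -> Optional[Path]:
    """Return the outermost ancestor of `start` (or `start`) containing `marker`."""
    root = None
    for directory in [start, *start.parents]:
        if (directory / marker).is_file():
            root = directory  # keep going: a higher directory may also match
    return root


# Demonstrate with a throwaway tree: <tmp>/poetry.lock and <tmp>/libs/a/poetry.lock.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    subproject = repo / "libs" / "a"
    subproject.mkdir(parents=True)
    (repo / "poetry.lock").touch()
    (subproject / "poetry.lock").touch()  # a leftover per-project lockfile
    found = find_root_project(subproject)
    print(found == repo)
```

Searching for the outermost match rather than the nearest one means a leftover lockfile inside a subproject does not shadow the shared one at the top of the repository.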

adriangb commented 8 months ago

I think a plugin would solve all of those issues and should be doable. I haven’t written one just because the DX isn’t bad enough for me to justify spending time on it. And I usually don’t end up running poetry … from a subproject, most things happen from the top level Makefile.

tnielens commented 7 months ago

@adriangb what is your solution for replacing path dependencies with regular ones when publishing? So far, I'm using @gerbenoostra sh script. I wonder if there is any poetry plugin support for this. I also checked https://github.com/DavidVujic/poetry-multiproject-plugin but don't think that approach works for the path rewrite use case. Cfr this issue.

DavidVujic commented 7 months ago

There's a thing called Polylith that has a different take on the problems of monorepos and sharing code, than the suggested solutions in this thread. But I think that it could be interesting to share this approach for you here.

Third-party dependencies are a thing of its own, but the code that we have control over in the projects is different. It can be shared across projects in quite a simple way by using a Monorepo with a developer experience similar to a single-project repo. In Polylith, there's no symlinks or other quirks needed (unless you view the plugins as quirks).

The code is organized as namespace packages - just as with single-project repos - and the individual projects include what is needed by using the packages key in the [tool.poetry] section (using the from attribute). Each "project" (i.e. the artifact to build and deploy) has its own pyproject.toml file. There's documentation about Polylith for Python here.

To make this work in a Poetry context, there is the MultiProject plugin, as mentioned above by @tnielens. That plugin makes it possible to use relative includes (in the from attribute) during development. For deployment, you will build proper PEP-valid wheels. This is done by using the custom build-project command that comes with the plugin.

Having something to visualize the code in a Monorepo is probably helpful, and that is where the tooling support for Polylith comes in. There's several commands to visualize, calculate diffs, synchronize projects and create Python code according to the Polylith Architecture. The tool is, of course, Open Source 😄 I hope this helps!

adriangb commented 7 months ago

@adriangb what is your solution for replacing path dependencies with regular ones when publishing?

I don’t have a solution because it’s not a use case I’ve had. I imagine a plug-in could do something similar to the scripts I’ve seen.

randomgeek78 commented 7 months ago

I second David's plug for the amazing polylith plugin. We have successfully incorporated polylith to structure our repo and couldn't do without it. We are a ML shop and have many teams working on different problems but sharing a common code base. The builds are streamlined to only contain what each project needs so deployments are very thin and rarely cause issues. Highly recommended!

moattarwork commented 7 months ago

I'm not quite sure why everything is so complicated here. This problem was solved a long time ago at the framework level in languages such as C#, with nuget management that can consume local packages and also download from a package source in production (wheel and source combination). Frameworks such as NX in javascript have also been doing this for years with no problem, so I don't know what is so complicated here that it takes poetry ages to do the same. The fact that it is community driven should make it faster, not slower. This issue has now been open for more than 3 years.

DavidVujic commented 7 months ago

As you are writing, Poetry is Open Source. Maybe you are the one that should solve this long-running issue @moattarwork?😄

pappasam commented 7 months ago

As you are writing, Poetry is Open Source. Maybe you are the one that should solve this long-running issue @moattarwork?😄

I would absolutely love for @moattarwork to step up and put some much-needed sweat toward solving this problem for me

gerbenoostra commented 6 months ago

Previously, I worked around it using some bash scripts to get named dependencies in the built artifacts, which I show in this demo repo and this blogpost. As mentioned above, Poetry is open source, so why not contribute more usefully to it?

Therefore, I've developed a Poetry Plugin for Mono Repo dependencies at https://github.com/gerbenoostra/poetry-plugin-mono-repo-deps/
In short, it modifies the poetry build to replace path dependencies with named dependencies in the final built wheel's metadata. Perhaps it suits (some) of your needs @moattarwork
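
To illustrate the idea only (this is not the plugin's code), the sketch below rewrites a dependency table so that path entries become named, versioned ones. The `replace_path_dependencies` helper is hypothetical, and pinning to the exact local version with `==` is an assumption; a real tool might choose a looser constraint.

```python
# Hypothetical sketch: swap path dependencies for named, versioned ones.

def replace_path_dependencies(dependencies: dict, versions: dict) -> dict:
    """Return a copy of `dependencies` where path entries become version pins.

    `versions` maps a dependency name to the version of the local project it
    points at (in practice read from that project's own pyproject.toml).
    """
    rewritten = {}
    for name, spec in dependencies.items():
        if isinstance(spec, dict) and "path" in spec:
            # Assumption: pin to the exact version currently in the repo.
            rewritten[name] = f"=={versions[name]}"
        else:
            rewritten[name] = spec
    return rewritten


deps = {
    "python": "^3.8",
    "subproject": {"path": "../subproject", "develop": True},
}
print(replace_path_dependencies(deps, {"subproject": "1.2.0"}))
```

The original table is left untouched, so development installs can keep using the path entry while the built artifact carries the named dependency.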

hungbie commented 6 months ago

@gerbenoostra this is awesome, I will try this for my work!

soufea commented 5 months ago

@gerbenoostra Like the idea of using a plugin, thanks ! However, i couldn't make it work and there are no errors :/. Do you have any examples/documentation ? I couldn't make it work even on a simple example like yours in README.

├── pyproject.toml
└── repo
    ├── A
    │   └── pyproject.toml
    └── B
        ├── poetry.lock
        ├── pyproject.toml
        └── reqs.txt

gerbenoostra commented 5 months ago

@soufea thanks for trying out! I'd love to help, but perhaps best to continue our discussion on the plugin's issue tracker? If you have an example repo online, I can check.

As a preliminary response, an example project is in the test fixtures folder.
Note that the plugin works on the project you're running it in.

Ideally I'd also create a plugin which implements the composition aspect of mono repos, so that when you run poetry build in /, it applies it to each contained project (repo/A & repo/B), but I didn't get to that (yet).

moattarwork commented 5 months ago

As you are writing, Poetry is Open Source. Maybe you are the one that should solve this long-running issue @moattarwork?😄

I would absolutely love for @moattarwork to step up and put some much-needed sweat toward solving this problem for me

Sorry guys. I was away for some time but I'm happy if I can work on this. I'm not familiar with the code base, but comparing this with the closest ecosystem (NodeJs), I think the complexity lies in the toml file being a solid project file. Looking at project files in typescript, they can inherit from each other (or the dependency works with locally linked project files), so they can easily inherit the base configuration.

Although the project file contains information such as name, version, author, etc. that uniquely identifies the project itself, the dependencies and dev-dependencies could be inherited.

Solving this problem would be the first step toward implementation of a good mono-repo and the rest can be managed with standard templating.

moattarwork commented 5 months ago

As mentioned above, Poetry is open source, so why not contribute more usefully to it?

Thanks for sharing this. I will take a look

ag14774 commented 3 weeks ago

I've been testing out some monorepo approaches and started with @adriangb's approach here. DX is the main issue -- challenges with this approach surround poetry run from a subproject - as mentioned here.

poetry.lock and .venv seem to be the main pain points here -- and the workarounds mentioned involve keeping poetry.lock in sync (or .gitignored). Custom scripts have been implemented in a variety of solutions including @gerbenoostra's here to accommodate the extra lockfiles.

It would be nice to:

  • Support poetry run (or poetry subproject run if defined as a plugin) within a subproject, without requiring a lockfile or venv in the subproject
  • Support poetry add (or poetry subproject add) within a subproject, without requiring a lockfile or venv within the subproject -- this command would update the parent lockfile
  • Support poetry remove (similar to add)

Ideally, poetry or the plugin could find the root lockfile/pyproject.toml, or there could be some way that the developer specifies it. This would lead to a similar experience to cargo, yarn, and npm.

I took the liberty to create a plugin that supports these commands and it allows users to have a single lockfile + shared venv. Hope this helps! You can find it here: https://github.com/ag14774/poetry-monoranger-plugin