abn opened this issue 4 years ago
I recently went through converting a mono repo with several packages over to poetry, and thought it might be useful to share what we did, along with the pain points and bug workarounds. Granted, this proposal would hopefully make it all obsolete :-) Still, this might provide some utility to those who want to do mono repos prior to native support in poetry.
First, a few points of context/caveats: we don't use namespace packages, just a common prefix, and our fs layout is a little different. That's not material to the techniques used, but perhaps relevant to the proposal.
```
main_pkg
tools/
  pkg_1
  pkg_2
  pkg_3
  ...
```
At the moment, all the packages under tools have a dependency on the main package declared as a path-based dev-dependency.
```toml
[tool.poetry.dev-dependencies]
# setup in tree as a dev dependency
c7n = {path = "../..", develop = true}
```
I attempted to declare it as a normal dependency, but that caused a few issues with poetry build (issues #2046, partial fix #2047; also reported/PRed by others).
So using it as a dev dependency worked, but it also meant not using poetry directly as a build/publish tool (to work around those issues), and we still needed to inject main_pkg as a regular project dep when publishing. We ended up using poetry's metadata/API to generate setup/requirements files for that purpose, converting dev dependencies to regular dependencies in the process. https://github.com/cloud-custodian/cloud-custodian/blob/master/tools/dev/poetrypkg.py#L121
Unrelated to multi-project, but relevant to the generation workaround: we ran into another issue in that the masonry sdist builder didn't really support markdown readmes (pr #1994).
For ergonomics/simplicity around the multiple commands needed to update versions or release, we added makefile targets as a frontend:
```make
pkg-update:
	poetry update
	for pkg in $(PKG_SET); do cd $$pkg && poetry update && cd ../..; done
```
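The same frontend loop can be sketched in Python; the `tools/` layout and `PKG_SET` come from the example above, while the helper names here are made up.

```python
# Sketch of the makefile loop above in Python: run `poetry update` in the
# repo root and then in each package directory under tools/.
import subprocess
from pathlib import Path


def update_commands(root: Path, pkg_set: list[str]) -> list[tuple[Path, list[str]]]:
    """Build (cwd, command) pairs; separated out so it is easy to test."""
    cmds = [(root, ["poetry", "update"])]
    for pkg in pkg_set:
        cmds.append((root / "tools" / pkg, ["poetry", "update"]))
    return cmds


def run_updates(root: Path, pkg_set: list[str]) -> None:
    for cwd, cmd in update_commands(root, pkg_set):
        subprocess.run(cmd, cwd=cwd, check=True)
```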
One interesting consequence of source-directory dependencies in poetry is that they break any attempt to distribute/publish a package, even if they are dev deps. That is, per the pyproject.toml spec and the build-system PEP (PEP 517), poetry will be invoked during install; the invocation/installation of poetry as a build system is transparently handled by pip. Simply resolving/parsing the pyproject.toml dev dependencies will then cause a poetry failure for a source-distribution install, since installing an sdist is actually a wheel compilation.
As a result of this publishing limitation, we only publish wheels instead of sdists, which avoids the build system entirely, since a wheel is an extractable installation container/format.
We're also maintaining compatibility with the tox/setuptools ecosystem for developer workflows; there are a few more details on what we did here: https://cloudcustodian.io/docs/developer/packaging.html
@kapilt thank you for writing that up. It is extremely useful and insightful.
This proposal is valuable. As it is, poetry supports optional dependencies, but not optional packages.
The use of optional packages for a namespace project is really useful. :+1: for including the optional-package as part of this proposal.
Shared dependencies are very useful, but it might make sense to inherit some of the logic from Maven regarding the shared block: while it does complicate things, the benefits are:
This proposal is really valuable! I wonder what the latest status of this is. Is it currently being worked on? I would love to devote some time to speed up the process if possible.
Are we there yet?
Any updates? Really see some value for this!!!
Unfortunately there is nothing new on this yet, but I found a monorepo manager called Bazel, which is widely used and supports many languages. If your goal is to work only with Python, Pants Build might be an easier solution.
Maybe we could have a `pyproject.toml` in each of the subprojects. Then, add a poetry plugin that coordinates updating dependencies between the top-level `pyproject.toml` and the children. That might mean adding a setting in each `pyproject.toml` to say whether they are a parent or a child.
Could the new dependency groups be leveraged in a way to achieve this proposal? I have some intuition that they could, but I'm not sure how.
> Maybe we could have a `pyproject.toml` in each of the subprojects. Then, add a poetry plugin that coordinates updating dependencies between the top-level `pyproject.toml` and the children. That might mean adding a setting in each `pyproject.toml` to say whether they are a parent or a child.
This is what Maven does. Here's an example on how Maven provides the capability to share modules between projects: https://www.baeldung.com/maven-multi-module
Is it currently possible to have something like what `yarn workspaces` does? So, having a `pyproject.toml` in each package, but letting them share dependencies from a root virtual environment. I've been trying with a single `pyproject.toml`, using `packages`, optional dependencies and `extras`, but it becomes very manual; it would be easier to define dependencies for the subpackages in their own `pyproject.toml`. (I hope that makes sense.)
Have you tried to add a path dependency to another project/folder, which contains its own pyproject.toml?
So for example;
```toml
[tool.poetry.dependencies]
subproject = {path = "subproject", develop = true}
```
> Have you tried to add a path dependency to another project/folder, which contains its own pyproject.toml? So for example:
>
> ```toml
> [tool.poetry.dependencies]
> subproject = {path = "subproject", develop = true}
> ```
Yes, that'll install into my root environment, but the subproject still wants its own virtual environment, so I end up with a virtual environment for each subproject plus one for the root. I want a single virtual environment to be used by all projects (while keeping a `pyproject.toml` for each project); that is how `yarn workspaces` works, AFAIK.
@mr-bjerre I know it's not what you are asking for, but you could try symlinking the two virtual environments together.
If this is a requirement, I would probably go for another solution altogether: for example, use pip-tools to manage deps (it can use multiple input files, one from each project) and twine or flit to publish.
As discussed on Discord, this would be of a huge help to our team. Any progress/time-estimate on implementation? Thank you.
> As discussed on Discord, this would be of a huge help to our team. Any progress/time-estimate on implementation? Thank you.
Right now the team is focused on getting 1.2 released. This could be the next "big" feature to ship (like groups and plugins in 1.2). However, right now there is no estimate of when this will be added. It might also be added as a 3rd-party plugin after 1.2 is released.
> Have you tried to add a path dependency to another project/folder, which contains its own pyproject.toml? So for example:
>
> ```toml
> [tool.poetry.dependencies]
> subproject = {path = "subproject", develop = true}
> ```
>
> Yes, that'll install into my root environment, but the subproject still wants its own virtual environment, so I end up with a virtual environment for each subproject plus one for the root. I want a single virtual environment to be used by all projects (while keeping a `pyproject.toml` for each project); that is how `yarn workspaces` works, AFAIK.
You can do that by creating a local config (poetry.toml) for each of your sub-packages with `virtualenvs.create = false`. You then mark the sub-packages as dependencies as suggested by @fredrikaverpil and include them as packages.
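For reference, the local config mentioned here is tiny; a sketch of the per-sub-package poetry.toml, using Poetry's `virtualenvs.create` setting:

```toml
# poetry.toml in each sub-package: don't create a per-package virtualenv,
# so everything installs into the root project's environment.
[virtualenvs]
create = false
```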
Is there any way for me to also install my dev-dependencies using this central `pyproject.toml`? In the sense that I get that `subproject = {path = "subproject", develop = true}` installs my package, but I also want the dev-dependencies of `subproject` to be installed. Note: I am talking about `tool.poetry.group.dev.dependencies`, not `tool.poetry.extras`.
This is not a feature that currently exists. We likely will not support leaking dev-dependencies over path relationships; the design discussed in this issue uses a super-pyproject.toml instead of linking individual projects together.
We also wanted this feature because it would help us pin the right version of the right libraries, since some versions of a dependency may not work with some versions of another (cough cough numpy).
So, although we can duplicate all the dependencies across multiple projects and let it be, this could create a subtle portability hell with regard to interoperability between different versions of the same library. Without a master project that defines all the dependencies, this will be quite difficult to manage, to say the least. We have over 100 packages and can't afford to manually inspect each one either.
It seems like Cargo did this pretty well by pinning versions for subproject members that depend on one master Cargo workspace.
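For comparison, a rough sketch of the Cargo feature being referenced: workspace-level version pinning, with members inheriting the pinned version via `workspace = true` (the member and dependency names here are made up).

```toml
# Root Cargo.toml: one place pins shared dependency versions.
[workspace]
members = ["pkg_one", "pkg_two"]

[workspace.dependencies]
serde = "1.0"

# A member's Cargo.toml then inherits the pinned version:
# [dependencies]
# serde = { workspace = true }
```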
I've written a blogpost and demo repo where I demonstrate how poetry can (quite easily) be used in a mono repo with subpackages. Perhaps the utility scripts in it can help you.
Blogpost: https://gerben-oostra.medium.com/python-poetry-mono-repo-without-limitations-dd63b47dc6b8
Repo: https://gitlab.com/gerbenoostra/poetry-monorepo/
Hey folks, I've written up a proposal for monorepo support using path dependencies and dependency groups, all existing features of Poetry: https://github.com/python-poetry/poetry/issues/6850. There's an example repo at https://github.com/adriangb/python-monorepo/tree/main/poetry with more details.
The pattern is quite functional already; I've been using it in production for several months now. The only things I think are missing are:
I'd like to understand what use cases it covers or doesn't cover, and have folks who have tried this or similar things poke holes in the proposal to make sure it's solid.
If you want https://github.com/python-poetry/poetry/issues/2270#issuecomment-1445417107 to happen (or have objections), please chime in on the linked issue. I see a total of 18 👍 or equivalent, but sadly only one of you has chimed in on #6850.
FWIW, just an update to my previous comment (https://github.com/python-poetry/poetry/issues/2270#issuecomment-615809216): to support both mono repos and frozen wheels (version specs switched to ==version), I went ahead and moved to a poetry plugin (freeze) that also handles resolving path dev dependencies. It operates effectively as a post-build/pre-publish tool directly against the wheel. It's pretty early (i.e. functional, but no tests or cli options), but I'm hoping to get those fleshed out so we can use it for prod releases against a mono repo this month. https://github.com/cloud-custodian/poetry-plugin-freeze
Anything new on this? https://github.com/python-poetry/poetry/issues/2270#issuecomment-1445417107 Seems to nearly solve the problem despite the distribution (packaging) issue
> Anything new on this? #2270 (comment) Seems to nearly solve the problem despite the distribution (packaging) issue
Not sure. Just coming across all of this for the first time. So I am looking forward to it!
I've been testing out some monorepo approaches and started with @adriangb's approach here. DX is the main issue: the challenges with this approach surround `poetry run` from a subproject, as mentioned here. `poetry.lock` and `.venv` seem to be the main pain points, and the workarounds mentioned involve keeping `poetry.lock` in sync (or .gitignored). Custom scripts have been implemented in a variety of solutions, including @gerbenoostra's here, to accommodate the extra lockfiles.
It would be nice to:

- Support `poetry run` (or `poetry subproject run` if defined as a plugin) within a subproject, without requiring a lockfile or venv in the subproject
- Support `poetry add` (or `poetry subproject add`) within a subproject, without requiring a lockfile or venv within the subproject; this command would update the parent lockfile
- Support `poetry remove` (similar to add)

Ideally, poetry or the plugin could find the root lockfile/pyproject.toml, or there could be some way for the developer to specify it. This would lead to an experience similar to cargo, yarn, and npm.
I think a plugin would solve all of those issues and should be doable. I haven't written one just because the DX isn't bad enough for me to justify spending time on it. And I usually don't end up running poetry … from a subproject; most things happen from the top-level Makefile.
@adriangb what is your solution for replacing path dependencies with regular ones when publishing? So far, I'm using @gerbenoostra's sh script. I wonder if there is any poetry plugin support for this. I also checked https://github.com/DavidVujic/poetry-multiproject-plugin but don't think that approach works for the path-rewrite use case. Cf. this issue.
There's a thing called Polylith that has a different take on the problems of monorepos and sharing code than the solutions suggested in this thread. I think it could be interesting to share this approach with you here.
Third-party dependencies are a thing of their own, but the code we have control over in our projects is different: it can be shared across projects in quite a simple way by using a monorepo with a developer experience similar to a single-project repo. In Polylith, there are no symlinks or other quirks needed (unless you view the plugins as quirks).
The code is organized as namespace packages, just as with single-project repos, and the individual projects include what is needed by using the `packages` key in the `[tool.poetry]` section (using the `from` attribute). Each "project" (i.e. the artifact to build and deploy) has its own pyproject.toml file. There's documentation about Polylith for Python here.
To make this work in a Poetry context, there is the MultiProject plugin, as mentioned above by @tnielens. That plugin makes it possible to use relative includes (in the `from` attribute) during development. For deployment, you build proper PEP-valid wheels, using the custom `build-project` command that comes with the plugin.
Having something to visualize the code in a monorepo is probably helpful, and that is where the tooling support for Polylith comes in. There are several commands to visualize, calculate diffs, synchronize projects, and create Python code according to the Polylith Architecture. The tool is, of course, Open Source 😄 I hope this helps!
> @adriangb what is your solution for replacing path dependencies with regular ones when publishing?
I don't have a solution because it's not a use case I've had. I imagine a plug-in could do something similar to the scripts I've seen.
I second David's plug for the amazing polylith plugin. We have successfully incorporated polylith to structure our repo and couldn't do without it. We are a ML shop and have many teams working on different problems but sharing a common code base. The builds are streamlined to only contain what each project needs so deployments are very thin and rarely cause issues. Highly recommended!
I'm not quite sure why everything is so complicated here. This problem was solved a long time ago at the framework level in languages such as C#, with NuGet management that can consume local packages and also download from a package source in production (wheel and source combination). Frameworks such as NX in JavaScript have also been doing this for years with no problem, so I don't know what is so complicated here that it takes poetry ages to do the same. The fact that it is community driven should make it faster, not slower. This issue has now been open for more than 3 years.
As you are writing, Poetry is Open Source. Maybe you are the one that should solve this long-running issue @moattarwork?😄
> As you are writing, Poetry is Open Source. Maybe you are the one that should solve this long-running issue @moattarwork?😄
I would absolutely love for @moattarwork to step up and put some much-needed sweat toward solving this problem for me
Previously, I worked around it using some bash scripts to get named dependencies into the built artifacts, which I show in this demo repo and this blogpost. As mentioned above, Poetry is open source, so why not contribute more usefully to it? Therefore, I've developed a Poetry plugin for mono-repo dependencies at https://github.com/gerbenoostra/poetry-plugin-mono-repo-deps/. In short, it modifies `poetry build` to replace path dependencies with named dependencies in the final built wheel's metadata. Perhaps it suits some of your needs @moattarwork.
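The core rewrite such a plugin performs can be sketched as follows. This is a simplified illustration of the idea, not the plugin's actual code: it rewrites `Requires-Dist` entries in the wheel's METADATA, turning path/direct-URL requirements into pinned named requirements. `known_versions` is a hypothetical mapping of sibling package names to the versions being released.

```python
# Simplified sketch of a path-to-named-dependency rewrite of a wheel's
# METADATA (not the plugin's actual code).
import re


def rewrite_requires_dist(metadata: str, known_versions: dict[str, str]) -> str:
    out = []
    for line in metadata.splitlines():
        # Match direct-URL path requirements like "name @ file:///..."
        m = re.match(r"Requires-Dist: ([A-Za-z0-9_.-]+) @ file://\S*", line)
        if m and m.group(1) in known_versions:
            name = m.group(1)
            line = f"Requires-Dist: {name}=={known_versions[name]}"
        out.append(line)
    return "\n".join(out)


metadata = (
    "Name: main-pkg\n"
    "Requires-Dist: requests>=2.25\n"
    "Requires-Dist: subproject @ file:///home/ci/repo/subproject\n"
)
print(rewrite_requires_dist(metadata, {"subproject": "1.2.3"}))
```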
@gerbenoostra this is awesome, I will try this for my work!
@gerbenoostra Like the idea of using a plugin, thanks! However, I couldn't make it work and there are no errors :/ Do you have any examples/documentation? I couldn't make it work even on a simple example like yours in the README.
```
├── pyproject.toml
└── repo
    ├── A
    │   └── pyproject.toml
    └── B
        ├── poetry.lock
        ├── pyproject.toml
        └── reqs.txt
```
@soufea thanks for trying it out! I'd love to help, but perhaps it's best to continue our discussion on the plugin's issue tracker? If you have an example repo online, I can check.
As a preliminary response, an example project is in the test fixtures folder.
Note that the plugin works on the project you're running it in:

- In `/`, poetry only knows about the root `pyproject.toml`
- In `./repo/A`, it will only modify the build of A's dist
- In `/repo/B`, it will only modify the build of B's dist

Ideally I'd also create a plugin that implements the composition aspect of mono repos, so that when you run `poetry build` in `/`, it is applied to each contained project (`repo/A` & `/repo/B`), but I didn't get to that (yet).
> As you are writing, Poetry is Open Source. Maybe you are the one that should solve this long-running issue @moattarwork?😄
>
> I would absolutely love for @moattarwork to step up and put some much-needed sweat toward solving this problem for me
Sorry guys, I was away for some time, but I'm happy to work on this. I'm not familiar with the code base, but comparing this with the closest ecosystem (NodeJS), I think the complexity lies in the toml file being a monolithic project file. Looking at project files in TypeScript, they can inherit from each other (or a dependency works against locally linked project files), so they can easily inherit the base configuration.
Although the project file contains information such as name, version, author, etc. that uniquely identifies the project itself, the dependencies and dev-dependencies could be inherited.
Solving this problem would be the first step toward implementing good mono-repo support; the rest can be managed with standard templating.
> As mentioned above, Poetry is open source, so why not contribute more usefully to it?
Thanks for sharing this. I will take a look
> I've been testing out some monorepo approaches and started with @adriangb's approach here. DX is the main issue: the challenges with this approach surround `poetry run` from a subproject, as mentioned here. `poetry.lock` and `.venv` seem to be the main pain points, and the workarounds mentioned involve keeping `poetry.lock` in sync (or .gitignored). Custom scripts have been implemented in a variety of solutions, including @gerbenoostra's here, to accommodate the extra lockfiles. It would be nice to:
>
> - Support `poetry run` (or `poetry subproject run` if defined as a plugin) within a subproject, without requiring a lockfile or venv in the subproject
> - Support `poetry add` (or `poetry subproject add`) within a subproject, without requiring a lockfile or venv within the subproject; this command would update the parent lockfile
> - Support `poetry remove` (similar to add)
>
> Ideally, poetry or the plugin could find the root lockfile/pyproject.toml, or there could be some way for the developer to specify it. This would lead to an experience similar to cargo, yarn, and npm.
I took the liberty to create a plugin that supports these commands and it allows users to have a single lockfile + shared venv. Hope this helps! You can find it here: https://github.com/ag14774/poetry-monoranger-plugin
Background & Rationale
This request is inspired by RPM Package Manager's capability to build subpackages from the same spec file.
Here, I want to propose and discuss how a version of this capability could be replicated within poetry to allow a simplified user experience for a python project maintainer, especially when maintaining namespace packages and/or multi-project source trees. While strict project separation is a good thing in most cases, it might not always be the most pragmatic approach for package maintainers.
For our purposes here, we can refer to each of these packages as a subproject. All subprojects are managed under a single poetry project. This means that there is only a single `pyproject.toml` file and a shared project root directory, with either a shared source tree or independent source trees (subdirectories) for each subproject.

Description
Let us consider the scenario of multiple namespace packages being maintained in a single repository with the following structure. Note that this still applies even if different source directories exist within the root directory for each subproject.
Here the intention could be that we want to distribute 3 packages, namely `namespace-package-one`, `namespace-package-two` and `namespace-package-three`. For the purpose of this example, let us assume that `namespace-package-three` depends on `namespace-package-one`. The `pyproject.toml` file could look something like this. New sections are annotated with comments detailing them and the expected behaviour.
Under this scenario, the following might be what the CLI commands look like. Current behaviour will remain unaltered, as these are additive changes.
Variations

The above is an initial thought on how this might work. That said, there are variations that should be discussed.

Does a per-package `dev-dependency` section make sense? This only really makes sense if we want to allow developing a single package at a time. However, this becomes tricky in cases like the one here where "three" depends on "one": when developing "three", the dev dependencies for "one" should also be installed. If isolation is required, then multiple virtual environments will be needed, which might be overkill for the majority of use cases for this feature.

Will all packages be installed under PEP 517? Is it even possible to install only a specific package when installing under PEP 517? One possible solution might be to make use of "extras" here as a way of specifying which package, if any, to install, but default to all.
Extensions

A top-level `namespace-package` (let's call this the "project package" for now) that installs the core dependencies and also allows for "extras" as we do today, without requiring distribution of the entire source tree with the binary distribution.

This means that if someone does `pip install namespace-package`, the maintainer might expect the following to be installed:

- `namespace.package` itself.
- `namespace-package-one` and `namespace-package-three`, which are required for the "default" install.

An end-user can also install the remaining package, like so: `pip install namespace-package[two]`, which will simply install the dependency `namespace-package-two`.

This behaviour might not be desired in all cases, and can be considered opt-in.
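Under today's Poetry, the opt-in part of this could be approximated with extras; a sketch using the package names from the example above (the version constraints are placeholders):

```toml
[tool.poetry.dependencies]
namespace-package-one = "^1.0"
namespace-package-three = "^1.0"
namespace-package-two = {version = "^1.0", optional = true}

[tool.poetry.extras]
two = ["namespace-package-two"]
```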