Closed decathorpe closed 4 years ago
Embedded/Bundled dependencies have long been a problem (JARs anyone?). Unfortunately, packaging and distributing these kinds of projects are still not straightforward (this includes golang and java packages too).
To your specific concern, doesn't pip do the same thing? In cases like poetry and pip vendoring base dependencies solves a lot of usability issues to a large portion of the end-users. So, I am not sure if it makes sense to not vendor.
In the specific context of fedora packages, I'd think poetry would be handled similar to what is done in python-pip.spec.
Well, pip is really a special case in fedora, because - if I understand correctly - it's handled this way to solve bootstrap issues when upgrading to newer python versions, that doesn't apply to poetry.
And I really want to avoid to have to de-bundle things from poetry(-core).
The issue with bundled stuff is that if one of those dependencies fixes a security vulnerability, a LTS distro needs to fix it everywhere – potentially in old versions of poetry, long after poetry has moved on and introduced incompatible changes.
pip
is quite special because it's needed by Python itself. It's an example of something that's hard to maintain in a distro, not of something every build backend should do.
Gentoo packager here. I'm packaging poetry >= 1.1.0a1
and poetry-core >= 1.0.0a5
for our humble source-based distro and can only concur with everything and everyone above – excluding @abn in his apologetic "But pip
does this bad thing, so poetry
should too!" screed. Appeal to authority is no justification for continued bad behaviour. pip
is irrelevant; this is poetry
.
As distro packagers, our core contention with bundled dependencies is exactly as described by @encukou: each bundled dependency increases attack surface. That's bad. While bundling dependencies may slightly ease the installation burden for end users installing poetry
via the non-standard get-poetry.py
installer, it does so only by substantially increasing the security risk for all users regardless of installation pathway.
When a major insecurity outweighs a minor convenience, the former takes precedence.
Oh, by the five Nordic Gods of the Underworld. I just made that up. poetry-core
doesn't merely bundle dependencies; it magically resolves, downloads, and patches their codebases at poetry-core
installation time, which is absolute cray-cray:
jsonschema
and tomlkit
) are patched unidiff-style via the declarative [tool.vendering]
subsection of the top-level pyproject.toml
file.typing
) is patched sed-style via the declarative [tool.vendering.transformations]
subsection of the top-level pyproject.toml
file.poetry-core
installation time via the tool.poetry.dependencies
subsection of a sub-level pyproject.toml
file in the vendors/
subdirectory.Why is this absolute cray-cray? Because system package managers cannot reasonably do any of the above. Doing all of the above would effectively require system package managers to perfectly parse and handle PEP 517, PEP 518, pyproject.toml
, and the ad-hoc dynamic bundling and patching required by poetry-core
. But all of that is exactly why poetry
is supposed to exist in the first place.
In short, this is not how bundled dependencies are bundled. I've never seen a codebase go through such extreme gesticulations merely to bundle dependencies. In fact, this isn't bundling at all; by compare, bundling would be sane. This is a jury-rigged non-configurable build system that has nothing whatsoever to do with any existing dependency resolution mechanism outside the dim purview of the poetry
ecosystem.
Take ZeroNet, a comparable standalone Python application with many bundled prepatched dependencies. How exactly does ZeroNet accomplish this lofty goal? By actually bundling and prepatching those dependencies within itself as standard Python subpackages rather than offloading that non-trivial burden onto external tooling.
No concessions whatsoever have been made for installation by anything other than the non-standard get-poetry.py
installer. System package managers cannot reasonably package poetry
or poetry
-based packages as is. Bundled dependencies are bad, but the current approach is an order of baditude worse.
A dramatic rethink is warranted and, in fact, required.
@leycec no one is saying "But pip does this bad thing, so poetry should too!". The point of this issue is to discuss this matter and, if reasonable, figure out an alternative that makes a distro package maintainer's life easier.
As @encukou points out, there are inherent risks of bundling requirements. But these risks need to be weighed against multiple factors, this includes usability, access vectors, exploitability etc.
I disagree with @leycec that this is a "major insecurity" for a few reasons,
As for a "minor inconvenience", it should be noted that poetry is expected to work with multiple python versions as a common use case is in environments that have mulitple python versions managed via pyenv
. This is made a bit more trickier due to the fact that end users are on various platforms. Additionally,
@leycec, as for your last comment, poetry-core
uses vendoring to keep in-tree bundled packages in sync. If you check here, you will see that the packages are in fact bundled similar to your reference project. We do not do any dynamic patching/installation at installation time.
All that said, we do also think there is room for improvement in the way we do things. This includes our installer get-poetry.py
and also figuring out in what ways we can help reduce the pain of getting poetry packaged for distros without having to sacrifice usability in other scenarios.
To this end, any constructive feedback on this is always welcome.
One of the Fedora Python packagers here, hello. Few notes:
I completely understand and respect reasons why upstream projects like pip, pipetry or pipenv decide to bundle dependencies. The reasons are valid for upstreams.
As distributors, we don't like that very much and we try to avoid that for reasons other have already noted. Not only security fixes, but also other kinds of fixes are backported in distributions and maintaining and patching 8 different requests bundled in different projects is a nightmare. I understand that @abn wants to work toward understanding our reasons and respecting them -- thanks for this.
In Fedora, we keep bundled dependencies in pip out of necessity, not that we would want that. Considering that we build pip wheels and use those in venv and virtualenv in all the various Python versions we ship, debundling (devendoring) would become very complex and fragile. I can talk more about that, but it would get off topic.
A reasonable compromise would be to make unbundling supported and trivial. Currently, we would need to sed/patch each import statement when we unbundle. That makes upgrading the packaged poetry a very unpleasant and error-prone task. Can we work together to make the vendoring of the dependencies easily revertible at our build time?
Also, I'd like to cooperate here, not argue and fight. That isn't helpful.
If a system managed dependent library is updated or changed, this will impact poetry's functionality.
You're either using:
All that said, we do also think there is room for improvement in the way we do things. This includes our installer get-poetry.py and also figuring out in what ways we can help reduce the pain of getting poetry packaged for distros without having to sacrifice usability in other scenarios.
Quite a lot of the libraries poetry vendors are backports of Python's standard library, which is has quite strong backwards-compatibility guarantees. How hard would it be to use stdlib versions of those on Python versions that have them?
For the others, it seems that:
poetry.core._vendor
namespace.Is that right?
One idea is to have the tool put its _vendor directory in sys.path at startup (before other imports), so all the libraries are importable using standard names. That would remove the need for most patches. Distros could then empty the _vendor directory, and ensure that compatible versions of the packages are available elsewhere.
Does anyone foresee problems with this approach?
Does anyone foresee problems with this approach?
Not with unpatched dependencies, I don't really.
One idea is to have the tool put its _vendor directory in sys.path at startup (before other imports), so all the libraries are importable using standard names. That would remove the need for most patches. Distros could then empty the _vendor directory, and ensure that compatible versions of the packages are available elsewhere.
Does anyone foresee problems with this approach?
This sounds like a good solution that would satisfy both use cases. poetry without modifications would use the vendored dependencies, and distros could easily remove the sys.path modification and just drop the _vendor directory.
As @encukou noted, quite a few of the packages vendored are done so to allow compatibility with older python versions. These will go away soon as we do plan on dropping 2.7 and 3.5 support in a future release.
The suggested apprach might work. However, I do need to run this by the other maintainers to make sure I have not missed something obvious, especially since I have not worked on the vendoring here. Will also need to test it out to make sure it does not break anything.
Will see if I can get this done before we get to 1.1.0.
Initial fix for this has been merged into master.
For debundling dependencies you can simply drop poetry.core._vendor
. Versions of bundled requirements are available in vendor.txt
under that directory.
We'll look into any issues that crop up as they are identified.
This looks great, thank you!
I'll report back if I'm encountering any issues in the packaging context :)
As @encukou noted, quite a few of the packages vendored are done so to allow compatibility with older python versions.
Just a question: Are those needed to run Poetry on the older Python versions, or also to manage older Python versions projects? E.g. if Poetry itself runs on Python 3.8 to manage a Python 3.6 project, is importlib-metadata
ever used?
@hroncok with the change I have done the following.
As for the remaining bundled dependencies; you will notice that everything except six
is not related to backwards compatibility. The importlib-metadata
package is no longer used by poetry-core
. We patch its use by jsonschema
so that we do not have have to vendor it.
Also, it is important to not that this is specific to poetry-core
. Poetry itself, does use importlib-metadata
for Python versions less than 3.8, but it will be explicitly indicated. Poetry itself does not, and will not vendor dependencies for the PyPI published package. The only case where we vendor is when Poetry is installed using the get-poetry.py
script, this is done in order for it to work seamlessly in multi-python environments.
Let me know if I can provide more information on this or if something needs to be clarified further.
Thanks!
@hroncok I realised I might not have answered your question.
E.g. if Poetry itself runs on Python 3.8 to manage a Python 3.6 project, is importlib-metadata ever used?
Poetry, when run on 3.8 to manage a 3.6 project, it does not need importlib-metadata
package. The dependencies are for Poetry itself. Poetry, will be able to manage Python 2.7 projects even after Poetry itself drops runtime support for Python 2.7.
This was resolved via python-poetry/poetry-core#25
pip's has a published rationale for doing so: https://pip.pypa.io/en/latest/development/vendoring-policy/#rationale. As noted by others, it's uniquely positioned in this situation. :)
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Python projects are starting to rely on pyproject.toml / poetry based build systems now, and this is also affecting fedora packages, for example. In this context, it's important to have a working, "native" poetry package.
However, using vendored dependencies - like in this project (which looks like will be required by what will be poetry 1.1.0) - will make it almost impossible to package poetry, let alone package it correctly.
Please reconsider the decision to use bundled dependencies.