python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License
31.07k stars 2.26k forks source link

poetry-core: consider *not* to vendor dependencies #2346

Closed decathorpe closed 4 years ago

decathorpe commented 4 years ago

Python projects are starting to rely on pyproject.toml / poetry based build systems now, and this is also affecting fedora packages, for example. In this context, it's important to have a working, "native" poetry package.

However, using vendored dependencies - like in this project (which looks like will be required by what will be poetry 1.1.0) - will make it almost impossible to package poetry, let alone package it correctly.

Please reconsider the decision to use bundled dependencies.

abn commented 4 years ago

Embedded/Bundled dependencies have long been a problem (JARs anyone?). Unfortunately, packaging and distributing these kinds of projects are still not straightforward (this includes golang and java packages too).

To your specific concern, doesn't pip do the same thing? In cases like poetry and pip vendoring base dependencies solves a lot of usability issues to a large portion of the end-users. So, I am not sure if it makes sense to not vendor.

In the specific context of fedora packages, I'd think poetry would be handled similar to what is done in python-pip.spec.

decathorpe commented 4 years ago

Well, pip is really a special case in fedora, because - if I understand correctly - it's handled this way to solve bootstrap issues when upgrading to newer python versions, that doesn't apply to poetry.

And I really want to avoid to have to de-bundle things from poetry(-core).

encukou commented 4 years ago

The issue with bundled stuff is that if one of those dependencies fixes a security vulnerability, a LTS distro needs to fix it everywhere – potentially in old versions of poetry, long after poetry has moved on and introduced incompatible changes.

pip is quite special because it's needed by Python itself. It's an example of something that's hard to maintain in a distro, not of something every build backend should do.

leycec commented 4 years ago

Gentoo packager here. I'm packaging poetry >= 1.1.0a1 and poetry-core >= 1.0.0a5 for our humble source-based distro and can only concur with everything and everyone above – excluding @abn in his apologetic "But pip does this bad thing, so poetry should too!" screed. Appeal to authority is no justification for continued bad behaviour. pip is irrelevant; this is poetry.

As distro packagers, our core contention with bundled dependencies is exactly as described by @encukou: each bundled dependency increases attack surface. That's bad. While bundling dependencies may slightly ease the installation burden for end users installing poetry via the non-standard get-poetry.py installer, it does so only by substantially increasing the security risk for all users regardless of installation pathway.

When a major insecurity outweighs a minor convenience, the former takes precedence.

leycec commented 4 years ago

Oh, by the five Nordic Gods of the Underworld. I just made that up. poetry-core doesn't merely bundle dependencies; it magically resolves, downloads, and patches their codebases at poetry-core installation time, which is absolute cray-cray:

Why is this absolute cray-cray? Because system package managers cannot reasonably do any of the above. Doing all of the above would effectively require system package managers to perfectly parse and handle PEP 517, PEP 518, pyproject.toml, and the ad-hoc dynamic bundling and patching required by poetry-core. But all of that is exactly why poetry is supposed to exist in the first place.

In short, this is not how bundled dependencies are bundled. I've never seen a codebase go through such extreme gesticulations merely to bundle dependencies. In fact, this isn't bundling at all; by compare, bundling would be sane. This is a jury-rigged non-configurable build system that has nothing whatsoever to do with any existing dependency resolution mechanism outside the dim purview of the poetry ecosystem.

Take ZeroNet, a comparable standalone Python application with many bundled prepatched dependencies. How exactly does ZeroNet accomplish this lofty goal? By actually bundling and prepatching those dependencies within itself as standard Python subpackages rather than offloading that non-trivial burden onto external tooling.

No concessions whatsoever have been made for installation by anything other than the non-standard get-poetry.py installer. System package managers cannot reasonably package poetry or poetry-based packages as is. Bundled dependencies are bad, but the current approach is an order of baditude worse.

A dramatic rethink is warranted and, in fact, required.

abn commented 4 years ago

@leycec no one is saying "But pip does this bad thing, so poetry should too!". The point of this issue is to discuss this matter and, if reasonable, figure out an alternative that makes a distro package maintainer's life easier.

As @encukou points out, there are inherent risks of bundling requirements. But these risks need to be weighed against multiple factors, this includes usability, access vectors, exploitability etc.

I disagree with @leycec that this is a "major insecurity" for a few reasons,

  1. poetry is a development tool
  2. currently, we strip out compiled binaries
  3. exploitation of any vulnerability or weakness in a vendored package requires the compromise of the host, trusted third-party services (PyPi, self-hosted repositories) etc.

As for a "minor inconvenience", it should be noted that poetry is expected to work with multiple python versions as a common use case is in environments that have mulitple python versions managed via pyenv. This is made a bit more trickier due to the fact that end users are on various platforms. Additionally,

  1. If a system managed dependent library is updated or changed, this will impact poetry's functionality.
  2. If (1) occurs, PEP-517 builds that depend on core will cease to work, ie. developers cannot install packages (ie. breaks a primary functionality of the tool).

@leycec, as for your last comment, poetry-core uses vendoring to keep in-tree bundled packages in sync. If you check here, you will see that the packages are in fact bundled similar to your reference project. We do not do any dynamic patching/installation at installation time.

All that said, we do also think there is room for improvement in the way we do things. This includes our installer get-poetry.py and also figuring out in what ways we can help reduce the pain of getting poetry packaged for distros without having to sacrifice usability in other scenarios.

To this end, any constructive feedback on this is always welcome.

hroncok commented 4 years ago

One of the Fedora Python packagers here, hello. Few notes:

I completely understand and respect reasons why upstream projects like pip, pipetry or pipenv decide to bundle dependencies. The reasons are valid for upstreams.

As distributors, we don't like that very much and we try to avoid that for reasons other have already noted. Not only security fixes, but also other kinds of fixes are backported in distributions and maintaining and patching 8 different requests bundled in different projects is a nightmare. I understand that @abn wants to work toward understanding our reasons and respecting them -- thanks for this.

In Fedora, we keep bundled dependencies in pip out of necessity, not that we would want that. Considering that we build pip wheels and use those in venv and virtualenv in all the various Python versions we ship, debundling (devendoring) would become very complex and fragile. I can talk more about that, but it would get off topic.

A reasonable compromise would be to make unbundling supported and trivial. Currently, we would need to sed/patch each import statement when we unbundle. That makes upgrading the packaged poetry a very unpleasant and error-prone task. Can we work together to make the vendoring of the dependencies easily revertible at our build time?

Also, I'd like to cooperate here, not argue and fight. That isn't helpful.

encukou commented 4 years ago

If a system managed dependent library is updated or changed, this will impact poetry's functionality.

You're either using:

All that said, we do also think there is room for improvement in the way we do things. This includes our installer get-poetry.py and also figuring out in what ways we can help reduce the pain of getting poetry packaged for distros without having to sacrifice usability in other scenarios.

Quite a lot of the libraries poetry vendors are backports of Python's standard library, which is has quite strong backwards-compatibility guarantees. How hard would it be to use stdlib versions of those on Python versions that have them?

For the others, it seems that:

Is that right?

One idea is to have the tool put its _vendor directory in sys.path at startup (before other imports), so all the libraries are importable using standard names. That would remove the need for most patches. Distros could then empty the _vendor directory, and ensure that compatible versions of the packages are available elsewhere.

Does anyone foresee problems with this approach?

hroncok commented 4 years ago

Does anyone foresee problems with this approach?

Not with unpatched dependencies, I don't really.

decathorpe commented 4 years ago

One idea is to have the tool put its _vendor directory in sys.path at startup (before other imports), so all the libraries are importable using standard names. That would remove the need for most patches. Distros could then empty the _vendor directory, and ensure that compatible versions of the packages are available elsewhere.

Does anyone foresee problems with this approach?

This sounds like a good solution that would satisfy both use cases. poetry without modifications would use the vendored dependencies, and distros could easily remove the sys.path modification and just drop the _vendor directory.

abn commented 4 years ago

As @encukou noted, quite a few of the packages vendored are done so to allow compatibility with older python versions. These will go away soon as we do plan on dropping 2.7 and 3.5 support in a future release.

The suggested apprach might work. However, I do need to run this by the other maintainers to make sure I have not missed something obvious, especially since I have not worked on the vendoring here. Will also need to test it out to make sure it does not break anything.

Will see if I can get this done before we get to 1.1.0.

abn commented 4 years ago

Initial fix for this has been merged into master.

For debundling dependencies you can simply drop poetry.core._vendor. Versions of bundled requirements are available in vendor.txt under that directory.

We'll look into any issues that crop up as they are identified.

decathorpe commented 4 years ago

This looks great, thank you!

I'll report back if I'm encountering any issues in the packaging context :)

hroncok commented 4 years ago

As @encukou noted, quite a few of the packages vendored are done so to allow compatibility with older python versions.

Just a question: Are those needed to run Poetry on the older Python versions, or also to manage older Python versions projects? E.g. if Poetry itself runs on Python 3.8 to manage a Python 3.6 project, is importlib-metadata ever used?

abn commented 4 years ago

@hroncok with the change I have done the following.

  1. Removed any and all unused dependencies
  2. Debundled Python 2.7 compatibility dependencies

As for the remaining bundled dependencies; you will notice that everything except six is not related to backwards compatibility. The importlib-metadata package is no longer used by poetry-core. We patch its use by jsonschema so that we do not have have to vendor it.

Also, it is important to not that this is specific to poetry-core. Poetry itself, does use importlib-metadata for Python versions less than 3.8, but it will be explicitly indicated. Poetry itself does not, and will not vendor dependencies for the PyPI published package. The only case where we vendor is when Poetry is installed using the get-poetry.py script, this is done in order for it to work seamlessly in multi-python environments.

Let me know if I can provide more information on this or if something needs to be clarified further.

hroncok commented 4 years ago

Thanks!

abn commented 4 years ago

@hroncok I realised I might not have answered your question.

E.g. if Poetry itself runs on Python 3.8 to manage a Python 3.6 project, is importlib-metadata ever used?

Poetry, when run on 3.8 to manage a 3.6 project, it does not need importlib-metadata package. The dependencies are for Poetry itself. Poetry, will be able to manage Python 2.7 projects even after Poetry itself drops runtime support for Python 2.7.

abn commented 4 years ago

This was resolved via python-poetry/poetry-core#25

pradyunsg commented 4 years ago

doesn't pip do the same thing?

pip's has a published rationale for doing so: https://pip.pypa.io/en/latest/development/vendoring-policy/#rationale. As noted by others, it's uniquely positioned in this situation. :)

github-actions[bot] commented 6 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.