jazzband / pip-tools

A set of tools to keep your pinned Python dependencies fresh.
https://pip-tools.rtfd.io
BSD 3-Clause "New" or "Revised" License
7.74k stars 612 forks source link

pip==23.3 breaking "extras" behavior #2004

Closed mattoberle closed 5 months ago

mattoberle commented 1 year ago

Using pip==23.3.x can result in invalid requirements.txt for projects that declare extras. A change in pip==23.3.0 introduced the "canonicalization" of extras names:

The canonicalization leads to this outcome:

For many cases, this probably isn't an issue. psycopg2-binary still makes it into requirements.txt as an individual entry.

For more "niche" cases, the extras still carry some significance. eg. When using rules_python under the Bazel build system, the extras must properly resolve to build out a dependency graph.

Environment Versions

  1. OS Type: Linux
  2. Python version: 3.10.9
  3. pip version: 23.3.0
  4. pip-tools version: 7.3.0

Steps to replicate

  1. Create a simple requirements.in file.

    sqlalchemy[postgresql_psycopg2binary]==1.4.47
  2. Create a virtual environment, install pip-tools and the previous major pip version, run pip-compile.

python -m venv .venv
. ./.venv/bin/activate
pip install pip-tools==7.3.0 pip==22.3.1
pip-compile --no-strip-extras
  1. Observe that the resulting requirement.txt file lists sqlalchemy[postgresql_psycopg2binary] (with an underscore).

    greenlet==3.0.0
    # via sqlalchemy
    psycopg2-binary==2.9.9
    # via sqlalchemy
    sqlalchemy[postgresql_psycopg2binary]==1.4.47
    # via -r requirements.in
  2. On the same virtual environment, upgrade to pip>=23.3, run pip-compile.

pip install pip==23.3.0
pip-compile --no-strip-extras
  1. Observe that the resulting requirement.txt file lists sqlalchemy[postgresql-psycopg2binary] (with a hyphen).
  WARNING: sqlalchemy 1.4.47 does not provide the extra 'postgresql-psycopg2binary'

greenlet==3.0.0
    # via sqlalchemy
psycopg2-binary==2.9.9
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==1.4.47
    # via
    #   -r requirements.in
    #   sqlalchemy

Expected result

The valid input extras should still be valid in the output.

Actual result

The output extras will not resolve.

mattoberle commented 1 year ago

Just a quick note:

pip==23.3.0 will attempt to look up the canonical extra postgresql-psycopg2binary before falling back to postgresql_psycopg2binary.

The simple answer to this issue may just be that the sqlalchemy extras should list the canonical variants in their package.

AndydeCleyre commented 1 year ago

Oof, in my local test, it comes out as

sqlalchemy[postgresql-psycopg2binary,postgresql_psycopg2binary]==2.0.22

Using the legacy resolver, it's just

sqlalchemy[postgresql_psycopg2binary]==2.0.22

I tried adding a simple

if ireq.extras:
    ireq.extras = set(map(canonicalize_name, ireq.extras))

to the start of format_requirement, but it seems that the str function on a req/ireq still has the old, repeated extra names somewhere....

AndydeCleyre commented 1 year ago

OK this may be sloppy, and I can never remember which ireq/req attributes are guaranteed, or how req affects ireq exactly, but the following addition at the top of format_requirement does seem to iron this out:

    if ireq.req and ireq.req.extras:
        ireq.req.extras = set(map(canonicalize_name, ireq.req.extras))
    if ireq.extras:
        ireq.extras = set(map(canonicalize_name, ireq.extras))

I also don't know if the format method is the proper place to do this canonicalization.

webknjaz commented 1 year ago

@AndydeCleyre I think that the normalization should happen whenever an instance of InstallRequirement or InstallationCandidate is first encountered/constructed. By doing so, all other parts of the depresolver would work with the normalized stuff...

AndydeCleyre commented 1 year ago

So another place we can get at this is in pip_compat.parse_requirements:

diff --git a/piptools/_compat/pip_compat.py b/piptools/_compat/pip_compat.py
index 6409fbc..6972f03 100644
--- a/piptools/_compat/pip_compat.py
+++ b/piptools/_compat/pip_compat.py
@@ -14,6 +14,7 @@ from pip._internal.network.session import PipSession
 from pip._internal.req import InstallRequirement
 from pip._internal.req import parse_requirements as _parse_requirements
 from pip._internal.req.constructors import install_req_from_parsed_requirement
+from pip._vendor.packaging.utils import canonicalize_name
 from pip._vendor.packaging.version import parse as parse_version
 from pip._vendor.pkg_resources import Requirement

@@ -74,7 +75,10 @@ def parse_requirements(
     for parsed_req in _parse_requirements(
         filename, session, finder=finder, options=options, constraint=constraint
     ):
-        yield install_req_from_parsed_requirement(parsed_req, isolated=isolated)
+        ireq = install_req_from_parsed_requirement(parsed_req, isolated=isolated)
+        if ireq.req.extras:
+            ireq.req.extras = set(map(canonicalize_name, ireq.req.extras))
+        yield ireq

 def create_wheel_cache(cache_dir: str, format_control: str | None = None) -> WheelCache:

But things I don't know include:

dgilmanAIDENTIFIED commented 1 year ago

Some extras are getting duplicated. It doesn't appear to impact the resolver or pip's use of the requirements file but it is an unexpected change when updating to pip 23.3.x. This is pip 23.3.1 and pip-tools 7.3.0:

$ cat test.in
sentry_sdk[pure_eval]

$ pip-compile --output-file test.txt test.in --upgrade --allow-unsafe
WARNING: --strip-extras is becoming the default in version 8.0.0. To silence this warning, either use --strip-extras to opt into the new default or use --no-strip-extras to retain the existing behavior.
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
#    pip-compile --allow-unsafe --output-file=test.txt test.in
#
asttokens==2.4.1
    # via sentry-sdk
certifi==2023.7.22
    # via sentry-sdk
executing==2.0.0
    # via sentry-sdk
pure-eval==0.2.2
    # via sentry-sdk
sentry-sdk[pure-eval,pure_eval]==1.32.0
    # via -r test.in
six==1.16.0
    # via asttokens
urllib3==2.0.7
    # via sentry-sdk
webknjaz commented 12 months ago
  • do we need to manipulate the ireq.extras as well here?

Maybe? Given it's not possible at some earlier stage...

  • will this fix the issue when sqlalchemy[postgresql_psycopg2binary] is a dep of something else?

Why not? Wouldn't “something else” also go through normalization?

  • do we need to do similar adjustments in all the other places we use install_req_from_* functions?

Probably. I'd expect us to be consistent across the codebase.

  • should we converge on the canonicalized extras names, even though this report expresses an expectation for the non-canonicalized form used by the package (sqlalchemy)?

I'd say yes. Canonicalization is well-defined now and as the time passes, more people will use newer pip versions that do this. At some point, we'll EOL support for pip versions that don't do such conversion and then, we'd just differ for no good reason. Maybe, we'd need to leave the original dep names in the comments, though.

AndydeCleyre commented 12 months ago

Maybe we should shadow all the install_req_from_* functions, or pursue getting the upstream functions to canonicalize, in pip?

webknjaz commented 12 months ago

Does pip not normalize in all places? I think it may make sense there too.

AndydeCleyre commented 12 months ago

Does pip not normalize in all places? I think it may make sense there too.

Yeah we're getting non-normalized extras via pip._internal.req.constructors.install_req_from_parsed_requirement, pip._internal.req.constructors.parse_req_from_line, and the main InstallRequirement constructor, at least.

tboddyspargo commented 11 months ago

It doesn't appear to impact the resolver or pip's use of the requirements file but it is an unexpected change when updating to pip 23.3.x.

I just wanted to share that in my experience, the duplicate extras DO impact the resolution, although it's possible that I'm misattributing the behavior. Some clarifications would help me understand if I should report an issue with pip instead.

In my case, having pyhive[hive_pure_sasl,presto]==0.7.0 in my requirements.in file using pip==23.3.1 and pip-tools==7.3.0 with --resolver=legacy results in the hive_pure_sasl requirements thrift-sasl and pure-sasl being removed/omitted from the resulting lockfile, but including the mash-up of pyhive[hive-pure-sasl,hive_pure_sasl,presto]==0.7.0.

It's odd that the expected extra dependencies ARE included when using --resolver=backtracking.

One additional tidbit: To work around this, we are explicitly listing thrift-sasl in requirements.in, ~which surprisingly results in pip-compile correctly including a via pyhive comment. This is despite the fact that without explicitly listing thrift-sasl as a direct dependency, pip-compile would have removed its entry entirely.~

~Do the via comments use a different mechanism to determine which packages are required by which than actual dependency resolution does?~

EDIT: Disregard the last. I was using the backtracking resolver at the time, which correctly resolves the dependencies, so naturally the comments were correct.

tboddyspargo commented 11 months ago

One other thing that seems related to this:

It's odd, and I think unexpected, that a package listed in the via comments of itself when any extras are provided, a version constraint is present, and the backtracking resolver is used. For example, when a requirement is listed as sqlalchemy[postgresql-psycopg2binary]==2.0.23, via sqlalchemy will show up alongside the source requirements file in the comments. When using the legacy resolver, packages are not listed in the via comments of their own package specifier.

I'm also happy to make that its own issue, if you feel it is distinct enough. Thanks!

EDIT: Added the extra criteria that a version constraint must be used in order to provoke the unexpected behavior.

AndydeCleyre commented 11 months ago

@tboddyspargo I don't think I can reproduce the self-via issue.

$ pip-compile --version
pip-compile, version 7.3.0
$ <<<'sqlalchemy[postgresql-psycopg2binary]' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
    # via -r -
typing-extensions==4.8.0
    # via sqlalchemy

And with my PR #2013:

$ pip-compile --version
pip-compile, version 7.3.1.dev65
$ <<<'sqlalchemy[postgresql-psycopg2binary]' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
    # via -r -
typing-extensions==4.8.0
    # via sqlalchemy
tboddyspargo commented 11 months ago

Thanks for pointing that out, @AndydeCleyre! Your counter examples helped me realize that there's another contributing factor. It seems that the redundant via output may only happen if there's a constraint in the original requirement specification.

$ pip-compile --version
pip-compile, version 7.3.0
$ <<<'sqlalchemy[postgresql-psycopg2binary]==2.0.23' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
    # via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
    # via
    #   -r -
    #   sqlalchemy
typing-extensions==4.8.0
    # via sqlalchemy

Toggling the presence of ==2.0.23 (as well as other styles of constraints) seems to also toggle the presence of the extra sqlalchemy in the via area of the sqlalchemy lines. However, if you use --resolver=legacy the redundant via entry never seems to appear.

AndydeCleyre commented 11 months ago

Thanks! I can now reproduce using the main branch. I can't confidently say that's fixed by the mentioned PR, because I didn't set out to fix that problem, but I am not seeing it there.

dgilmanAIDENTIFIED commented 8 months ago

Apologies if this is implied by the ticket elsewhere, but pip==24.0 and pip-tools==7.4.0 emits a warning I don't think I saw before:

WARNING: sentry-sdk 1.40.5 does not provide the extra 'pure-eval'

The name of the extra in the package and in my requirements.in is pure_eval with an underscore.

AndydeCleyre commented 8 months ago

@dgilmanAIDENTIFIED

I can't reproduce that here on Linux with Python 3.11, with the versions you specified.

Can you test if it still happens with #2013 ?

dgilmanAIDENTIFIED commented 8 months ago

Testing it out with that branch I don't get the warning. My output is:

root@3d945b95eb1f:/# pip-compile --output-file foo.txt foo.in --upgrade --allow-unsafe
WARNING: --strip-extras is becoming the default in version 8.0.0. To silence this warning, either use --strip-extras to opt into the new default or use --no-strip-extras to retain the existing behavior.
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
#    pip-compile --allow-unsafe --output-file=foo.txt foo.in
#
certifi==2024.2.2
    # via sentry-sdk
sentry-sdk[pure-eval]==1.40.5
    # via -r foo.in
urllib3==2.2.1
    # via sentry-sdk

Note that the underscore in pure_eval is replaced by a dash. I'd also expect the packages pure_eval, asttokens and executing to be pinned in the file but they aren't in there at all.

AndydeCleyre commented 8 months ago

Note that the underscore in pure_eval is replaced by a dash.

This is currently intentional.

I'd also expect the packages pure_eval, asttokens and executing to be pinned in the file but they aren't in there at all.

Uh oh. Thanks!