Closed mattoberle closed 5 months ago
Just a quick note:
pip==23.3.0
will attempt to look up the canonical extra postgresql-psycopg2binary
before falling back to postgresql_psycopg2binary
.
The simple answer to this issue may just be that the sqlalchemy
extras should list the canonical variants in their package.
Oof, in my local test, it comes out as
sqlalchemy[postgresql-psycopg2binary,postgresql_psycopg2binary]==2.0.22
Using the legacy resolver, it's just
sqlalchemy[postgresql_psycopg2binary]==2.0.22
I tried adding a simple
if ireq.extras:
ireq.extras = set(map(canonicalize_name, ireq.extras))
to the start of format_requirement
, but it seems that the str
function on a req/ireq still has the old, repeated extra names somewhere....
OK this may be sloppy, and I can never remember which ireq/req attributes are guaranteed, or how req affects ireq exactly, but the following addition at the top of format_requirement
does seem to iron this out:
if ireq.req and ireq.req.extras:
ireq.req.extras = set(map(canonicalize_name, ireq.req.extras))
if ireq.extras:
ireq.extras = set(map(canonicalize_name, ireq.extras))
I also don't know if the format method is the proper place to do this canonicalization.
@AndydeCleyre I think that the normalization should happen whenever an instance of InstallRequirement
or InstallationCandidate
is first encountered/constructed. By doing so, all other parts of the depresolver would work with the normalized stuff...
So another place we can get at this is in pip_compat.parse_requirements
:
diff --git a/piptools/_compat/pip_compat.py b/piptools/_compat/pip_compat.py
index 6409fbc..6972f03 100644
--- a/piptools/_compat/pip_compat.py
+++ b/piptools/_compat/pip_compat.py
@@ -14,6 +14,7 @@ from pip._internal.network.session import PipSession
from pip._internal.req import InstallRequirement
from pip._internal.req import parse_requirements as _parse_requirements
from pip._internal.req.constructors import install_req_from_parsed_requirement
+from pip._vendor.packaging.utils import canonicalize_name
from pip._vendor.packaging.version import parse as parse_version
from pip._vendor.pkg_resources import Requirement
@@ -74,7 +75,10 @@ def parse_requirements(
for parsed_req in _parse_requirements(
filename, session, finder=finder, options=options, constraint=constraint
):
- yield install_req_from_parsed_requirement(parsed_req, isolated=isolated)
+ ireq = install_req_from_parsed_requirement(parsed_req, isolated=isolated)
+ if ireq.req.extras:
+ ireq.req.extras = set(map(canonicalize_name, ireq.req.extras))
+ yield ireq
def create_wheel_cache(cache_dir: str, format_control: str | None = None) -> WheelCache:
But things I don't know include:
ireq.extras
as well here?sqlalchemy[postgresql_psycopg2binary]
is a dep of something else?install_req_from_*
functions?Some extras are getting duplicated. It doesn't appear to impact the resolver or pip's use of the requirements file but it is an unexpected change when updating to pip 23.3.x. This is pip 23.3.1 and pip-tools 7.3.0:
$ cat test.in
sentry_sdk[pure_eval]
$ pip-compile --output-file test.txt test.in --upgrade --allow-unsafe
WARNING: --strip-extras is becoming the default in version 8.0.0. To silence this warning, either use --strip-extras to opt into the new default or use --no-strip-extras to retain the existing behavior.
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
# pip-compile --allow-unsafe --output-file=test.txt test.in
#
asttokens==2.4.1
# via sentry-sdk
certifi==2023.7.22
# via sentry-sdk
executing==2.0.0
# via sentry-sdk
pure-eval==0.2.2
# via sentry-sdk
sentry-sdk[pure-eval,pure_eval]==1.32.0
# via -r test.in
six==1.16.0
# via asttokens
urllib3==2.0.7
# via sentry-sdk
- do we need to manipulate the
ireq.extras
as well here?
Maybe? Given it's not possible at some earlier stage...
- will this fix the issue when
sqlalchemy[postgresql_psycopg2binary]
is a dep of something else?
Why not? Wouldn't “something else” also go through normalization?
- do we need to do similar adjustments in all the other places we use
install_req_from_*
functions?
Probably. I'd expect us to be consistent across the codebase.
- should we converge on the canonicalized extras names, even though this report expresses an expectation for the non-canonicalized form used by the package (sqlalchemy)?
I'd say yes. Canonicalization is well-defined now and as the time passes, more people will use newer pip versions that do this. At some point, we'll EOL support for pip versions that don't do such conversion and then, we'd just differ for no good reason. Maybe, we'd need to leave the original dep names in the comments, though.
Maybe we should shadow all the install_req_from_*
functions, or pursue getting the upstream functions to canonicalize, in pip?
Does pip not normalize in all places? I think it may make sense there too.
Does pip not normalize in all places? I think it may make sense there too.
Yeah we're getting non-normalized extras via pip._internal.req.constructors.install_req_from_parsed_requirement
, pip._internal.req.constructors.parse_req_from_line
, and the main InstallRequirement
constructor, at least.
It doesn't appear to impact the resolver or pip's use of the requirements file but it is an unexpected change when updating to pip 23.3.x.
I just wanted to share that in my experience, the duplicate extras DO impact the resolution, although it's possible that I'm misattributing the behavior. Some clarifications would help me understand if I should report an issue with pip
instead.
In my case, having pyhive[hive_pure_sasl,presto]==0.7.0
in my requirements.in
file using pip==23.3.1
and pip-tools==7.3.0
with --resolver=legacy
results in the hive_pure_sasl
requirements thrift-sasl
and pure-sasl
being removed/omitted from the resulting lockfile, but including the mash-up of pyhive[hive-pure-sasl,hive_pure_sasl,presto]==0.7.0
.
It's odd that the expected extra
dependencies ARE included when using --resolver=backtracking
.
One additional tidbit: To work around this, we are explicitly listing thrift-sasl
in requirements.in
, ~which surprisingly results in pip-compile
correctly including a via pyhive
comment. This is despite the fact that without explicitly listing thrift-sasl
as a direct dependency, pip-compile
would have removed its entry entirely.~
~Do the via
comments use a different mechanism to determine which packages are required by which than actual dependency resolution does?~
EDIT: Disregard the last. I was using the backtracking
resolver at the time, which correctly resolves the dependencies, so naturally the comments were correct.
One other thing that seems related to this:
It's odd, and I think unexpected, that a package listed in the via
comments of itself when any extras are provided, a version constraint is present, and the backtracking
resolver is used. For example, when a requirement is listed as sqlalchemy[postgresql-psycopg2binary]==2.0.23
, via sqlalchemy
will show up alongside the source requirements file in the comments. When using the legacy
resolver, packages are not listed in the via
comments of their own package specifier.
I'm also happy to make that its own issue, if you feel it is distinct enough. Thanks!
EDIT: Added the extra criteria that a version constraint must be used in order to provoke the unexpected behavior.
@tboddyspargo I don't think I can reproduce the self-via issue.
$ pip-compile --version
pip-compile, version 7.3.0
$ <<<'sqlalchemy[postgresql-psycopg2binary]' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
# via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
# via -r -
typing-extensions==4.8.0
# via sqlalchemy
And with my PR #2013:
$ pip-compile --version
pip-compile, version 7.3.1.dev65
$ <<<'sqlalchemy[postgresql-psycopg2binary]' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
# via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
# via -r -
typing-extensions==4.8.0
# via sqlalchemy
Thanks for pointing that out, @AndydeCleyre! Your counter examples helped me realize that there's another contributing factor. It seems that the redundant via
output may only happen if there's a constraint in the original requirement specification.
$ pip-compile --version
pip-compile, version 7.3.0
$ <<<'sqlalchemy[postgresql-psycopg2binary]==2.0.23' pip-compile --no-header - -o - 2>/dev/null
greenlet==3.0.1
# via sqlalchemy
sqlalchemy[postgresql-psycopg2binary]==2.0.23
# via
# -r -
# sqlalchemy
typing-extensions==4.8.0
# via sqlalchemy
Toggling the presence of ==2.0.23
(as well as other styles of constraints) seems to also toggle the presence of the extra sqlalchemy
in the via
area of the sqlalchemy
lines. However, if you use --resolver=legacy
the redundant via
entry never seems to appear.
Thanks! I can now reproduce using the main branch. I can't confidently say that's fixed by the mentioned PR, because I didn't set out to fix that problem, but I am not seeing it there.
Apologies if this is implied by the ticket elsewhere, but pip==24.0 and pip-tools==7.4.0 emits a warning I don't think I saw before:
WARNING: sentry-sdk 1.40.5 does not provide the extra 'pure-eval'
The name of the extra in the package and in my requirements.in is pure_eval
with an underscore.
@dgilmanAIDENTIFIED
I can't reproduce that here on Linux with Python 3.11, with the versions you specified.
Can you test if it still happens with #2013 ?
Testing it out with that branch I don't get the warning. My output is:
root@3d945b95eb1f:/# pip-compile --output-file foo.txt foo.in --upgrade --allow-unsafe
WARNING: --strip-extras is becoming the default in version 8.0.0. To silence this warning, either use --strip-extras to opt into the new default or use --no-strip-extras to retain the existing behavior.
#
# This file is autogenerated by pip-compile with Python 3.11
# by the following command:
#
# pip-compile --allow-unsafe --output-file=foo.txt foo.in
#
certifi==2024.2.2
# via sentry-sdk
sentry-sdk[pure-eval]==1.40.5
# via -r foo.in
urllib3==2.2.1
# via sentry-sdk
Note that the underscore in pure_eval is replaced by a dash. I'd also expect the packages pure_eval
, asttokens
and executing
to be pinned in the file but they aren't in there at all.
Note that the underscore in pure_eval is replaced by a dash.
This is currently intentional.
I'd also expect the packages pure_eval, asttokens and executing to be pinned in the file but they aren't in there at all.
Uh oh. Thanks!
Using
pip==23.3.x
can result in invalidrequirements.txt
for projects that declare extras. A change inpip==23.3.0
introduced the "canonicalization" of extras names:The canonicalization leads to this outcome:
requirements.in
:sqlalchemy[postgresql_psycopg2binary]
requirements.txt
:sqlalchemy[postgresql-psycopg2binary]
For many cases, this probably isn't an issue.
psycopg2-binary
still makes it intorequirements.txt
as an individual entry.For more "niche" cases, the extras still carry some significance. eg. When using rules_python under the Bazel build system, the extras must properly resolve to build out a dependency graph.
Environment Versions
3.10.9
23.3.0
7.3.0
Steps to replicate
Create a simple
requirements.in
file.Create a virtual environment, install
pip-tools
and the previous majorpip
version, runpip-compile
.Observe that the resulting
requirement.txt
file listssqlalchemy[postgresql_psycopg2binary]
(with an underscore).On the same virtual environment, upgrade to
pip>=23.3
, runpip-compile
.requirement.txt
file listssqlalchemy[postgresql-psycopg2binary]
(with a hyphen).Expected result
The valid input extras should still be valid in the output.
Actual result
The output extras will not resolve.