pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

New resolver: Failure when package only specified with extras is not available from index #8785

Closed wchill closed 3 years ago

wchill commented 4 years ago

What did you want to do?

pip install --use-feature=2020-resolver -r requirements.txt --no-deps
pip install --use-feature=2020-resolver -r requirements.txt

requirements.txt only contains a list of packages to be installed in editable mode, with some depending on each other.

This is in a fresh miniconda environment and occurred on both 20.2.2 and master.

I ran pip using --no-deps first since in my experience, installing multiple editable mode packages with dependencies on each other fails otherwise. However, running just the normal install command directly in a fresh environment still fails with the new resolver, as below.

ERROR: Could not find a version that satisfies the requirement azureml-dataset-runtime[fuse]~=0.1.0.0 (from azureml-defaults)
ERROR: No matching distribution found for azureml-dataset-runtime[fuse]~=0.1.0.0

Output

This output is after running the 2nd pip install command, to actually install the package dependencies after installing the editable mode packages using --no-deps. Output has been slightly edited to remove full file paths.

ERROR: Cannot install azureml-dataset-runtime 0.1.0.0 (from src\azureml-dataset-runtime), -r requirements.txt (line 9), -r requirements.txt (line 16) and azureml-dataset-runtime[fuse] 0.1.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested azureml-dataset-runtime 0.1.0.0 (from src\azureml-dataset-runtime)
    azureml-automl-core 0.1.0.0 depends on azureml-dataset-runtime~=0.1.0.0
    azureml-train-automl-client 0.1.0.0 depends on azureml-dataset-runtime~=0.1.0.0
    azureml-dataset-runtime[fuse] 0.1.0.0 depends on azureml-dataset-runtime 0.1.0.0 (Installed)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

Additional information

The requirements file looks like this (all below packages should be available from pypi as well):

-e "src\azureml-core\."
-e "src\azureml-dataset-runtime\."
-e "src\azureml-defaults\."
-e "src\azureml-telemetry\."
-e "src\azureml-opendatasets\."
-e "src\azureml-pipeline\."
-e "src\azureml-pipeline-core\."
-e "src\azureml-pipeline-steps\."
-e "src\azureml-automl-core\."
-e "src\azureml-automl-runtime\."
-e "src\azureml-interpret\."
-e "src\azureml-explain-model\."
-e "src\azureml-train-restclients-hyperdrive\."
-e "src\azureml-train-core\."
-e "src\azureml-train\."
-e "src\azureml-train-automl-client\."
-e "src\azureml-train-automl-runtime\."

uranusjr commented 4 years ago

Can you provide either setup.py for all the listed projects (or at least azureml-dataset-runtime and azureml-train-automl-client), or a reduced setup that can reproduce the same error? It is impossible to tell what is going on without any context.

wchill commented 4 years ago

Sure. The below is a trimmed down version of setup.py for azureml-dataset-runtime.

azureml-train-automl-client takes a dependency on this via this entry for install_requires: '{}~={}'.format('azureml-dataset-runtime', VERSION)

# ---------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# ---------------------------------------------------------

from setuptools import setup, find_packages
import os
import shutil

SELFVERSION = '0.1.0.0'
DATAPREP_VERSION = '>=2.0.1a,<2.1.0a'

REQUIRES = [
    'azureml-dataprep{}'.format(DATAPREP_VERSION),
    'pyarrow>=0.17.0,<2.0.0'
]

with open('README.md', 'r', encoding='utf-8') as f:
    long_description = f.read()
with open('../.inlinelicense', 'r', encoding='utf-8') as f:
    inline_license = f.read()

setup(
    name="azureml-dataset-runtime",
    version=SELFVERSION,
    description='',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Microsoft Corp',
    license=inline_license,
    url='https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py',
    packages=find_packages(exclude=["*tests*"]),
    install_requires=REQUIRES,
    extras_require={
        'pandas': ['numpy>=1.14.0,<2.0.0', 'pandas>=0.23.4,<2.0.0'],
        'parquet': [],  # Keeping as no-op to avoid breaking scenarios where this extra is used.
        'pyspark': ['pyspark==2.3.0'],
        'fuse': ['fusepy>=3.0.1,<4.0.0'],
        'scipy': ['scipy>=1.1.0,<2.0.0']
    }
)

uranusjr commented 4 years ago

I was able to reproduce the error with the following:

# a/setup.py
from setuptools import setup
setup(name="a", version="0.1.0.0", extras_require={"z": ["c"]})
# b/setup.py
from setuptools import setup
setup(name="b", version="0.1.0.0", install_requires=["a[z]~=0.1.0.0"])
# c/setup.py
from setuptools import setup
setup(name="c", version="0.1.0.0")
$ pip install --use-feature=2020-resolver --no-deps ./a ./b
(snip, this works)

$ pip install --use-feature=2020-resolver ./a ./b
Processing file:///.../a
Processing file:///.../b
Requirement already satisfied: a[z]~=0.1.0.0 in .../a (from b==0.1.0.0) (0.1.0.0)
ERROR: Cannot install a 0.1.0.0 (from .../a) and a[z] 0.1.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested a 0.1.0.0 (from .../a)
    a[z] 0.1.0.0 depends on a 0.1.0.0 (Installed)

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

A few observations. First, editable-ness does not matter.

Second, the extra indirection is needed to trigger this. pip install ./a ./b would work if b depends on a directly (instead of a[z]).

Third, the error if you don’t install --no-deps first is

ERROR: Could not find a version that satisfies the requirement a[z]~=0.1.0.0 (from b)
ERROR: No matching distribution found for a[z]~=0.1.0.0

Again, the extra indirection is needed to trigger the error. pip install ./a ./b works if b depends on a directly (instead of a[z]).

So the problem is that we’re doing something wrong when handling dependencies to an extra-ed requirement.

uranusjr commented 4 years ago

OK, I think I know what’s going on here. When a package requests dependencies, the resolver passes all the currently known specifiers to the provider to find candidates with. If b requests a (no extras), the resolver would pass ['a', 'a @ ./a'] since it knows that there is an additional specifier provided by the user.

When b requests a[z], however, the resolver does not know that a @ ./a also provides a[z], and thus only passes ['a[z]'] to the provider, resulting in it failing to find a match.

The provider API will need to be tweaked slightly for it to be able to “tell” the resolver that a actually provides candidates for both a and a[z]. I will update resolvelib and come back to this afterwards.
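
The lookup problem described above can be illustrated with a toy model. This is only a sketch: identify() and group_requirements() are hypothetical helpers, not pip or resolvelib APIs, and the assumption is simply that specifiers are grouped under a key built from the name plus extras.

```python
from collections import defaultdict

# Toy model only: these helpers are illustrative, not pip or resolvelib APIs.
def identify(name, extras=()):
    """Build the key under which a requirement is grouped."""
    if extras:
        return "{}[{}]".format(name, ",".join(sorted(extras)))
    return name

def group_requirements(requirements):
    """Group specifier strings by identifier; input is (name, extras, spec) tuples."""
    known = defaultdict(list)
    for name, extras, spec in requirements:
        known[identify(name, extras)].append(spec)
    return known

requirements = [
    ("a", (), "a @ ./a"),            # supplied by the user on the command line
    ("a", ("z",), "a[z]~=0.1.0.0"),  # requested by b
]
known = group_requirements(requirements)

# The direct-link specifier is filed under "a" only, so a lookup for "a[z]"
# never sees it.
print(known["a"])     # ['a @ ./a']
print(known["a[z]"])  # ['a[z]~=0.1.0.0']
```

Under this model, nothing connects the "a[z]" entry back to the ./a link, which matches the "could not find a version" symptom.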


The ResolutionImpossible exception raised when a was installed beforehand has the same cause. But instead of not being able to find any candidates, the provider is now able to find exactly one match (the previously installed distribution), but then rejects it since the installed distribution does not satisfy the direct link requirement. How this should be fixed (or not fixed) would depend on our resolution to #5780. Factory.find_candidates() will need to be updated to reflect the decision and make an AlreadyInstalledCandidate able to satisfy a SpecifierRequirement if the URL matches.

pradyunsg commented 4 years ago

I will update resolvelib and come back to this afterwards.

@uranusjr Any updates on this? I am not 100% sure how this would work and whether we have to make changes for this issue, prior to the new-resolver-is-default release.

uranusjr commented 4 years ago

I have a couple of plans to make this work (not yet decided which is best; the plans I came up with thus far are all a bit ugly). It’s not the end of the world if the release is made without fixing this, but we should try to get a fix in if possible.

uranusjr commented 4 years ago

I came up with a way to solve this without tweaking resolvelib. (See the PR linked above for the solution that does involve changing resolvelib.)

The main idea is, when the provider gets a dependency a @ URL, it should also emit a constraint on a[x] @ URL to the resolver. This makes the resolver aware of the URL spec when dealing with a[x], but only if the user does supply a[x] somewhere as a concrete requirement. The problem then becomes making constraints flow through the resolvelib resolver, instead of only being dealt with externally in pip code. The trick is to make constraints a subclass of base.Requirement, so they can be returned by get_dependencies(). When find_matches() receives only constraints and not concrete requirements, it returns a special “no-op” candidate that does not have any dependencies, satisfies anything, but ultimately installs nothing. (Regular candidates are returned when at least one concrete requirement is passed into find_matches().) This should correctly represent the logical meaning of constraints in the resolver.
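
A minimal sketch of the "no-op candidate" idea, for readers who want something concrete. Every class and function here (Requirement, Constraint, NoOpCandidate, RealCandidate, find_matches) is a hypothetical stand-in written for illustration; none of them is pip's or resolvelib's real implementation.

```python
# Hypothetical stand-ins, not pip's or resolvelib's real classes.
class Requirement:
    def __init__(self, name, is_constraint=False):
        self.name = name
        self.is_constraint = is_constraint

class Constraint(Requirement):
    """A constraint modelled as a requirement, so it can be returned as a dependency."""
    def __init__(self, name):
        super().__init__(name, is_constraint=True)

class NoOpCandidate:
    """Satisfies anything, has no dependencies, and installs nothing."""
    dependencies = ()
    installs = None

class RealCandidate:
    def __init__(self, name):
        self.name = name
        self.installs = name

def find_matches(requirements):
    """Return a no-op candidate unless at least one concrete requirement is present."""
    concrete = [r for r in requirements if not r.is_constraint]
    if not concrete:
        return [NoOpCandidate()]
    return [RealCandidate(concrete[0].name)]

only_constraints = find_matches([Constraint("a[x]")])
with_concrete = find_matches([Constraint("a[x]"), Requirement("a[x]")])
print(type(only_constraints[0]).__name__)  # NoOpCandidate
print(type(with_concrete[0]).__name__)     # RealCandidate
```

The point of the sketch is only the branching: constraints alone never cause anything to be installed.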

The problem with this solution, however, is that the provider will need to emit all combinations of extras, even if they may never be needed in the dependency graph. The object count would grow quite quickly (factorial(len(extras) + 1) - 1), potentially slowing down resolution even further. It would be best if we could generate the constraints lazily, but I haven’t thought of a way without duplicating a backtracking stack in the provider, which is probably a very bad idea.
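
To get a feel for the growth, here is a rough sketch that enumerates extras combinations for the five extras in the azureml-dataset-runtime setup.py posted earlier. It assumes "combinations" means unordered, non-empty subsets; under that reading the count is 2**n - 1, which is not the same as the factorial expression above but still grows quickly. The helper extras_combinations() is hypothetical.

```python
from itertools import combinations

def extras_combinations(extras):
    """Yield every non-empty subset of the given extras, smallest first."""
    ordered = sorted(extras)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield combo

extras = ["pandas", "parquet", "pyspark", "fuse", "scipy"]
combos = list(extras_combinations(extras))
print(len(combos))  # 31
```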

brainwane commented 4 years ago

Per today's meeting it looks like the next step here is for @uranusjr and @pradyunsg to have a chat about what to do next. This is something we want to get taken care of before the 20.3 release in a few weeks.

wchill commented 4 years ago

Just been wondering if there's been any updates to this, since 20.3/new resolver will definitely break us in various ways unless this is fixed.

brainwane commented 3 years ago

Per notes from a meeting this week I believe Pradyun and Tzu-ping had a private conversation about this issue -- could one of you please share those notes here in the issue so we know what needs to happen next? Thanks!

uranusjr commented 3 years ago

Sorry for the delay. The issue happens when pip sees a direct URL requirement, and then a requirement of the same name with different extras. Since a requirement with different extras represents a different set of dependencies, the resolver treats these two as different requirements, and is not able to resolve the non-URL requirement into using that URL.

The workaround to this would be to install the editable/URL requirements first, like this:

$ pip install --use-feature=2020-resolver -r requirements.txt --no-deps
$ pip install --use-feature=2020-resolver -r requirements.txt

The first --no-deps installation would skip all the resolver stuff (thus avoid the error). Once all those packages are in site-packages, subsequent install calls would work correctly.

Some additional technical context that might make more sense to those into pip internals: Since a[x] provides a different set of packages from a, and that set may be different depending on the version of a chosen, the resolver internally creates a “fake” package that depends on the dependencies specified by extra x, and on an a that matches the same version of itself. This makes pip aware of the correct version of a when it sees a[x]. But this does not work the other way around: if you provide a first, the resolver does not know it needs a[x] when it sees a (it can’t just pull it in, otherwise the user would get unwanted packages when they don’t want a[x] but only a), and we don’t have a good way to inform the resolver that the direct URL can also be used to satisfy a[x] later on. This is why installing the packages without dependencies would work around the issue. The on-disk .dist-info can be used to represent that package, so we don’t need to use the URL.
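
The "fake" package described above can be sketched roughly as follows. The function extras_candidate_dependencies() is hypothetical and only illustrates the shape of the dependency list (a pin on the base package plus the extra's own requirements); it is not how pip actually builds it.

```python
# Illustrative only: this function is hypothetical, not part of pip.
def extras_candidate_dependencies(name, version, extras, extras_require):
    """Dependencies of the synthetic name[extras] package at a given version."""
    # Pin the base package to the same version as the synthetic package.
    deps = ["{}=={}".format(name, version)]
    # Then add whatever each requested extra pulls in.
    for extra in sorted(extras):
        deps.extend(extras_require.get(extra, []))
    return deps

extras_require = {"fuse": ["fusepy>=3.0.1,<4.0.0"]}
deps = extras_candidate_dependencies(
    "azureml-dataset-runtime", "0.1.0.0", {"fuse"}, extras_require
)
print(deps)
# ['azureml-dataset-runtime==0.1.0.0', 'fusepy>=3.0.1,<4.0.0']
```

Note that the pin carries only a name and version, not the URL, which is the gap this comment describes.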

Ultimately, this is one of those issues that are probably fixable, but would require too many resources that are better spent elsewhere. I would be happy to offer advice if anyone really wants to work on this, but am not personally very motivated to do the work myself.

uranusjr commented 3 years ago

Hmm, I just re-read the top post, and it seems like the --no-deps trick is not working for OP? @pradyunsg this seems to contradict our findings yesterday; maybe something changed that makes things work now (the get_preference thing, maybe)?

uranusjr commented 3 years ago

Damn it, I was all over the place. To sum up things again: there are actually two issues here. The first is the direct-URL-with-different-extras thing I talked about in the previous comment, which is hit if you don’t run --no-deps first. We decided not to put too many resources into it, and to recommend the --no-deps workaround instead.

The error you get if you --no-deps first is not related to the extras issue, but about the resolver’s upgrade strategy regarding direct URL requirements (a variant of #8711, blocked by #5780). We should (will?) fix that.

brainwane commented 3 years ago

@pradyunsg could you dive into this before tomorrow's meeting, or before Wednesday's?

brainwane commented 3 years ago

Per today's meeting, Pradyun has decided that improving this behavior is not a blocker to the 20.3 release (in which the new resolver will be on by default); the new behavior does better than the old resolver, though not as well as we want it to.

delijati commented 3 years ago

Apparently the new --use-feature=2020-resolver is already activated in 20.3? It broke my Docker deployments.

Docker pipeline edits setup.py to use CI_JOB_TOKEN to clone deps:

ham/setup.py # gitlabci changes git+ssh to git+https
   -> git+https://gitlab.com/bar/foo.git@0.1
   -> git+https://gitlab.com/bar/lau.git@0.1
foo/setup.py # points still to repo via ssh
   -> git+ssh://gitlab.com/bar/lau.git@0.1

Under 20.2.4:

It clones foo and lau via https git and installs it

Under 20.3:

It clones foo and lau via https git and installs it. It also tries to install lau again via ssh

Under 20.2.4 + --use-feature=2020-resolver:

Same error as 20.3.

uranusjr commented 3 years ago

@delijati The reason for your error is not the same as the one described in this issue. The two lau URLs are different. Yes, the two URLs ultimately download the same code, but pip does not have any special knowledge about GitHub, and therefore (correctly) treats them as different packages.
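
As a small illustration of why the two links are "different", the sketch below compares the two lau URLs from the comment above using only the standard library. How pip compares links internally is not shown in this thread, so this only demonstrates that the strings differ in scheme while the host and path are the same.

```python
from urllib.parse import urlsplit

https_url = "git+https://gitlab.com/bar/lau.git@0.1"
ssh_url = "git+ssh://gitlab.com/bar/lau.git@0.1"

a, b = urlsplit(https_url), urlsplit(ssh_url)

# Same host and path, so a human reads them as the same repository...
print(a.netloc == b.netloc and a.path == b.path)  # True
# ...but the scheme differs, so the URLs are not equal as written.
print(a.scheme == b.scheme)  # False
print(https_url == ssh_url)  # False
```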

brainwane commented 3 years ago

@delijati Yes, the new resolver is activated by default in pip 20.3. Here's more info on that, including how to specify the old resolver temporarily as a workaround.

delijati commented 3 years ago

@uranusjr Yup, the new dependency resolver just made it transparent that there was an error all along. Thanks, I hope I can finally convince the team to use an artifactory store ;) or I'll have to stick with --use-deprecated=legacy-resolver

uranusjr commented 3 years ago

I have merged #8939, #9143, and #9204 here, which all have the same root cause, described above (https://github.com/pypa/pip/issues/8785#issuecomment-678885871).

jaraco commented 3 years ago

I've attempted to use the workaround described in #9143 (use legacy-resolver). However, this approach has another downside. In python/importlib_metadata@c769ba8fae245265bdb3f173a41e0e8c4f2a2d4b, I implemented the workaround, but now when I run tox on my local workstation, it doesn't work at all:

importlib_metadata main $ tox
python create: /Users/jaraco/code/public/importlib_metadata/.tox/python
python develop-inst: /Users/jaraco/code/public/importlib_metadata
ERROR: invocation failed (exit code 3), logfile: /Users/jaraco/code/public/importlib_metadata/.tox/python/log/python-1.log
=============================================================== log start ===============================================================
An error occurred during configuration: option use-deprecated: invalid choice: 'legacy-resolver' (choose from )

================================================================ log end ================================================================
________________________________________________________________ summary ________________________________________________________________
ERROR:   python: InvocationError for command /Users/jaraco/code/public/importlib_metadata/.tox/python/bin/python -m pip install --exists-action w -e '/Users/jaraco/code/public/importlib_metadata[testing]' (exited with code 3)

I have the latest tox installed with the latest virtualenv:

importlib_metadata main $ which tox
/Users/jaraco/.local/bin/tox
importlib_metadata main $ ~/.local/pipx/venvs/tox/bin/python -m pip list
Package         Version
--------------- -------
appdirs         1.4.4
distlib         0.3.1
filelock        3.0.12
packaging       20.8
pip             20.3.3
pluggy          0.13.1
py              1.10.0
pyparsing       2.4.7
setuptools      51.1.2
six             1.15.0
toml            0.10.2
tox             3.21.0
tox-pip-version 0.0.7
virtualenv      20.3.0
wheel           0.36.2

Yet, somehow, pip 20.2.4 is getting used.

I tried forcing a later pip with tox-pip-version:

diff --git a/tox.ini b/tox.ini
index 11f52d7..46676f3 100644
--- a/tox.ini
+++ b/tox.ini
@@ -4,6 +4,8 @@ minversion = 3.2
 # https://github.com/jaraco/skeleton/issues/6
 tox_pip_extensions_ext_venv_update = true
 toxworkdir={env:TOX_WORK_DIR:.tox}
+requires =
+   tox-pip-version

 [testenv]
@@ -15,6 +17,8 @@ extras = testing
 setenv =
    # workaround pypa/pip#9143
    PIP_USE_DEPRECATED=legacy-resolver
+pip_version =
+   pip>=20.3.1

 [testenv:docs]

But even with that, pip fails to upgrade itself with the same error. It's proving difficult to implement the workaround in a reliable way. I somehow need a way for tox to invoke pip first without the workaround to upgrade pip, then invoke it with the workaround to install the project and its dependencies.

I probably could spend some more time researching how tox and virtualenv work together to install different pip versions in different environments and maybe come up with a workaround, but for now what I'm doing is manually removing the workaround on my local environments when creating tox environments, then undoing that change.

I'm open to suggestions, but unblocked and done for the day, so no reply is needed. I mainly just wanted to capture this secondary issue.

yuvalmarciano commented 3 years ago

Hi @uranusjr, I can see https://github.com/pypa/pip/pull/9775 is merged, so I'm using pip 21.1. A similar error is raised when I try to install the following:

git+ssh://git@github.com/my-organization/repo1.git@1.0.0#egg=repo1[extra]
git+ssh://git@github.com/my-organization/repo2.git@2.0.0#egg=repo2

When repo2 depends on repo1 with no extras (same version):

git+ssh://git@github.com/my-organization/repo1.git@1.0.0#egg=repo1

The exception is:

ERROR: Cannot install -r requirements.txt (line 3) and repo1[extra]==1.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    repo2 2.0.0 depends on repo1 1.0.0 (from git+ssh://****@github.com/my-organization/repo1.git@1.0.0#egg=repo1)
    repo1[extra] 1.0.0 depends on repo1 1.0.0 (from git+ssh://****@github.com/my-organization/repo1.git@1.0.0#egg=repo1[extra])

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Could you please help me understand what I'm doing wrong?

sbidoul commented 3 years ago

@yuvalmarciano could you try rewriting your dependencies in PEP 508 format?

repo1 @ git+ssh://git@github.com/my-organization/repo1.git
repo1[yara] @ git+ssh://git@github.com/my-organization/repo1.git

yuvalmarciano commented 3 years ago

@sbidoul Sure, thanks for the quick response. Unfortunately I'm still getting the same error.

uranusjr commented 3 years ago

Your issue is not related to this issue, and I would suggest

  1. Open a new issue (I think there is already one open on this but can’t find it, so might as well start a new conversation).
  2. Provide clear steps to reproduce your issue. It is close to impossible to know what is going on when all you provide are some private (and fake) repository URLs because you can have literally anything inside it, and those things are causing problems.

yuvalmarciano commented 3 years ago

@uranusjr I managed to reproduce and find a solution for my problem, so I guess there's no need for another issue.

If you want to take a look (and maybe make the problem clearer because it was not clear to me) - I created two public repositories on Github containing dummy python packages: https://github.com/yuvalmarciano/my-package https://github.com/yuvalmarciano/my-package-with-extra

To reproduce the problem just create a requirements.txt file containing:

my_package @ git+ssh://git@github.com/yuvalmarciano/my-package.git@master
my_package_with_extra[extra] @ git+ssh://git@github.com/yuvalmarciano/my-package-with-extra.git@master

and execute pip install -r requirements.txt with pip 21.1.

The solution is to remove the #egg=... part of my_package_with_extra dependency on my-package/setup.py. It does make sense, but removing this part was a total guess and luckily it helped. Maybe the exception should provide more details in such cases.