pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License
9.51k stars 3.02k forks source link

pip check does not check for missing extras requires #4086

Open anntzer opened 7 years ago

anntzer commented 7 years ago

Description:

pip check does not check for missing extras requires.

What I've run:

Install scikit-image (skipping dependencies) and dask. scikit-image depends on dask[array] but pip check does not detect a mssing dependency.

See also #3797.

vphilippon commented 7 years ago

I noted where that probably comes from: https://github.com/pypa/pip/blob/master/pip/operations/check.py#L28

If we have a requirement like requests[security], we only keep the project_name, which is requests. Although, we can't just change to requests[security], as we have no info on which extra was provided during the install (i.e. pip install requests or pip install requests[security], we don't know).

What could be done is probably to detect that the dependency is actually requests[security], and check that requests is there, and also check for the dependency required by[security].

Note: this problem did not occur with other environment markers, as the values where provided by the environment (things like python_version < "2.7" or sys_platform=="win32") and the requirements where returned by dist.requires(), and then validated.

uranusjr commented 3 years ago

I think we should do something about this during the new resolver transition, since check is a very useful tool for people to identify inconsistencies in their environments, to prevent them from generating requirements files that are not installable, or (worse) trigger subsequent backtracking with the new resolver.

The idea I’m playing with in my mind is a file in the .dist-info directory that lists PEP 508 requirements that caused this package to be present in the environment. pip check would check the content of this file, and gather dependency information accordingly. pip install --upgrade would append to this file, and show a warning if the newly-installed version is incompatible with any of the already-listed requirement. In the long run, --upgrade can also start rejecting incompatible installations when the user specifies a new --upgrade-strategy.

pradyunsg commented 3 years ago

I'm a little burnt in the "get a file standardised into dist-info", but hey, I do like the sound of what you said, and I think we should do it!

sbidoul commented 3 years ago

@uranusjr it might not be necessary to record additional information for the purpose of resolving this issue. Since the metadata of installed packages includes the dependencies (including the required extras), pip could take extras into account when transitively checking if all dependencies are satisfied.

For instance I do something similar when showing the dependency tree in pip-deepfreeze. In this example (a project that depends on firebase-admin, but where grpcio is not installed) it detects the missing dependency, while pip check does not:

image

pradyunsg commented 3 years ago

This still doesn't cover users doing pip install airclow[all] or whatever -- when the user-provided thing has the extras. :)

sbidoul commented 3 years ago

@pradyunsg yeah, that's for top level requirements. For that my proposal is still to extend REQUESTED to record what the user... requested (including extras and specifiers).

uranusjr commented 3 years ago

Indeed recording user-supplied extras in REQUESTED and actively consider extras when checking dependencies would also work for check. But I’m afraid the process might be too slow to implement for install if a package to upgrade is deep in the dependency graph, and felt this is a good chance to introduce some kind of de-normalised dependency information. What is the performance you get in pip-deepfreeze? That should be a good indication to what we should expect without de-normalising dependency information.

sbidoul commented 3 years ago

@uranusjr For install, pip-deepfreeze uses pip. But it does build the dependency graph to decide what to uninstall and we have no perf issues with projects with hundreds of requirements. I'd guess the complexity for that is O(n log(n))? pip-deepfreeze does not need REQUESTED because it has one top level requirement (the PEP 517 project), but it would not be very different with several top level requirements found in REQUESTED.

If the database of installed distribution (.dist-info) had de-normalized information, the complexity would move towards maintaining it with updates, and my impression is that it would be costlier for update operations, and harder to implement.

uranusjr commented 3 years ago

But it does build the dependency graph to decide what to uninstall and we have no perf issues with projects with hundreds of requirements. I'd guess the complexity for that is O(n log(n))?

I would guess the dominant parameter would be th depth of dependencies, not total number of packages. But hundreds of requirements should provide a realistic load, so I’m happy to go with your observation.

What’s the next step? Should we open a discussion on Discourse to propose extending REQUESTED?