pypa / pip

The Python package installer
https://pip.pypa.io/
MIT License

the problem with replacing dependency links with PEP 508 URL requirements #5898

Open benoit-pierre opened 6 years ago

benoit-pierre commented 6 years ago

My 2 cents: PEP 508 URLs are no replacement for dependency links, because of the lack of version specifiers.

For example, with dependency links, you could push a package to PyPI with dependencies on other PyPI projects, but with the option to use a patched version of some of those dependencies (with some extra bug fixes) when using --process-dependency-links. I've used that myself on a project depending on PyObjC, because the delay between its releases is so long.

Additionally, the lack of version specifiers means there's no way for pip to know whether an existing installation is compatible or not: this is problematic when upgrading a project that depends on another through a PEP 508 direct URL, and it also makes sharing such a dependency between projects problematic.

And finally, dependency links are opt-in and usable on PyPI, but PEP 508 URLs are forbidden by pip during install for projects originating from PyPI, for "security reasons". This, to me, does not really make sense: it's not like installing only from PyPI is secure!

That last point could be addressed by changing the behaviour in pip (maybe a --allow-url=regexp option?), but I don't see a way around the lack of version specifiers. Could the PEP be amended to allow package >= 10.0 @ url?
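
To make the comparison concrete, here is a minimal sketch (hypothetical project name and URLs) of what the removed dependency-links form could express versus what a PEP 508 direct URL can express today:

# setup.py -- illustrative only
from setuptools import setup

setup(
    name="myapp",  # hypothetical package
    # Old style: a normal version range, plus a hint about where a patched
    # build lives; the hint was only honoured with --process-dependency-links.
    install_requires=["somedep>=10.0"],
    dependency_links=[
        "https://example.com/packages/somedep-10.0.1+patched.tar.gz#egg=somedep-10.0.1",
    ],
)

# New style: a PEP 508 direct reference pins exactly one artifact and carries
# no version specifier at all:
#
#     install_requires=["somedep @ https://example.com/packages/somedep-10.0.1+patched.tar.gz"]
#
# The amendment proposed above would allow something like
# "somedep >= 10.0 @ https://..." -- not currently valid syntax.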

cjerdonek commented 6 years ago

Adding the deprecation label because it relates to a pending deprecation / whether something should be deprecated.

pradyunsg commented 6 years ago

Could the PEP be amended to allow package >= 10.0 @ url?

I think so, given that they're supposed to be the intended replacement for dependency links. Perhaps a discussion over at distutils-sig is needed for that.

pfmoore commented 5 years ago

I agree, this should be raised on distutils-sig. However, I'd prefer it if the discussion were framed as "how to make URL links a complete replacement for dependency links" rather than just suggesting this single change. Ideally, this would be the only change needed (as someone who doesn't use dependency links, I can't really comment on that), but I think the key here is to get community agreement that URL links are an acceptable replacement, so that we can finally retire dependency links without needing to worry about the possibility that someone pops up with a use case we hadn't considered. (Of course, that may still happen, but it's easier to point to a distutils-sig discussion that was everyone's chance to speak up than to simply say "we didn't think of that" :-))

cam72cam commented 5 years ago

I've been using @ URL links (with @benoit-pierre's patch from https://github.com/pypa/pip/issues/5780#issuecomment-421092322) with a lot of success.

We simply specify the exact build that is needed in the URL, which seems to work for us. I can see, however, that the ability to do version matching on that string would be helpful.
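
For anyone curious what that looks like, a minimal sketch (hypothetical package name and URL) of pinning an exact build via a direct URL requirement:

from setuptools import setup

setup(
    name="ourapp",  # hypothetical
    install_requires=[
        # Pins one exact artifact; a range such as "somedep>=1.2" cannot be
        # expressed against this location.
        "somedep @ https://builds.example.com/somedep-1.2.3-py3-none-any.whl",
    ],
)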

xavfernandez commented 5 years ago

FWIW, from trying to use PEP 508 URL, the lack of version specifiers was crippling and I'd also be in favor of adding them.

stinovlas commented 5 years ago

What's the status of this issue? I was really surprised that dependency links were removed without addressing this issue first. Version specifiers are really important. Any news?

stinovlas commented 5 years ago

So, I started this thread on distutils-sig. I raised this issue and discussed it with the developers. In the end, I was convinced that changing PEP 508 to include version specifiers is both impractical and unnecessary. I'd like to explain why I reached this conclusion to other pip users:

desertkun commented 5 years ago

I have a setup like this that is now broken:

When I install the package first like this: pip install --extra-index-url=http://local-simple-index/simple first>=0.1dev, it tries to install it, and proceeds to install the dependency second right from pypi, ignoring the dependency_links in setup.py.

I know there's PEP 508. How do I tell pip to get second from http://cdn.somerepo.com/simple without dependency_links (rest in pieces)? Setting up ~/.pip/pip.conf is not an option, for multiple reasons.

desertkun commented 5 years ago

Excuse my rant; apparently pip.conf is a solution.
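
For anyone landing here with the same question, a minimal sketch of a pip.conf that covers this case (reusing the URL from the comment above; a plain-HTTP index also needs trusted-host):

# ~/.pip/pip.conf (legacy location; ~/.config/pip/pip.conf also works)
[global]
extra-index-url = http://cdn.somerepo.com/simple
trusted-host = cdn.somerepo.com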

aldencolerain commented 5 years ago

[Edit: Just read the whole conversation]

@stinovlas Just trying to understand and reach your same conclusion. What did you mean by "private packages that depend on each other in not-trivial way" are you just referring to latest/versioned url support?

stinovlas commented 5 years ago

@stinovlas Just curious what you meant by "private packages that depend on each other in not-trivial way"? Does "git+ssh://git@github.com/..." not work under install_requires?

It does work. But you can only depend on one specific commit-ish (e.g. a branch, tag, or commit). You can't say "I want version >= 3.2 from this repository."
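
A minimal sketch of that distinction (hypothetical package and repository names):

from setuptools import setup

setup(
    name="mainpkg",  # hypothetical
    install_requires=[
        # Expressible: depend on one specific ref (branch, tag, or commit).
        "privdep @ git+ssh://git@github.com/example/privdep.git@v3.2.1",
        # Not expressible: "privdep>=3.2 from this repository" -- a PEP 508
        # direct reference carries no version specifier.
    ],
)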

pdxjohnny commented 5 years ago

Here's a workaround for anyone interested (yes it's ugly)

setup.py

import sys
import subprocess

from setuptools import setup

git_rpmfile = 'git+https://github.com/pdxjohnny/rpmfile@subfile_close'

try:
    import rpmfile
    # ... do some version checking ...
except (ModuleNotFoundError, ImportError):
    # Not installed (or too old): install the patched fork straight from
    # GitHub, respecting --user if the caller passed it.
    if '--user' in sys.argv:
        subprocess.run([sys.executable, '-m', 'pip', 'install', '--upgrade',
                        '--user', git_rpmfile], check=False)
    else:
        subprocess.run([sys.executable, '-m', 'pip', 'install', '--upgrade',
                        git_rpmfile], check=False)

setup(
    name='

Another example:

https://github.com/odwdinc/SSBU_Amiibo/blob/0ffe836f61fb91e3fb878a92943720dd86edf932/setup.py#L16

os.system('pip install --user git+https://github.com/odwdinc/pyamiibo@master')

To add some use cases for this feature.

stefansjs commented 4 years ago

So it's been mentioned that you should just run your own index server if you need a dependency URL. That's fine and dandy, except I literally can't get pip to look in my index server if I specify a dependency from it in setup.py. Can somebody explain how you're supposed to specify an index server in setup.py, without dependency_links and without pip respecting my environment while building a wheel?

uranusjr commented 4 years ago

@stefansjs You do not specify an index server in setup.py. Instead, the user chooses which index to install from when they install your package.
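
A minimal sketch of that split (hypothetical names and index URL): the package declares what it needs, and whoever installs it decides where to look.

# setup.py of the package being published
from setuptools import setup

setup(
    name="mycompany-app",
    install_requires=[
        "mycompany-lib>=2.0",  # hosted only on the private index
    ],
)

# The person installing chooses the index, e.g.:
#   pip install mycompany-app --extra-index-url https://pypi.company.example/simple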

stefansjs commented 4 years ago

Forgive my ignorance here. How do I then build my wheel to publish to my private index server, if it has dependencies in my private index server?

It seems like not respecting index-url at build-time and not respecting dependency_links at build-time pretty much eliminates hosting my own index server. What am I missing about building my own packages to host on my own index server?

apolotsk commented 4 years ago

I'd like to cover one of the use cases.

Suppose the project is developed in-house and consists of several python packages. The "main" python package depends on the others. All packages are stored in the private repos.

During the development, the main package is installed via pip install git+https://company.local/repos/main-package.git. This command also installs the other (private) packages, whose VCS specifiers are set in main-package/setup.py. This solution requires neither the Python Package Index nor a requirements.txt, and is easy to set up and maintain.

Once the development is over, the project is archived (via pip download -d ./downloads git+https://company.local/repos/main.git, which also downloads the dependencies) for a possible future offline install, and the private repos are deleted. Having a simple way to specify a link for dependencies would make it possible to support offline installs with pip install --no-index --find-links=./downloads main-package effortlessly.

Currently, the last command tries to access the URLs in the VCS specifiers, because --find-links does not support them. The alternative solution, setup(dependency_links=...), is deprecated. The only working solution I know of is pip install --no-index --find-links=./downloads -r requirements.txt, with requirements.txt containing package names (no URLs).

pradyunsg commented 4 years ago

Everyone, before commenting further here, please read this thread on distutils-sig.

@pypa/pip-committers Given that this has been discussed (and that discussion is summarized here: https://github.com/pypa/pip/issues/5898#issuecomment-459276514), and the consensus was not to change anything beyond the status quo, is there anything actionable here?

kousu commented 4 years ago

I have another use case that PEP 508 does not cover well: finding wheels.

pytorch's pypi version is very large: it includes all the fancy GPU code needed for training neural networks, and that, compiled, is about 600MB. That's way too much to ask our users to install, and it's too much even to ask CI to install. And we definitely don't want to have users compiling torch from scratch.

To address the elephant in the room, pytorch has a solution: they provide torch+cpu variant packages, but they haven't put them up on pypi; instead, they're on https://download.pytorch.org/whl/cpu/torch_stable.html.

To get them, then, we put this in a requirements.txt:

https://github.com/neuropoly/spinalcordtoolbox/blob/b64cad3c846fd6bd7a557688b67b80fe0b2c6dc2/requirements.txt#L26-L30

I want to migrate towards having everything in setup.py, because we eventually want to be able to package and release on pypi ourselves. But dependency_links is gone, and I can't replace these with PEP 508 URLs, because those are too specific: these are compiled extension modules, so the platform version matters.

setup(
  ...
  install_requires=[
    ...
    "torch@https://download.pytorch.org/whl/cpu/torch-1.6.0%2Bcpu-cp36-cp36m-linux_x86_64.whl",
    ...
  ],
)

makes no sense. It pins my package to linux.

I wish I could tell setup() extra_find_url="https://download.pytorch.org/whl/cpu/torch_stable.html". Or maybe torch@https://download.pytorch.org/whl/cpu/torch_stable.html.

Is the answer going to be that packages on pypi have to only depend on other packages on pypi? I guess that's a reasonable answer but pragmatically I've gotta deal with non-pypi sources and I wish I could keep all my dependency metadata in one place.

Wild workaround: maybe I could use more explicit environment markers (https://www.python.org/dev/peps/pep-0508/#environment-markers), but that gets super verbose super quickly, and pip already has this logic built in:

torch@https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp36-cp36m-win_amd64.whl; python_implementation=="cpython" && python_version~="3.6" && sys_platform == "windows" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp37-cp37m-win_amd64.whl; python_implementation=="cpython" && python_version~="3.7" && sys_platform == "windows" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp38-cp38m-win_amd64.whl; python_implementation=="cpython" && python_version~="3.8" && sys_platform == "windows" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0-cp36-none-macosx_10_12_x86_64.whl; python_implementation=="cpython" && python_version~="3.6" && sys_platform == "darwin" && platform_release ~= "16.7.0" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0-cp37-none-macosx_10_12_x86_64.whl; python_implementation=="cpython" && python_version~="3.7" && sys_platform == "darwin" && platform_release ~= "16.7.0" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0-cp38-none-macosx_10_12_x86_64.whl; python_implementation=="cpython" && python_version~="3.8" && sys_platform == "darwin" && platform_release ~= "16.7.0" && platform_machine == "x86_64"
# btw platform_release gives the kernel version on darwin, so it doesn't directly map to the OS release numberings used
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp36-cp36m-linux_x86_64.whl; python_implementation=="cpython" && python_version~="3.6" && sys_platform == "linux" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp37-cp37m-linux_x86_64.whl; python_implementation=="cpython" && python_version~="3.7" && sys_platform == "linux" && platform_machine == "x86_64"
torch@https://download.pytorch.org/whl/cpu/torch-1.5.0%2Bcpu-cp38-cp38m-linux_x86_64.whl; python_implementation=="cpython" && python_version~="3.8" && sys_platform == "linux" && platform_machine == "x86_64"

pfmoore commented 4 years ago

Is the answer going to be that packages on pypi have to only depend on other packages on pypi?

No, but the end user needs to explicitly opt into any other index being used. This was a deliberate policy decision, to prevent malicious code on PyPI triggering download of code from other, arbitrary, locations.

So you depend on whatever version of pytorch you want, and instruct your user to add --extra-index-url https://download.pytorch.org/whl/cpu/torch_stable.html. This allows the user to review that index and confirm that they are happy to use it.

kousu commented 4 years ago

No, but the end user needs to explicitly opt into any other index being used. This was a deliberate policy decision, to prevent malicious code on PyPI triggering download of code from other, arbitrary, locations.

Thank you for explaining the history there. I can understand the reasoning. But I don't think it was that effective, because nothing stops malicious or just vulnerable code from ending up on PyPI:

Using the PEP 508 URL format I can make packages, even ones on PyPI, depend on arbitrary outside locations. For example:

setup(
  # ....
  install_requires=[
    # ...
    "requests@git+https://github.com/kousu/requests@google-surveillance",
    # ....

For pure-python source packages this works every time. It would be helpful if that worked for wheels too.

And there's another way to circumvent user opt-in: you can hide the --extra-index-url or --find-links in a requirements.txt:

https://github.com/neuropoly/spinalcordtoolbox/blob/b64cad3c846fd6bd7a557688b67b80fe0b2c6dc2/requirements.txt#L26

-f https://download.pytorch.org/whl/cpu/torch_stable.html
torch==1.5.0+cpu; sys_platform != "darwin"
torch==1.5.0; sys_platform == "darwin"
torchvision==0.6.0+cpu; sys_platform != "darwin"
torchvision==0.6.0; sys_platform == "darwin"

pip install -r requirements.txt doesn't prompt the user to ask if they are okay with using an unvetted source.

This is all pretty inconsistent and confusing :/.

In practice it just means that, in order to minimize the headache we give our users, devs will write scripts like this. Our users don't know what pip is or who runs pypi. I don't even know who runs pypi, but I assume they've got it in hand. Our users definitely haven't thought through the implications of contacting this domain versus that domain.


I'm sorry for complaining. I know pypa is a big project and this is one more straw on the back. I think you're doing good work shaping all this clay! And that it's a lot to consider!

Here I want to make sure this one use case isn't forgotten: you support getting source packages from arbitrary URLs, so please also support binary packages the same way.

uranusjr commented 4 years ago

You are free to disagree with the policy, but the issue is outside the pip developers’ hands and needs to be handled in pypa/warehouse instead.

kousu commented 4 years ago

It's about how pip chooses servers to download from.

Here's something that would solve my use case: reintroduce find-links as a PEP 508 URL scheme, maybe wheelhouse+https://:

pip install torch==1.5.0+cpu@wheelhouse+https://download.pytorch.org/whl/cpu/torch_stable.html

stinovlas commented 4 years ago

Using the PEP 508 URL format I can make packages, even ones on PyPI, depend on arbitrary outside locations.

Really? That doesn't seem right considering PEP 508 and PEP 440. Do you have an example of a package that exploits this?

pdxjohnny commented 4 years ago

Is the answer going to be that packages on pypi have to only depend on other packages on pypi?

No, but the end user needs to explicitly opt into any other index being used. This was a deliberate policy decision, to prevent malicious code on PyPI triggering download of code from other, arbitrary, locations.

If malicious code is already on PyPI, why would it need to pull in code from other locations? Why not just push more malicious code to the same package? Or to another package, and push that to PyPI?

I think the desire to make pip more secure is great, but I don't think the mitigation that was taken is effective (see the os.system workaround above). It seems to have broken many users' workflows without providing an increase in security. I think it might be time we look at reversing the deprecation of dependency links. What would the process be for that?

kousu commented 4 years ago

Using the PEP 508 URL format I can make packages, even ones on PyPI, depend on arbitrary outside locations.

Really? That doesn't seem right considering PEP 508 and PEP 440. Do you have an example of a package that exploits this?

I recant. I made a project with

# setup.cfg
# ...
[options]
packages = find:
include_package_data = true
python_requires = >=3.6
install_requires =
    requests@git+https://github.com/kousu/requests.git@google-surveillance
# ...

pypi rejects it when uploaded in wheel form, but not in sdist form; pip, in turn, rejects the sdist form when it downloads it.

Details at https://github.com/kousu/donthackpypi

Okay. So I can accept that if you want to use non-pypi code you have to tell your users:

pip install --extra-index-url <something> --find-links <something_else> --extra-index-url <evenmore> .

It's ungainly, but it's not worse than the others.

kousu commented 4 years ago

Is the answer going to be that packages on pypi have to only depend on other packages on pypi?

No, but the end user needs to explicitly opt into any other index being used. This was a deliberate policy decision, to prevent malicious code on PyPI triggering download of code from other, arbitrary, locations.

If malicious code is already on PyPi. Why would it need to pull in code from other locations? Why not just push more malicious code to the same package? Or to another package and push that to PyPi?

Yeah! Totally.

By the way, the workarounds you posted run in setup.py, which doesn't exist in wheels. You could distribute solely as an sdist, making everyone run your pip call, but there's an even slyer workaround that works with both! Just put this in your __init__.py or something:

import subprocess

while True:
    try:
        import torch
        break
    except ImportError:
        # Keep (re)installing until the import finally succeeds.
        subprocess.run(['pip', 'install',
                        '-f', 'https://download.pytorch.org/whl/cpu/torch_stable.html',
                        'torch==1.5.0+cpu'], check=True)

dstufft commented 4 years ago

No, but the end user needs to explicitly opt into any other index being used. This was a deliberate policy decision, to prevent malicious code on PyPI triggering download of code from other, arbitrary, locations.

I'd just say that it wasn't entirely about malicious code (although that was part of it), and was largely around user expectations. Users expect that, absent additional configuration, invoking pip install ... will fetch files only from PyPI, and with additional configuration, only from their configured repositories.

Dependency links broke this expectation, and brought along with them a rash of issues where pip install ... would depend on an unknowable set of servers outside of just the configured locations. This caused a number of problems, most obviously in locked-down environments where accessing a service required getting a hole punched in a firewall, but it also rendered people's attempts to mirror PyPI moot, because these external links wouldn't get mirrored.

Basically, a user should be in charge of where they download files from; it should not be something under someone else's control. Anything that takes that control away from end users is not going to come back, and if the URL form of PEP 508 works on PyPI, that's a bug that needs to be fixed. It should not work for any type of distribution uploaded to PyPI.