python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Cannot download packages from private PyPi repository using HTTP basic auth with Poetry 1.1.0 + old v1.0.x PypiCloud w/ default settings #3041

Closed MasterNayru closed 4 years ago

MasterNayru commented 4 years ago

Issue

We have been using Poetry to pull packages from a private PyPI repository, and everything worked fine until Poetry 1.1.0. We configure Poetry to talk to our private PyPI installation using HTTP basic auth, and that auth works perfectly well for resolving which versions of a package to install. The problem seems to be that the same auth is then also sent with the requests that download the wheels, which causes the following error:

$ poetry config http-basic.myprivaterepo <username> <password>
$ poetry update -vvv

<snip>

   2  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/repositories/pypi_repository.py:454 in _download
       452│ 
       453│     def _download(self, url, dest):  # type: (str, str) -> None
     → 454│         return download_file(url, dest, session=self.session)
       455│ 
       456│     def _log(self, msg, level="info"):

   1  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/utils/helpers.py:98 in download_file
        96│ 
        97│     with get(url, stream=True) as response:
     →  98│         response.raise_for_status()
        99│ 
       100│         with open(dest, "wb") as f:

  HTTPError

  400 Client Error: Bad Request for url: https://deckard-pip.s3.amazonaws.com/1234/my_broken_dependency/my_broken_dependency-0.1.3-py3-none-any.whl?AWSAccessKeyId=<key>&Signature=kz30gf304b%2F%2F93pQeUSPrto5MiE%3D&x-amz-security-token=<token>&Expires=1601690152

  at ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/requests/models.py:941 in raise_for_status
      937│         elif 500 <= self.status_code < 600:
      938│             http_error_msg = u'%s Server Error: %s for url: %s' % (self.status_code, reason, self.url)
      939│ 
      940│         if http_error_msg:
    → 941│             raise HTTPError(http_error_msg, response=self)
      942│ 
      943│     def close(self):
      944│         
      945│         called the underlying ``raw`` object must not be accessed again.

If I change the following line in the Poetry code:

   2  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/repositories/pypi_repository.py:454 in _download
       453│     def _download(self, url, dest):  # type: (str, str) -> None
     → 454│         return download_file(url, dest, session=self.session)

to:

   2  ~/.pyenv/versions/3.8.5/lib/python3.8/site-packages/poetry/repositories/pypi_repository.py:454 in _download
       453│     def _download(self, url, dest):  # type: (str, str) -> None
     → 454│         return download_file(url, dest)

and re-run, everything works:

$ poetry update
Skipping virtualenv creation, as specified in config file.
Updating dependencies
Resolving dependencies... (41.8s)

No dependencies to install or update

It seems the auth is needed to talk to the API for package version resolution but causes problems when it is also used for package downloads. If it makes any difference, I am using pypicloud as the backend for my private PyPI installation. I am trying to keep my output as brief as possible without dumping any keys or the like. Please let me know if you need more information, or if you have suggestions on what I should change in my configuration to get things working again.

abn commented 4 years ago

@MasterNayru interesting. We recently identified that in 1.0.10 we did not apply authentication correctly for sources specified in pyproject.toml. In typical circumstances we expect authentication to be used for both API queries and file downloads.

Am I correct in understanding that, in your case, authentication is used to retrieve direct wheel links (i.e. including tokens), but the expectation is that we do not send basic auth when downloading these wheels? If so, this is a bit tricky, as it would mean there are two use cases that are not necessarily compatible with each other.

MasterNayru commented 4 years ago

@abn pypicloud's API returns a download URL that already carries the necessary auth parameters, and S3 seems to error out when the username/password auth is supplied as well. So when downloading packages from S3, I am expecting the basic auth credentials not to be sent with the download requests, which fits pretty well with the behaviour of the older versions.

MasterNayru commented 4 years ago

@abn I am going to close this issue as it would appear that pypicloud have, since I last looked at it, defaulted to the behaviour which poetry is enforcing with regards to auth for package downloads. Rather than try to support the old behaviour which pip works with, I will update my pypicloud installation and make use of the new behaviour. Cheers for bearing with us.

pepastach commented 4 years ago

We hit exactly the same problem with poetry 1.1.1 and pypicloud 1.0.10. The way I understand it, this can't be solved by pypicloud upgrade. Here's my reasoning:

  1. poetry makes a call to pypicloud
  2. pypicloud returns a pre-signed URL pointing to our S3 bucket (the URL already contains AWS access key and token)
  3. poetry makes a GET request (using the requests library), but since it passes the session as @MasterNayru described in the issue, requests adds the Authorization header. This makes the request invalid. We verified it manually using curl.

Since it's poetry/requests that adds the Authorization header, I don't see how this can be fixed on the pypicloud side.
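
To make step 3 concrete, here is a minimal sketch of the header value that gets attached when basic auth credentials are configured. It uses only the standard library rather than requests itself, and the helper name basic_auth_header and the credentials are made up for illustration; it is not Poetry's code.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic ``Authorization`` header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Hypothetical credentials, standing in for the ones stored with
# `poetry config http-basic.<repo> <username> <password>`.
headers = {"Authorization": basic_auth_header("user", "pass")}
print(headers)
```

A header like this, sent together with a URL that already carries its own signature, gives S3 two auth mechanisms for the same request.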

Please correct me (or reopen the issue 😉 ).

pepastach commented 4 years ago

pypicloud 1.0.11 introduced the ability to stream files through pypicloud. By briefly looking at the diff, it seems like we can configure pypicloud with pypi.stream_files = True. pypicloud should then return the package file directly instead of redirecting to S3.

We'll try bumping pypicloud and report back.

Katafalkas commented 4 years ago

Hi. Ran into the same issue. pypicloud==1.1.5 poetry==1.1.3

Tried both with and without pypi.stream_files = True - same issue.

The error is caused by the headers being sent. The same URL can be downloaded using curl.

b'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>Basic XxXx...</ArgumentValue><RequestId>58D6A864A3D27683</RequestId><HostId>FpXxx...</HostId></Error>'
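
The S3 message above suggests one possible client-side check, sketched here under assumptions: look at the query string and skip the Authorization header when the URL is already signed. The names is_presigned and download_headers, and the example URLs, are hypothetical; this is not what Poetry or pip actually does.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters that carry S3 request signatures. The first two appear in
# the failing URL earlier in this thread (signature version 2); the last two
# belong to signature version 4.
PRESIGNED_PARAMS = {
    "awsaccesskeyid",
    "signature",
    "x-amz-algorithm",
    "x-amz-signature",
}


def is_presigned(url: str) -> bool:
    """Return True if the URL's query string already carries signing parameters."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name.lower() in PRESIGNED_PARAMS for name in query)


def download_headers(url: str, auth_header: str) -> dict:
    """Hypothetical helper: omit basic auth when the URL is already signed."""
    return {} if is_presigned(url) else {"Authorization": auth_header}


signed = (
    "https://example-bucket.s3.amazonaws.com/pkg-0.1.3-py3-none-any.whl"
    "?AWSAccessKeyId=KEY&Signature=SIG&Expires=1601690152"
)
print(download_headers(signed, "Basic XxXx"))
print(download_headers("https://pypi.example.com/simple/pkg/", "Basic XxXx"))
```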
Katafalkas commented 4 years ago

@MasterNayru I would suggest that this should be considered a poetry issue. The reason being that the same pypicloud server works just fine with pip and does not with poetry.

We run a number of repositories and Python packages; only one of them uses poetry, and currently it does not work. We should either wait for a fix, help fix poetry, or migrate to pip.

abn commented 4 years ago

@Katafalkas one aspect to note here is that pip and poetry use these URLs differently. For example, poetry stores these URLs in the lock file. Since the tokens in these URLs are short-lived, they are not well suited to a lock file. As far as I can tell, the use of short-lived authorised URLs is not defined behaviour. It most likely works in pip because search and retrieval are treated separately, which in effect might allow this to work as expected.

Considering that pypicloud is a common component these days, we might need to look at how we can better support it. On the other hand however, PEP 503 does not define a mechanism for independent authentication for the file URLs. Typically, the authentication for the host domain is re-used.

Out of curiosity, are the domains different for the index and the file?

Qu4tro commented 4 years ago

They are for us (also using pypicloud), as the files are hosted on amazonaws.com and the index is on our domain.
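
Since the hosts differ here, one way to picture the "re-use auth for the host domain" rule is a hostname comparison. This is only a sketch with made-up names (same_host, credentials_for) and made-up URLs, not a description of what poetry does.

```python
from urllib.parse import urlsplit


def same_host(index_url: str, file_url: str) -> bool:
    """Compare hostnames; urlsplit lowercases ``.hostname`` for us."""
    return urlsplit(index_url).hostname == urlsplit(file_url).hostname


def credentials_for(index_url: str, file_url: str, credentials):
    """Hypothetical rule: re-use index credentials only on the index host."""
    return credentials if same_host(index_url, file_url) else None


index = "https://pypi.example.com/simple/"
on_index_host = "https://PyPI.example.com/api/package/pkg/pkg-0.1.3.whl"
on_s3 = "https://example-bucket.s3.amazonaws.com/pkg-0.1.3.whl?Signature=SIG"

print(credentials_for(index, on_index_host, ("user", "pass")))
print(credentials_for(index, on_s3, ("user", "pass")))
```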

Katafalkas commented 3 years ago

The URL of the pypicloud server and the URL of the file served from S3 differ by default, but there is an option to pass the URL through, which makes the two the same.

cereblanco commented 3 years ago

This is how I set up a private PyPI:

  1. Edit pyproject.toml

    [[tool.poetry.source]]
    name = "myprivate_pypi"
    url = "https://pypi.myprivate_pypi.com/simple/"

  2. In a terminal, add the poetry credentials for the private PyPI:

    poetry config http-basic.myprivate_pypi <username> <password>

  3. Update the lock file with --no-update:

    poetry lock --no-update

  4. Add your library that is found on the private PyPI (or run poetry install):

    poetry add <my-package-found-at-private>

@meanderingcode let me know if this one works for you

MeanderingCode commented 3 years ago

@cereblanco Thank you. I discovered this yesterday when looking at changes related to legacy repositories. Edited my comment on your PR.

rizerzero commented 3 years ago

@cereblanco

Hi, I tested your method and it did not work for me. With any version above 1.0.10 I get the message below. It seems that poetry is trying to download the 'requests' package, which is a dev dependency, from my private repository 🤨. Is this expected behaviour?

My server should not return a 500 error, but previous versions did not try to download other packages from my private repo.

500 Server Error: Internal Server Error for url: https://pypi.myprivaterepo.com/simple/requests/
 at /usr/local/lib/python3.6/site-packages/poetry/repositories/legacy_repository.py:393 in _get
      389│             if response.status_code == 404:
      390│                 return
      391│             response.raise_for_status()
      392│         except requests.HTTPError as e:
    → 393│             raise RepositoryError(e)
      394│
      395│         if response.status_code in (401, 403):
      396│             self._log(
      397│                 "Authorization error accessing {url}".format(url=url), level="warn"
The command '/bin/sh -c poetry lock --no-update' returned a non-zero code: 1 

Here is my pyproject.toml:

[tool.poetry]
name = "project"
version = "0.4.0"
description = "project"
authors = ["me <me@author.com>"]

[tool.poetry.dependencies]
python = "^3.6"
python-dotenv = "^0.10.3"
myprivatepackage = "^1.1.13"
fastapi = "^0.61.1"
pymongo = "^3.11.0"
uvicorn = "^0.12.2"
graphene-pydantic = "^0.2.0"

[tool.poetry.dev-dependencies]
pytest = "^3.0"
requests = "^2.25.1"

[[tool.poetry.source]]
name = "myprivaterepo"
url = "https://pypi.myprivaterepo.com/simple/"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

jensgustafsson commented 3 years ago

I think we should consider reopening this issue as this problem still exists when using a pypi server that returns presigned urls to be used when fetching packages (for instance pypi-cloud)

Right now I'm forced to stick with poetry<1.1 due to this problem.

rizerzero commented 3 years ago

Same for me, forced to stick with poetry<1.1 due to this problem.

FlippAre commented 3 years ago

Also experiencing the same problem. We have tried to configure pypicloud according to the suggestions in this issue, but no luck. It's forcing us to stick to <1.1, which is unfortunate because it leaves all the great speed improvements of 1.1 and later on the table.

jensgustafsson commented 3 years ago

I wouldn't mind helping to fix this issue, but first I need to know that the poetry community actually considers this a bug.

I would also like some context. What was changed and why in 1.1 with regards to the download package functionality?

MasterNayru commented 3 years ago

The fix is to make pypicloud stop returning pre-signed URLs to pip. Python package management tools are enough of a mess without having to assume that the auth for package information retrieval is different from the auth needed for package downloads. This is the way pypi.org works and having to assume that auth maybe is or is not the same is the kind of thing that makes these package manager tools an absolute mess.

The whole reason why it returned pre-signed URLs by default was because easy_install wouldn't work without it. I wish I was kidding. https://pypicloud.readthedocs.io/en/latest/topics/redirect_urls.html#redirect-detail

In pypicloud v1.0.14 (in a patch change, I guess they aren't doing the whole semver thing), https://pypicloud.readthedocs.io/en/latest/changes.html#id9 they changed the config value storage.redirect_urls to True. That changes the URLs returned to pip to actually be download URLs using the same server name for downloads as for API requests for package info and that require the same auth, which then makes poetry's auth behaviour work fine and makes pypicloud behave like pypi.org would.

I haven't needed to change this setting myself as I fixed the issue by just updating pypicloud to a version where it was set to True by default, but that is the setting you would need to change. Haven't had an issue with it since.

jensgustafsson commented 3 years ago

This was very interesting news! We're running pypicloud 1.1.17 (the latest version), and running poetry lock does not work in any poetry release after 1.0.10. Would you mind sharing your pypicloud config file?

Which version of pypi cloud are you using btw?

Our config looks like this:

[app:main]
use = egg:pypicloud

redirect_urls = true

pypi.fallback = cache
pypi.always_show_upstream = True
pypi.stream_files = True
pypi.package_max_age = 604800
pypi.storage = s3
storage.bucket = S3_BUCKET
storage.region_name = eu-west-1
pypi.db = redis
db.url = REDIS_URL

UPDATE: Things actually work! We were indeed running an old version of pypicloud. After updating to 1.1.17, things started to work! 🌟

MasterNayru commented 3 years ago

Great to hear that it is working for you. The setting you would have needed to set was storage.redirect_urls, not redirect_urls as you had it in the config you posted.
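
For anyone who wants to check their own config for the same slip, here is a small sketch using Python's configparser. The helper name redirect_urls_enabled and the trimmed-down config text are assumptions for illustration; only the key names and the [app:main] section come from this thread.

```python
import configparser

# Trimmed-down config in the shape posted above, with the bare key.
CONFIG = """
[app:main]
use = egg:pypicloud
redirect_urls = true
pypi.storage = s3
"""


def redirect_urls_enabled(ini_text: str, section: str = "app:main"):
    """Return the storage.redirect_urls value, or None if the key is not set."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_option(section, "storage.redirect_urls"):
        return None  # key missing, so the installed version's default applies
    return parser.getboolean(section, "storage.redirect_urls")


print(redirect_urls_enabled(CONFIG))
print(redirect_urls_enabled(CONFIG.replace("redirect_urls", "storage.redirect_urls")))
```

A result of None means the bare redirect_urls key was not picked up as the storage setting, which matches what happened with the config above.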

jensgustafsson commented 3 years ago

Thanks! Yes I actually figured that out after your previous reply 🙏 Thanks a lot!

voney commented 2 years ago

I'm getting the exact same issue, but using myget.org instead. The "solution" here is specific to pypicloud, but the underlying issue of a package download being redirected to a different URL with auth baked in remains. I imagine this issue will only grow as services use managed S3-style storage more and more.

Can this be re-opened and fixed "properly"?

abn commented 2 years ago

@voney can you try the fix at #5518? That should "in theory" handle this better. If not, please create a new issue.

github-actions[bot] commented 8 months ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.