python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License
Adding a source even with secondary true take precedence #4704

Closed ierezell closed 2 years ago

ierezell commented 2 years ago

Issue

Adding a source like

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu111"
secondary = true

Works perfectly well when installing torch with

torch = { version = "=1.10.0+cpu", source = "pytorch"}    

but when trying to add mypy with poetry add mypy, I get:

403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cpu/mypy

Which obviously doesn't work, as mypy is not in the PyTorch package repository. I don't get why PyPI is not the default, with "pytorch" used just as a fallback.

Thanks in advance for any help, Have a great day.

mxab commented 2 years ago

I think I'm experiencing the same issue

gregorybchris commented 2 years ago

I can confirm that all versions 1.1.0 through 1.1.11 have this same issue. Downgrading to 1.0.10 is a temporary workaround.

mosauter commented 2 years ago

Can confirm this issue is still present in 1.1.12.

lightspot21 commented 2 years ago

Also confirming here on 1.1.12. Secondary sources are overriding pypi in all cases.

Seems to be related to #3855

ierezell commented 2 years ago

Hello there :)

Sorry to bump, but is there any news about this?

I'm still getting

Creating virtualenv bpnlp in /mnt/Documents/nlp/packages/libs/.venv
Using virtualenv: /mnt/Documents/nlp/packages/libs/.venv
Updating dependencies
Resolving dependencies... (4.3s)<debug>pytorch:</debug> Authorization error accessing https://download.pytorch.org/whl/cu113/torch_stable.html/isort/
Resolving dependencies... (4.5s)<debug>pytorch:</debug> Authorization error accessing https://download.pytorch.org/whl/cu113/torch_stable.html/sphinx-rtd-theme/
Resolving dependencies... (4.6s)<debug>pytorch:</debug> Authorization error accessing https://download.pytorch.org/whl/cu113/torch_stable.html/sphinx-autodoc-typehints/

... And so on for all packages other than torch.

It's really breaking all poetry for my workflow (I need pytorch cu11 which needs to be downloaded from their source). I'm willing to contribute if needed.

P.S.: using 1.0.10 does not work for me because of https://github.com/python-poetry/poetry/issues/2711, and using 1.2.0a2 does not fix the error either.

Thanks in advance, Have a great day !

ierezell commented 2 years ago

For those who are still struggling.....

Without sources, without any of the fancy stuff, just a plain dependency declaration: you can declare the wheels manually:

torch = [
    { url="https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp37-cp37m-linux_x86_64.whl", python=">=3.7,<3.8", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp37-cp37m-win_amd64.whl", python=">=3.7,<3.8", markers="sys_platform == 'win32'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp38-cp38-linux_x86_64.whl", python=">=3.8,<3.9", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp38-cp38-win_amd64.whl", python=">=3.8,<3.9", markers="sys_platform == 'win32'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp39-cp39-linux_x86_64.whl", python=">=3.9,<3.10", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.10.1%2Bcu113-cp39-cp39-win_amd64.whl", python=">=3.9,<3.10", markers="sys_platform == 'win32'"},
]

(I know it's a quick fix, dependency resolution was really long, it's not a long term solution, but at least it will help you keep going :) )
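The wheel URLs in the list above all follow one naming pattern, so they can be generated rather than typed by hand. The following is only a sketch: the function name and parameters are made up for illustration, and the pattern is inferred purely from the URLs shown above, not from any PyTorch documentation.

```python
from urllib.parse import quote

# Index root taken from the URLs in the list above.
BASE_URL = "https://download.pytorch.org/whl"


def wheel_url(package, version, local, py_tag, abi_tag, platform_tag):
    """Build a direct wheel URL following the naming pattern seen above.

    Hypothetical helper: all parameter names are illustrative.
    """
    # The '+' in a local version such as '1.10.1+cu113' has to be
    # percent-encoded as '%2B' inside the URL path.
    full_version = quote(f"{version}+{local}", safe="")
    filename = f"{package}-{full_version}-{py_tag}-{abi_tag}-{platform_tag}.whl"
    return f"{BASE_URL}/{local}/{filename}"


print(wheel_url("torch", "1.10.1", "cu113", "cp38", "cp38", "linux_x86_64"))
```

Note that the ABI tag is not always equal to the Python tag: the cp37 wheels above use cp37m.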

janpf commented 2 years ago

Thanks @Ierezell, that's also how I'm doing it now, "really long" is still an understatement tho :/ Every single poetry operation also takes forever since I added it like this. Is this preventable?

ierezell commented 2 years ago

Hi @janpf, yes it's really long but I found out why....

poetry is downloading all the torch wheels! (I guess to check dependencies and stuff.) That means it will download 6 * 2 GB... yes, even the Windows binaries when on Linux.

Specifying one wheel only can reduce the time.
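As a rough illustration of "one wheel only", the sketch below prints a single dependency line for the interpreter it runs on. The helper name and the lookup tables are hypothetical and cover only the combinations listed earlier in this thread; pinning one wheel this way also makes the project non-portable, so treat it as a local stopgap.

```python
import sys

# ABI tags as they appear in the wheel names earlier in the thread
# (the cp37 wheels use the "m" suffix).
ABI_TAGS = {(3, 7): "cp37m", (3, 8): "cp38", (3, 9): "cp39"}
PLATFORM_TAGS = {"linux": "linux_x86_64", "win32": "win_amd64"}


def single_wheel_entry(major, minor, platform):
    """Return one pyproject.toml line for the given interpreter, or None."""
    abi = ABI_TAGS.get((major, minor))
    plat = PLATFORM_TAGS.get(platform)
    if abi is None or plat is None:
        return None  # no wheel was listed for this combination
    py = f"cp{major}{minor}"
    url = (
        "https://download.pytorch.org/whl/cu113/"
        f"torch-1.10.1%2Bcu113-{py}-{abi}-{plat}.whl"
    )
    return f'torch = {{ url = "{url}" }}'


entry = single_wheel_entry(sys.version_info.major, sys.version_info.minor, sys.platform)
print(entry or "# no matching wheel among those listed in this thread")
```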

Also, I realized that even with the many errors like

403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cpu/mypy

It tries the pytorch URL and prints errors, but finally uses the correct one and installs everything.

So specifying the source is "working".

Hope that helps, Have a great day

pbsds commented 2 years ago

Related: #5122 #3855

For some reason, others seem to be having the opposite problem: #4854, #4659

oriolcmp commented 2 years ago

Hey, I have the same issue and I'm on version 1.1.13. Do you know if there are any plans to fix that? This secondary option would be very useful if it worked. Thanks

suneeta-mall commented 2 years ago

This issue is blocking us from "unlocking the potential" of Poetry in a stack that uses PyTorch and its ecosystem.

I have tried the following TOML setting:

[[tool.poetry.source]]
name = "torch"
url = "https://download.pytorch.org/whl/cu113"
secondary = true
default = false

but default = false is not recognized for some reason, and I end up getting the 403 error:

403 Client Error: Forbidden for url: https://download.pytorch.org/whl/cpu/mypy

As mentioned in this ticket, https://github.com/python-poetry/poetry/issues/4704 this is a known issue. However, amongst all possible ways to address this issue, this solution of using secondary sources seems to be the ideal fix for the issue in question.

As a short-term workaround, I have also tried platform- and version-specific settings. This would work fine if PyTorch were a leaf dependency, but my setup involves PyTorch, Torchvision, and PyTorch Lightning. Because other dependencies rely on PyTorch, just specifying the torch wheels in the TOML fails to resolve the dependencies. TOML setting:

torch = [
    { url="https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp37-cp37m-linux_x86_64.whl", python=">=3.7,<3.8", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp38-cp38-linux_x86_64.whl", python=">=3.8,<3.9", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp39-cp39-linux_x86_64.whl", python=">=3.9,<3.10", markers="sys_platform == 'linux'"},
    { version = "=1.11.0", markers = "sys_platform == 'darwin' or sys_platform == 'win32'" },
]
torchvision = [
    { url="https://download.pytorch.org/whl/cu113/torchvision-0.12.0%2Bcu113-cp37-cp37m-linux_x86_64.whl", python=">=3.7,<3.8", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torchvision-0.12.0%2Bcu113-cp38-cp38-linux_x86_64.whl", python=">=3.8,<3.9", markers="sys_platform == 'linux'"},
    { url="https://download.pytorch.org/whl/cu113/torchvision-0.12.0%2Bcu113-cp39-cp39-linux_x86_64.whl", python=">=3.9,<3.10", markers="sys_platform == 'linux'"},
    { version = "=0.12.0", markers = "sys_platform == 'darwin' or sys_platform == 'win32'" },
]

Error:

  SolverProblemError

  Because torchvision (0.12.0+cu113) depends on torch (1.11.0)
   and XXXX-app depends on torch (1.11.0+cu113), torchvision is forbidden.
  So, because XXXX-app depends on torchvision (0.12.0+cu113), version solving failed.

I have been in knots with this one, particularly because there are so many issues open around this issue: https://github.com/python-poetry/poetry/issues/2543 https://github.com/python-poetry/poetry/issues/4231 https://github.com/python-poetry/poetry/issues/3855 https://github.com/python-poetry/poetry/issues/2613 https://github.com/python-poetry/poetry/issues/4704 https://github.com/python-poetry/poetry/issues/2339

The only solution that works cross-platform is https://github.com/nat-n/poethepoet but that is not a great solution either (not lining up with lock file, not using same cache etc, the need for additional pip run!). It would be great if we can fix this issue.

abn commented 2 years ago

The 403 is a warning, as Poetry by default searches all sources for a package unless the package explicitly specifies a source (poetry add --source pypi or poetry add --source torch). The use of secondary = true only implies preference when choosing the best match.

This is because a package might be available in both indices and the best version might be in either. This is common where there are post builds (production patches) or runtime environment specific wheel builds where one might not be available in PyPI etc.

Additionally, this is a quirk of the download.pytorch.org PEP 503 (simple API) repository. A 403 is not expected here, but rather a 404, as the package mypy does not exist. A 403 is logged as a warning because it more likely means the source is misconfigured, and we want to make that obvious to the user. This is definitely up for discussion, e.g. pushing it to the debug log or adding "prefetch" logic like that suggested in #5442 to reduce these network calls.

There is also an option (new feature) to configure such repositories as fallback-only to behave as the OP expected. But this is not possible at present.
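To make the behaviour described above concrete, here is a toy model of the lookup. It is not Poetry's implementation: the class and function names are invented, versions are plain tuples, and the tie-break rule (primary beats secondary only when versions are equal) is an assumption about what "preference" means.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("source-lookup-sketch")


class Forbidden(Exception):
    """Stands in for an HTTP 403 returned by a repository."""


@dataclass
class Source:
    name: str
    secondary: bool = False
    packages: dict = field(default_factory=dict)  # name -> list of version tuples
    forbids_unknown: bool = False  # mimics an index answering 403 instead of 404

    def find(self, package):
        if package not in self.packages:
            if self.forbids_unknown:
                raise Forbidden(f"403 from {self.name} for {package}")
            return []
        return self.packages[package]


def best_match(package, sources):
    """Query every source, then pick the highest version; primary wins ties."""
    candidates = []
    for source in sources:
        try:
            versions = source.find(package)
        except Forbidden as exc:
            log.warning("%s", exc)  # logged, not fatal
            continue
        candidates += [(v, not source.secondary, source.name) for v in versions]
    if not candidates:
        return None
    version, _, name = max(candidates)
    return version, name


pypi = Source("pypi", packages={"mypy": [(0, 950)]})
pytorch = Source("pytorch", secondary=True,
                 packages={"torch": [(1, 10, 0)]}, forbids_unknown=True)
print(best_match("mypy", [pypi, pytorch]))
```

In this model the lookup for mypy still hits the pytorch source and logs a warning, but the result comes from pypi, which mirrors the noisy-but-working behaviour reported earlier in the thread.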

GuillaumeDesforges commented 2 years ago

This issue is an important blocker for my job as well. I really want to push forward poetry to teams because IMO it is the best tool available at the moment, but it's just impossible because of this issue.

Additionally, this is a quirk of the download.pytorch.org PEP 503 (simple API) repository.

Expecting that all PEP 503 repositories will be configured to send 404 and not 403 is a big assumption. It has been like that for a long time, and is unlikely to change. Handling that on poetry's side is possible; it would allow poetry to be used in ML research, which is desirable to improve research. To me it looks like low-hanging fruit.

@sdispater @abn moving forward, some proposals have been made, and (to me) it seems to be up to poetry's core contributors to pick one. I'll be happy to hack on and improve any PR to move it forward, but I'll need your support. :)

abn commented 2 years ago

@GuillaumeDesforges can you clarify how this issue is impacting your use case please? The 403 log is just that: a log indicating Poetry received an authentication failure for a configured repository.

From my understanding, the fact that Poetry looks for dependencies in both primary and secondary sources is a documented behaviour (this definitely also will need improvement given the confusion here).

The 403 error used to propagate as a hard error in older versions of Poetry; however, this was changed to become a logged warning. And #5442 should reduce the noise in the log as well as reduce network requests. Also see https://github.com/python-poetry/poetry/issues/4231#issuecomment-1114048892.

Expecting that all PEP 503 repositories will be configured to send 404 and not 403 is a big assumption. It has been like that for a long time, and is unlikely to change.

Personally, I do not think this is a big assumption at all. If a resource does not exist, it is reasonable to assume that the response, if all else is configured correctly, will be a 404. A 403 explicitly indicates "something is wrong with your auth". The latter is something we definitely want to notify the user of.

Feel free to ping on discord if you have specific PRs or proposals you want to work on or get consensus on, happy to help. There are a few issues surrounding the use of pytorch binaries within a Poetry project.
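The change from hard error to logged warning described above could be pictured roughly as follows. This is a sketch under assumptions, not Poetry's code: the function and exception names are invented, and treating 401 the same way as 403 is an assumption that does not come from this thread.

```python
import logging

log = logging.getLogger("status-sketch")


class RepositoryError(Exception):
    """Raised for responses the sketch does not know how to handle."""


def interpret_status(status, url):
    """Map an index response status to 'found' or 'missing', or raise."""
    if status == 200:
        return "found"
    if status == 404:
        return "missing"  # the package simply is not hosted here
    if status in (401, 403):  # 401 included as an assumption
        # Likely a misconfigured source: make it visible, but keep resolving.
        log.warning("Authorization error accessing %s", url)
        return "missing"
    raise RepositoryError(f"unexpected status {status} for {url}")


print(interpret_status(403, "https://download.pytorch.org/whl/cpu/mypy"))
```

With this shape, an index that answers 403 for unknown packages produces log noise but no longer aborts resolution, which is the outcome both sides of the 403-versus-404 discussion agree on.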

GuillaumeDesforges commented 2 years ago

The 403 error used to propagate as a hard error in older versions of Poetry; however, this was changed to become a logged warning

Ah! I didn't know that, my bad. Could you tell me in which release it has been fixed? It would be a huge help.

Personally, I do not think this is a big assumption at all. If a resource does not exist, it is reasonable to assume that the response, if all else is configured correctly, will be a 404

In my experience, you first refuse wrongly authenticated requests on non-public resources, including possibly non-existing ones, as you might not want to leak information (e.g. a package name could be private information you don't want to expose, and being able to tell 403 from 404 by scanning would leak it).

But I guess both can be considered. Either way, what matters is that poetry does not crash in such cases.

abn commented 2 years ago

Could you tell me in which release it has been fixed?

Should be available since 1.2.0a1. https://github.com/python-poetry/poetry/commit/3c9ced2e12618f9a9946a76c0430b8c80c0d0374

In my experience, you first refuse wrongly authenticated requests on non-public resources, including possibly non-existing ones, as you might not want to leak information (e.g. a package name could be private information you don't want to expose, and being able to tell 403 from 404 by scanning would leak it).

I suspect our experiences and personal tastes differ here. Typically you'd opt for a 404. An example of this is github itself. From an unauthorised user's perspective the resource does not exist. Unless there is a need to act explicitly on a 403, returning it is not something I personally would choose unless I have to. And S3 buckets doing 403s is largely accepted as a confusing "feature".

But I guess both can be considered. Either way, what matters is that poetry does not crash in such cases.

Indeed.

diegoquintanav commented 2 years ago

The 403 is a warning, as Poetry by default searches all sources for a package unless the package explicitly specifies a source (poetry add --source pypi or poetry add --source torch). The use of secondary = true only implies preference when choosing the best match.

This is because a package might be available in both indices and the best version might be in either. This is common where there are post builds (production patches) or runtime environment specific wheel builds where one might not be available in PyPI etc.

Additionally, this is a quirk of the download.pytorch.org PEP 503 (simple API) repository. A 403 is not expected here, but rather a 404, as the package mypy does not exist. A 403 is logged as a warning because it more likely means the source is misconfigured, and we want to make that obvious to the user. This is definitely up for discussion, e.g. pushing it to the debug log or adding "prefetch" logic like that suggested in #5442 to reduce these network calls.

There is also an option (new feature) to configure such repositories as fallback-only to behave as the OP expected. But this is not possible at present.

Hi! Thanks for explaining why this is happening. In my case (in issue #5538), I was expecting that by setting my private repo as secondary, Poetry would look into PyPI first and then into my secondary repos.

What I understand from your reply is that poetry fails because the secondary repo failed, although the first one works.

I would expect that poetry catches the failure and defaults to the first (and only) version available. As it is, secondary would only work with mirrors of pypi that happen to host an extra package.

abn commented 2 years ago

poetry fails because the secondary repo failed, although the first one works

This is incorrect. If no packages were found in secondaries Poetry will only consider the ones it already found.

diegoquintanav commented 2 years ago

poetry fails because the secondary repo failed, although the first one works

This is incorrect. If no packages were found in secondaries Poetry will only consider the ones it already found.

Hi! thanks for replying. In my case, it happens when I run an update. Would you mind having a look at issue #5538?

abn commented 2 years ago

@diegoquintanav as I mentioned in https://github.com/python-poetry/poetry/issues/4704#issuecomment-1119848956, the change that makes this a warning and not an error has been available since Poetry 1.2.0a1. Please use that or a later version. The 1.1 branch will likely not get a release with this fix, as 1.2 is around the corner.

abn commented 2 years ago

This is expected behavior. See https://python-poetry.org/docs/master/repositories/#secondary-package-sources.