Closed ferminho closed 1 year ago
Updated to clarify both the problem and the proposal
Hi, have you tried specifying resolution order?
From your description of the problem, this should solve it.
Hi @AdamJel, yes, thanks for the suggestion. I did try that, but it seems resolution is performed only after downloading metadata from all sources, so the resulting times are the same even when the packages are present in the first source.
I tested it with `respect-source-order` like this:
```toml
[[tool.pdm.source]]
url = "https://pypi.org/simple"
name = "pypi"
verify_ssl = true

[[tool.pdm.source]]
type = "index-url"
url = "https://download.pytorch.org/whl/cpu/torch_stable.html"
name = "pytorch"

[tool.pdm.resolution]
respect-source-order = true
```
@ferminho It is due to the nature of the resolution process and needs non-trivial effort to change it.
The finder collects packages, and the resolver decides whether a particular package matches, so the finder can't decide to stop collecting on its own. A practical example: version 1 is on the private index and version 2 on the fallback index. If the current dependency set only accepts version 2 but the package finder never looks at the fallback index, resolution fails.
Thanks for the insights @frostming, indeed what I proposed doesn't seem easy to do. What about being able to specify a limited set of packages per source, so the finder only queries that source for those packages?
If you think that would be a viable option, I'm willing to help and can try to implement it.
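To illustrate the idea, a per-source package filter could look like the sketch below. The `include_packages` key is an assumption here (later pdm releases document `include_packages`/`exclude_packages` on sources, but they may not exist in the version discussed in this thread):

```toml
# Sketch of per-source package filtering: the finder would only query
# the "pytorch" source for packages matching these patterns, and would
# skip it for everything else, avoiding the slow index entirely.
[[tool.pdm.source]]
url = "https://pypi.org/simple"
name = "pypi"
verify_ssl = true

[[tool.pdm.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/torch_stable.html"
type = "find_links"
# Assumed key; hedged example of binding packages to a source.
include_packages = ["torch", "torch-*"]
```

With such a binding, every other dependency would resolve against PyPI alone, so the slow index would only be hit for the handful of torch-related packages.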
I'm facing the same problem, as torch has decided to ship its CPU builds on a custom index. Resolving all packages against their index is super slow. Did you find a workaround? Right now all our CI systems and whatnot are resolving against the super-slow torch index 😭
We didn't find a proper workaround, so we are (mostly) still stuck with the pip-managed Databricks environment. However, we know this won't work for us when we want to run production code in the platform.
I'd just like to point out that "use extra sources only when a dependency is not present on PyPI" is not optimal in terms of security. A malicious actor could upload a package with the same name as your private one to the public PyPI, with a greater version number, and the dependency manager would install it: see here.
Good point, I agree. Binding specific packages to specific sources indeed seems like the best solution without this kind of security concern, as suggested in #1645, so I'm going to close this issue to focus on the other thread :+1:
Is your feature request related to a problem? Please describe.
pdm lock can be very slow on a big pyproject when one of the libraries requires an extra index-url or find-links source, since pdm checks every source for every library.
Concrete example: a pyproject reflecting the libraries in a Databricks environment, one of them being PyTorch, which requires an external source. Locking goes from ~30 minutes without the extra source to 3-4 hours with it.
The PyTorch index is probably expecting only PyTorch-related queries, since it frequently returns 503s and 504s, which certainly contributes to the slowness (we might be saturating it or exceeding query quotas).
Describe the solution you'd like
A way to tell pdm to rely on PyPI and query extra sources only when a package is not found on PyPI, or a way to link specific dependencies to specific sources. (Maybe there is already a way to do this that I don't know of.)
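As a rough sketch of the second option, the binding could be expressed in `pyproject.toml` roughly like this. Note that `include_packages` and `exclude_packages` are assumptions at the time of this request (later pdm releases document keys along these lines):

```toml
# Hedged sketch: exclude torch from the default PyPI source and
# restrict the extra source to torch only, so neither source is
# queried for packages it cannot serve.
[[tool.pdm.source]]
url = "https://pypi.org/simple"
name = "pypi"
verify_ssl = true
exclude_packages = ["torch"]  # assumed key

[[tool.pdm.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu/torch_stable.html"
type = "find_links"
include_packages = ["torch"]  # assumed key
```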