[Feature Request] Dependency-specific index urls

johnpyp commented 4 weeks ago

Some packages, like PyTorch, recommend installing their packages through custom index URLs, e.g. from that page:

```shell
# To Install:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
```

Though we could use URL prioritization for this (with uv, to make it consistent), it would be better in this case to support dependency-scoped index URLs. That avoids leaking the extra index URL lookup to every other dependency, which otherwise widens the supply-chain exposure of all dependencies in an unexpected way.
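To make the difference concrete, here is a minimal Python sketch of which indexes each dependency would be looked up on under a global extra index versus a dependency-scoped mapping. The function names, the `scoped` mapping and the dependency list are hypothetical illustrations, not any tool's API.

```python
PYPI = "https://pypi.org/simple"
PYTORCH = "https://download.pytorch.org/whl/rocm6.0"


def indexes_global(package: str, extra: list[str]) -> list[str]:
    """Global extra index: every package is looked up on PyPI *and* every extra index."""
    return [PYPI, *extra]


def indexes_scoped(package: str, scoped: dict[str, str]) -> list[str]:
    """Dependency-scoped (hypothetical): only packages named in `scoped` use the custom index."""
    return [scoped[package]] if package in scoped else [PYPI]


deps = ["torch", "torchvision", "torchaudio", "requests", "numpy"]
scoped = {name: PYTORCH for name in ("torch", "torchvision", "torchaudio")}

for dep in deps:
    print(f"{dep:12} global={indexes_global(dep, [PYTORCH])} scoped={indexes_scoped(dep, scoped)}")
```

Under the global policy, `requests` and `numpy` are also looked up on the PyTorch index; under the scoped mapping they only ever see PyPI, which is the narrowing this request is about.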

Skypekey commented 3 weeks ago

Did you mean this? https://hatch.pypa.io/dev/config/dependency/#direct-references I think this is useful.

johnpyp commented 3 weeks ago

I don't think so, as that's specifying an exact artifact to fetch from rather than the registry to resolve the given dependency from.

Skypekey commented 3 weeks ago

Oh, I understand now: you need to specify a package index for certain packages. Forgive my misunderstanding.

polarathene commented 2 weeks ago

Just sharing my findings here if helpful.

PDM

PDM has a kinda nice way to approach this (falters if you want a project to support multiple PyTorch sources though):

```toml
[project]
name = "example"

dependencies = [
    "torch", # Implicitly resolves to `2.3.1+cu121` via configured PyTorch source below
    "torchvision",
    "torchaudio",
]
requires-python = ">=3.10"

[tool.pdm.resolution]
respect-source-order = true

[tool.pdm]
distribution = false

[[tool.pdm.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
include_packages = ["torch", "torchvision", "torchaudio", "nvidia-*"]
```

The `nvidia-*` entry at the end ensures that torch's implicit `nvidia-*` dependencies also resolve from the torch index. Otherwise they'd come from PyPI, where some of them currently resolve to CUDA 12.5 builds instead of the CUDA 12.1 builds these torch packages are built for and compatible with.
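To see why the `nvidia-*` glob is needed, here is a small Python sketch that matches package names against the `include_packages` patterns from the config above. It assumes glob semantics equivalent to Python's `fnmatch` plus PEP 503 name normalization; whether PDM normalizes names exactly this way is an assumption, and `nvidia-cublas-cu12` / `nvidia_cudnn_cu12` are just example names of torch's CUDA wheels.

```python
import re
from fnmatch import fnmatchcase


def normalize(name: str) -> str:
    """PEP 503 normalization: lowercase and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def included(name: str, patterns: list[str]) -> bool:
    """True if the normalized package name matches any glob in `patterns` (assumed semantics)."""
    return any(fnmatchcase(normalize(name), normalize(p)) for p in patterns)


# The include list from the [[tool.pdm.source]] example above.
include_packages = ["torch", "torchvision", "torchaudio", "nvidia-*"]

for pkg in ["torch", "nvidia-cublas-cu12", "nvidia_cudnn_cu12", "numpy"]:
    print(f"{pkg:20} routed to pytorch index: {included(pkg, include_packages)}")
```

Without the glob, every CUDA runtime wheel would have to be listed by name, and any one that was missed would silently fall back to PyPI.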

Keep that in mind for your request to scope deps to an index, as you may otherwise hit the same caveat.

PDM also supports optional dependency groups:

```toml
[project]
# ...name, requires-python, etc. as in the example above
dependencies = [
    "torchvision",
    "torchaudio",
]

[project.optional-dependencies]
torch_cpu = ["torch==2.3.1+cpu"]
torch_cuda = ["torch==2.3.1+cu121"]

# include_packages matches package names, so it lists "torch" rather than the group name.
[[tool.pdm.source]]
name = "pytorch-cuda-12.1"
url = "https://download.pytorch.org/whl/cu121"
include_packages = ["torch", "torchvision", "torchaudio", "nvidia-*"]

[[tool.pdm.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
include_packages = ["torch", "torchvision", "torchaudio"]
```

You'd then install a variant by selecting its group, e.g. `pdm install --group torch_cpu`. However, this won't work as expected because of the overlapping CUDA package source. You'd need to move the packages from `dependencies` into the `optional-dependencies` table, with each group pinning the explicit local identifier as done for `torch`. That enforces an exact version pin: you cannot use `>=`.
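Why the explicit local identifier matters: under PEP 440, a `==` specifier without a local label ignores the candidate's local label, so `torch==2.3.1` is satisfied by both `2.3.1+cpu` and `2.3.1+cu121`, and which one you get depends on which index answers. Local labels are only permitted with `==`, `!=` and `===`, which is why `>=2.3.1+cu121` isn't an option. The sketch below hand-rolls just those two rules (the helper names are hypothetical; it is not a full PEP 440 implementation, which the `packaging` library provides).

```python
from __future__ import annotations


def split_local(version: str) -> tuple[str, str | None]:
    """Split '2.3.1+cu121' into ('2.3.1', 'cu121'); versions without '+' have no local label."""
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def satisfies_equal(spec_version: str, candidate: str) -> bool:
    """Simplified PEP 440 '==': if the spec has no local label, ignore the candidate's label.

    Public versions are compared as plain strings, so normalization such as
    zero-padding (2.3.1 == 2.3.1.0) is deliberately not handled here.
    """
    spec_public, spec_local = split_local(spec_version)
    cand_public, cand_local = split_local(candidate)
    if spec_local is None:
        return spec_public == cand_public
    return spec_public == cand_public and spec_local == cand_local


def check_specifier(op: str, version: str) -> None:
    """PEP 440 only permits a local label with '==', '!=' and '==='; reject it elsewhere."""
    if split_local(version)[1] is not None and op not in ("==", "!=", "==="):
        raise ValueError(f"local version label not allowed with {op!r}: {version}")


for candidate in ["2.3.1+cpu", "2.3.1+cu121"]:
    print(
        candidate,
        "matches torch==2.3.1:", satisfies_equal("2.3.1", candidate),
        "matches torch==2.3.1+cu121:", satisfies_equal("2.3.1+cu121", candidate),
    )
```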

If you don't list `torchvision` and `torchaudio` in each source's `include_packages`, they fall back to resolving from the default PyPI index.

EDIT: You could alternatively list them in each group (`torch_cpu` / `torch_cuda`), which is the advised approach for this kind of package variance. However, this doesn't get around multiple sources having overlapping `include_packages`: unless you explicitly use local identifiers for these deps, they will still match and resolve to packages at the undesired index.
Likewise, there is a known upstream PyTorch issue where the `+cpu` local identifier isn't assigned to ARM64 / aarch64 builds. The `pytorch-cpu` source index is still valid there, but `+cpu` must be omitted in that case.
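One way to express the aarch64 caveat is to choose the requirement string per machine architecture. The sketch below is hypothetical (`cpu_torch_requirement` is not part of any tool), reuses the `2.3.1` version from the examples above, and only covers the AMD64 vs ARM64 split described here; other platforms such as macOS may follow different wheel naming conventions.

```python
import platform

# Assumption: the names platform.machine() reports for 64-bit ARM.
ARM_MACHINES = {"aarch64", "arm64"}


def cpu_torch_requirement(machine: str = "", version: str = "2.3.1") -> str:
    """Hypothetical helper: CPU-only torch pin, omitting '+cpu' on ARM64 builds."""
    machine = (machine or platform.machine()).lower()
    if machine in ARM_MACHINES:
        return f"torch=={version}"       # ARM64 wheels on the CPU index lack '+cpu'
    return f"torch=={version}+cpu"       # assumed: x86_64 / AMD64 wheels carry '+cpu'


print(cpu_torch_requirement())            # depends on the machine running this
print(cpu_torch_requirement("x86_64"))    # torch==2.3.1+cpu
print(cpu_torch_requirement("aarch64"))   # torch==2.3.1
```

In a `pyproject.toml` the same split would be written declaratively with PEP 508 markers, e.g. `torch==2.3.1+cpu; platform_machine == 'x86_64'` and `torch==2.3.1; platform_machine == 'aarch64'`.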

Each variant would also need its own lock file with PDM. You could work around the `pyproject.toml` issues mentioned above by using separate `pyproject.toml` files; for PDM at least, there doesn't seem to be interest in improving this flexibility. There is a third-party plugin that provides an alternative way to configure torch deps (it generates separate lock files).

AFAIK, `include_packages` biases the listed packages to that source, but other packages will still attempt to resolve through all indexes, including this one, unless explicitly excluded. With `respect-source-order = true`, priority seems predictable: sources are tried in the order they're declared, with PyPI as the default for anything not claimed by an `include_packages` entry.
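A minimal model of the behaviour described above, as observed rather than taken from PDM's resolver code: a source whose `include_packages` matches claims the package (first declared wins when lists overlap); anything unclaimed tries PyPI and then every declared source in order, skipping sources that exclude it. The `Source` dataclass and `candidate_sources` function are hypothetical, and placing PyPI first is an assumption.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Source:
    """Hypothetical stand-in for a [[tool.pdm.source]] table."""
    name: str
    url: str
    include_packages: list[str] = field(default_factory=list)
    exclude_packages: list[str] = field(default_factory=list)


PYPI = Source("pypi", "https://pypi.org/simple")


def _matches(package: str, patterns: list[str]) -> bool:
    return any(fnmatchcase(package, p) for p in patterns)


def candidate_sources(package: str, sources: list[Source]) -> list[Source]:
    """Sources to query for `package`, highest priority first (simplified model)."""
    # A source whose include_packages matches claims the package; with overlapping
    # lists the first declared source wins.
    for src in sources:
        if _matches(package, src.include_packages):
            return [src]
    # Unclaimed packages try PyPI (assumed first) and then every declared source in
    # order, skipping sources that explicitly exclude them.
    return [s for s in [PYPI, *sources] if not _matches(package, s.exclude_packages)]


sources = [
    Source("pytorch-cuda-12.1", "https://download.pytorch.org/whl/cu121",
           ["torch", "torchvision", "torchaudio", "nvidia-*"]),
    Source("pytorch-cpu", "https://download.pytorch.org/whl/cpu",
           ["torch", "torchvision", "torchaudio"]),
]

for pkg in ["torchvision", "nvidia-cudnn-cu12", "requests"]:
    print(f"{pkg:18} -> {[s.name for s in candidate_sources(pkg, sources)]}")
```

In this model `torchvision` always lands on the CUDA source, even when you only want the `torch_cpu` group, which is the overlapping-`include_packages` problem from the EDIT note, while `requests` is looked up on all three indexes.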

With `Rye` / `Hatch`

Rye has a similar source-management feature, but without the `include_packages` filter.
- It also supports optional deps, but these are intentionally not installable for application projects (I might be misunderstanding that page; further context). So it lacks the flexibility that PDM offers?
- It does, however, support storing index information in its lock file via `--with-sources` or its equivalent config setting. That should allow locking deps to those sources, but the association isn't persisted into `pyproject.toml`. FWIW, PDM has a similar feature via lock strategies, but it isn't index-aware: it stores a direct URL to the dep at the index instead (so it isn't update-friendly).
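The practical difference: if a lock entry records the index a package came from, a tool can re-query that index's PEP 503 project page (`<index>/<normalized-name>/`) for newer releases, whereas a direct file URL only identifies one artifact. A small sketch, assuming the PyTorch index follows the PEP 503 layout that pip relies on with `--index-url`; the `LockEntry` type is hypothetical and is not Rye's or PDM's lock format.

```python
import re
from dataclasses import dataclass
from typing import Optional


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass(frozen=True)
class LockEntry:
    """Hypothetical lock entry; not the actual Rye or PDM lock file format."""
    name: str
    version: str
    index: Optional[str] = None  # index-aware: remembers which index served it
    url: Optional[str] = None    # direct reference: points at one specific file

    def refresh_url(self) -> Optional[str]:
        """PEP 503 project page to re-query for newer releases, if the index is known."""
        if self.index is None:
            return None  # a bare file URL has no listing of other versions to consult
        return f"{self.index.rstrip('/')}/{normalize(self.name)}/"


entry = LockEntry("torch", "2.3.1+cu121", index="https://download.pytorch.org/whl/cu121")
print(entry.refresh_url())  # https://download.pytorch.org/whl/cu121/torch/

pinned = LockEntry("torch", "2.3.1+cu121", url="https://example.invalid/torch.whl")  # placeholder URL
print(pinned.refresh_url())  # None
```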
Hatch does have the ability to use environment markers (for the ARM64 concern) and refers to its optional dependency groups as 'features'.
- Your env config can reference the features to install.
- The matrix feature may further help support the different environments (cpu, with its AMD64 vs ARM64 local identifier discrepancy; rocm and cuda, with version-specific local identifiers) for packages and their respective PyTorch index URLs.
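To illustrate the matrix idea, here's a sketch that expands a Python-version × backend matrix into environment names and PyTorch index URLs. It mirrors the shape of a Hatch matrix conceptually only: the backend list, the naming scheme and the `expand_matrix` helper are hypothetical, and the URLs follow the `https://download.pytorch.org/whl/<backend>` pattern of the indexes quoted earlier in this thread.

```python
from itertools import product

BASE = "https://download.pytorch.org/whl"


def expand_matrix(pythons: list[str], backends: list[str]) -> dict[str, str]:
    """Map each '<backend>-py<version>' environment name to the index URL it should use."""
    return {f"{backend}-py{py}": f"{BASE}/{backend}" for py, backend in product(pythons, backends)}


envs = expand_matrix(["3.10", "3.11"], ["cpu", "cu121", "rocm6.0"])
for name, url in envs.items():
    print(f"{name:16} {url}")
```

In Hatch itself this would live in the environment's matrix configuration rather than in code, and the `+cpu` AMD64 vs ARM64 discrepancy would still need environment markers within the cpu variant.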

These are my observations so far at least 😅

PDM is looking into adopting uv (which should notably help with an issue I've observed in its cache performance).
Hatch delegates to pip / uv, so there's no lock file support until that tooling adds it (there's something similar AFAIK, but it lacks the capabilities that PDM and Rye offer).

pypa / hatch