prefix-dev / parselmouth

Challenging one-to-one, name-to-name correspondence with some examples #9

Open jaimergp opened 5 months ago

jaimergp commented 5 months ago

There are a few PyPI packages (especially in the CUDA ecosystem) that map to more than one conda spec. And I say 'spec' and not 'name' because sometimes you need a particular version.

AFAICT, the current mappings only do name -> name. Any chance we could expand that? It might require manual annotation on top of the auto heuristics :/

nichmor commented 5 months ago

Hey @jaimergp! Good question!

Currently we record the multiple names and their versions that are present in the source. For example, https://conda-mapping.prefix.dev/hash-v0/4608b9caafc4fa16d887f5af08e1bafe95f4cb07596ca8f5af184bf5de8f2c4c is the entry for conda's pyqt. From a usage perspective, we currently use the simplified name-to-name map, but it's possible to have something like spec to spec, or spec to an array of specs (if the conda package has multiple PyPI packages in it).

Regarding cupy-cuda12x, we map it to cupy right now. Could you please explain to me what you would like to see in this case and how you would like to use it? Thanks

jaimergp commented 5 months ago

> Currently we record the multiple names and their versions that are present in the source. For example, https://conda-mapping.prefix.dev/hash-v0/4608b9caafc4fa16d887f5af08e1bafe95f4cb07596ca8f5af184bf5de8f2c4c is the entry for conda's pyqt.

This API is not generally available, right? I don't see this info in any of the files present in this repo.

> Regarding cupy-cuda12x, we map it to cupy right now

I don't see cupy-cuda12x in this repo either

> Could you please explain to me what you would like to see in this case and how you would like to use it?

My point here is that cupy-cuda12x has an implicit constraint on which CUDA version gets installed. This doesn't matter over at PyPI because everything's statically linked, but in the dynamic world of conda-forge, we need to specify the runtime libraries too. My current understanding is that conda-forge's CUDA stack uses cuda-version=12 as the CUDA constraint (see this conda artifact).

The same situation might apply to other wheels with vendored libraries (e.g. BLAS, MPI, etc).

pytorch is a particularly tricky one because the CUDA variants all have the same name; they are just published to different indices (e.g. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118). Not sure how, or even if, we could tackle this without some ad-hoc logic like inspecting the wheel URL.
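To make the "inspect the URL" idea concrete, here is a minimal sketch of what such ad-hoc logic could look like. The helper name `cuda_spec_from_index_url` is hypothetical, and it assumes the index URL ends in a `cuNNN` segment whose last digit is the CUDA minor version (so `cu118` would mean 11.8); neither parselmouth nor pixi is known to do this.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: PyTorch-style indices end in a segment such as "cu118" or "cu121",
# where the last digit is the minor version and the rest is the major version.
_CUDA_TAG = re.compile(r"^cu(\d{2,3})$")


def cuda_spec_from_index_url(url: str) -> Optional[str]:
    """Guess a conda-style CUDA constraint from an index URL (hypothetical helper).

    Returns a spec such as "cuda-version=11.8", or None when the URL carries
    no recognizable CUDA tag (for example a CPU-only index).
    """
    # Take the last path segment, ignoring any trailing slash.
    last_segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    match = _CUDA_TAG.match(last_segment)
    if match is None:
        return None
    digits = match.group(1)
    major, minor = digits[:-1], digits[-1]
    return f"cuda-version={major}.{minor}"
```

For the index in the command above, this would yield `cuda-version=11.8`, which could then be attached to the conda spec for torch. It remains a heuristic: any index that does not follow the `cuNNN` naming would simply return `None`.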

nichmor commented 5 months ago

> This API is not generally available, right? I don't see this info in any of the files present in this repo.

It is publicly available (and we are using it in pixi), but all files are stored in our S3 bucket. We are also thinking about making a more user-friendly API for this (maybe based on /package_name/version rather than on the hash).

> I don't see cupy-cuda12x in this repo either

conda's cupy is present here: https://raw.githubusercontent.com/prefix-dev/parselmouth/main/files/compressed_mapping.json and it maps to cupy. Or do you mean something else?

How would you like to see the mapping? More like a spec mapping?

{
  "name+version": { "pypi_names": ["name+version", ...], "virtual_packages": ["cuda"] }
}

jaimergp commented 5 months ago

I'm realizing your mappings are doing conda -> pypi but not the reverse, so maybe what I'm asking doesn't make any sense. But I was thinking of something like:

{
  "cupy-cuda12x": ["cupy", "cuda-version=12"],
  "PyQt5": ["pyqt=5"],
}
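
As a rough illustration of how such a PyPI -> conda spec table could sit on top of the automatic name-to-name map, here is a small sketch. Everything in it is hypothetical: `MANUAL_OVERRIDES`, `invert_name_map` and `conda_specs_for` are made-up names, and the conda -> PyPI input is assumed to be a flat `{conda_name: pypi_name}` dict, which may not match the real layout of the published mapping files.

```python
import re
from typing import Dict, List, Optional

# Hand-annotated overrides (PyPI name -> list of conda specs), taken from the
# example above. Keys are stored in normalized form.
MANUAL_OVERRIDES: Dict[str, List[str]] = {
    "cupy-cuda12x": ["cupy", "cuda-version=12"],
    "pyqt5": ["pyqt=5"],
}


def normalize(name: str) -> str:
    """Normalize a PyPI project name (lowercase, runs of -, _ and . become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def invert_name_map(conda_to_pypi: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Turn an assumed {conda_name: pypi_name} map into {pypi_name: conda_name}."""
    pypi_to_conda: Dict[str, str] = {}
    for conda_name, pypi_name in conda_to_pypi.items():
        if pypi_name is not None:
            # Keep the first conda name seen for a given PyPI name.
            pypi_to_conda.setdefault(normalize(pypi_name), conda_name)
    return pypi_to_conda


def conda_specs_for(pypi_name: str, pypi_to_conda: Dict[str, str]) -> List[str]:
    """Return conda specs for a PyPI name: manual overrides first, then name -> name."""
    key = normalize(pypi_name)
    if key in MANUAL_OVERRIDES:
        return list(MANUAL_OVERRIDES[key])
    conda_name = pypi_to_conda.get(key)
    return [conda_name] if conda_name is not None else []


# Example usage with a tiny, made-up conda -> PyPI map.
auto_map = invert_name_map({"numpy": "numpy", "pytorch": "torch"})
print(conda_specs_for("cupy-cuda12x", auto_map))
print(conda_specs_for("torch", auto_map))
```

The overrides win over the automatic map, so a manually annotated entry such as cupy-cuda12x can carry the extra cuda-version constraint while everything else falls back to plain name-to-name lookup.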