prefix-dev / pixi

Package management made easy
https://pixi.sh
BSD 3-Clause "New" or "Revised" License

Issues when installing pyannote.audio with torchaudio #1230

Open niemiaszek opened 7 months ago

niemiaszek commented 7 months ago

Checks

Reproducible example

Issue description

First of all, installing PyPI packages with "." in the name is not possible in pixi.toml ("data did not match any variant of untagged enum PyPiRequirement"), but it works with pixi add, or by writing "-" instead of ".".

The second issue concerns the requirements of pyannote.audio. I'm building an env with it for CPU only, and it works fine when using:

[dependencies]
pytorch = {version="*", channel="pytorch"}
torchvision = {version="*", channel="pytorch"}
torchaudio = {version="*", channel="pytorch"}

However, adding pyannote-audio also installs torchaudio from PyPI, because torchaudio is listed in pyannote.audio's requirements:

torchaudio                            2.2.2         py311_cpu              5.1 MiB    conda  torchaudio-2.2.2-py311_cpu.tar.bz2
torchaudio                            2.2.2                                12.2 MiB   pypi   torchaudio-2.2.2-cp311-cp311-manylinux1_x86_64.whl

As far as I can tell only one copy of torch is installed, which is good. However, the extra torchaudio from PyPI breaks an installation that worked fine without pyannote.audio:

import torchaudio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/p.niemiec/Repos/diarization-poc/.pixi/envs/diarization-demo/lib/python3.11/site-packages/torchaudio/__init__.py", line 2, in <module>
    from . import _extension  # noqa  # usort: skip
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/p.niemiec/Repos/diarization-poc/.pixi/envs/diarization-demo/lib/python3.11/site-packages/torchaudio/_extension/__init__.py", line 38, in <module>
    _load_lib("libtorchaudio")
  File "/home/p.niemiec/Repos/diarization-poc/.pixi/envs/diarization-demo/lib/python3.11/site-packages/torchaudio/_extension/utils.py", line 60, in _load_lib
    torch.ops.load_library(path)
  File "/home/p.niemiec/Repos/diarization-poc/.pixi/envs/diarization-demo/lib/python3.11/site-packages/torch/_ops.py", line 933, in load_library
    ctypes.CDLL(path)
  File "/home/p.niemiec/Repos/diarization-poc/.pixi/envs/diarization-demo/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory
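The symptom to look for is the pair of torchaudio rows in the pixi list output above. As a rough diagnostic, the hypothetical helper below scans pixi list-style text and reports packages that appear with both a conda and a pypi kind. It is not part of pixi. It assumes the column layout shown above: the kind is the second-to-last whitespace-separated field, because the pypi rows have an empty build column.

```python
from collections import defaultdict

# Rows copied from the pixi list output above; this layout is assumed, not a stable format.
LISTING = """\
torchaudio  2.2.2  py311_cpu  5.1 MiB   conda  torchaudio-2.2.2-py311_cpu.tar.bz2
torchaudio  2.2.2             12.2 MiB  pypi   torchaudio-2.2.2-cp311-cp311-manylinux1_x86_64.whl
"""


def duplicated_packages(listing: str) -> list[str]:
    """Return package names listed with both a 'conda' and a 'pypi' kind."""
    kinds: dict[str, set[str]] = defaultdict(set)
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # The kind column is second to last; the source file name comes last.
        kinds[fields[0]].add(fields[-2])
    return sorted(name for name, k in kinds.items() if {"conda", "pypi"} <= k)


print(duplicated_packages(LISTING))  # ['torchaudio']
```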

Expected behavior

Only the conda CPU build of torchaudio is kept in the env, and both import torchaudio and import pyannote.audio work.

niemiaszek commented 7 months ago

Also, some general questions:

When I run pixi add from inside pixi shell -e ..., should my shell get reloaded? I find myself removing .lock and .pixi quite often, but I'm still a pixi noob and my envs are quite complex, so it might be a skill issue.

I also tried using channels = ["nvidia", {channel = "pytorch", priority = "-1"}] as in the "Multiple machines from one project" example, but it seems like "priority" isn't supported.

ruben-arts commented 7 months ago

This issue is probably due to our mapping only working for conda-forge. @nichmor What do we do with non-conda-forge channels (e.g. pytorch) when we map conda to PyPI?

@niemiaszek There was a typo in the documentation: priority is an int, so you need to lose the quotes ("). Fix on its way: https://github.com/prefix-dev/pixi/pull/1234

On bash and zsh, pixi add should trigger a reload of the environment. If you don't trust it, just exit and run pixi shell again.

What errors make you remove those files?

nichmor commented 7 months ago

This issue is probably due to our mapping only working for conda-forge. @nichmor What do we do with non-conda-forge channels (e.g. pytorch) when we map conda to PyPI?

Hey! On non-conda-forge channels we don't assume that the conda name is the same as the PyPI name, which is why torchaudio is installed twice.
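The sketch below is a simplified model of the behavior described here. It is hypothetical Python, not pixi's actual implementation, and the mapping entries are illustrative, not pixi's real data. In the model, conda-forge packages fall back to "same name on PyPI", while other channels map only what their mapping lists. A PyPI requirement counts as satisfied only if some conda package maps to it. Because the pytorch channel has no torchaudio entry here, the PyPI torchaudio requirement stays unsatisfied and gets installed a second time.

```python
from typing import Optional

# Hypothetical per-channel entries: conda name -> PyPI name (None = no PyPI equivalent).
# Illustrative only; this is not pixi's real mapping data.
MAPPINGS: dict[str, dict[str, Optional[str]]] = {
    "pytorch": {"pytorch": "torch", "torchvision": "torchvision"},  # no torchaudio entry
}


def pypi_name(channel: str, conda_name: str) -> Optional[str]:
    """Return the PyPI name a conda package is assumed to provide, or None."""
    mapping = MAPPINGS.get(channel, {})
    if conda_name in mapping:
        return mapping[conda_name]
    # Simplified rule: only conda-forge falls back to "same name on PyPI".
    return conda_name if channel == "conda-forge" else None


def missing_from_conda(conda_pkgs: list[tuple[str, str]], pypi_reqs: list[str]) -> list[str]:
    """Return PyPI requirements that no installed conda package is mapped to."""
    provided = {pypi_name(channel, name) for channel, name in conda_pkgs}
    provided.discard(None)
    return sorted(set(pypi_reqs) - provided)


env = [("pytorch", "pytorch"), ("pytorch", "torchvision"), ("pytorch", "torchaudio")]
print(missing_from_conda(env, ["torch", "torchaudio"]))  # ['torchaudio']
```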

baszalmstra commented 7 months ago

Can't we add pytorch to the mapping, though?

nichmor commented 7 months ago

Can't we add pytorch to the mapping, though?

Yes, sure! We can extend it to all channels.

niemiaszek commented 7 months ago

On bash and zsh, pixi add should trigger a reload of the environment. If you don't trust it, just exit and run pixi shell again.

What errors make you remove those files?

It's hard for me to tell right now, as I was making some changes in a rush, but these were mostly dependency issues. One example I can recall is similar to #1194: I added dask with pixi add, and then importing dask failed with the error 'pyarrow' has no attribute '__version__'. Most errors like these came from pyarrow being installed from both conda and pip, but everything works fine for me now.

I will try to pay more attention to this topic and reproduce some examples.

niemiaszek commented 7 months ago

Thanks for the fast response. ML frameworks usually make life hard, but I'm quite impressed by how easy my envs are to set up. This mapping issue is indeed important.

CUDA-related libs are also a bit of an edge case. I'm quite amazed that there is still no common CUDA target. My dream would be the ability to easily set up the major frameworks [torch, tensorflow, jax, mlx] with GPU/CPU support. This would also require handling cases such as installing pip wheels for CUDA with TF alongside conda packages from the nvidia channel for PyTorch, while still ending up with a single CUDA installation.

I mentioned #261 in connection with Keras 3, which would be a fun example to play with. Keras lets you keep the same codebase and switch only the backend (it supports all major frameworks, with MLX support on the way). I think this would be The Ultimate Benchmark, covering most user scenarios. I will fiddle with it a bit and try to set up one env with 3 frameworks, testing pip and conda combinations.

niemiaszek commented 6 months ago

@nichmor I've seen you're busy with other tasks, but I have a question related to this issue. I tried solving this env again on 0.23, and torchaudio was again installed from both conda and PyPI. Is there currently any workaround for this, like a manual mapping?

nichmor commented 6 months ago

@nichmor I've seen you're busy with other tasks, but I have a question related to this issue. I tried solving this env again on 0.23, and torchaudio was again installed from both conda and PyPI. Is there currently any workaround for this, like a manual mapping?

Hey @niemiaszek! Let me look into the problem and come back with a solution.

nichmor commented 6 months ago

Hey @niemiaszek! You can define a custom mapping under [project]:

[project]
# ...name, channels, platforms, etc. as usual
conda-pypi-map = { "pytorch" = "local_mapping.json" }

[tasks]

[dependencies]
captum = { version = "*", channel = "pytorch" }
boltons = { version = "*" }

[pypi-dependencies]
captum = { version = "*" }

Inside local_mapping.json you can have {"captum": "captum"}, which maps conda's captum to the PyPI package captum. You can also use {"captum": null}, in which case conda's captum will not be mapped.

Let me know if it helps you or you have any questions

niemiaszek commented 6 months ago

@nichmor I think I kind of get how this should work, but after creating a local mapping with {"torchaudio":"torchaudio"}, torchaudio seemed to be added correctly (only the conda CPU version in pixi list), while the other PyTorch packages no longer got mapped. I assume I should merge my local map with the regular map that is already used for PyTorch. Previously, that map mapped the PyPI "torch" from pyannote.audio's requirements to the conda "pytorch" that I specified in pixi.toml, but it installed torchaudio from both conda (desired) and PyPI (undesired).

Can I find the regular mapping somewhere, so I can just add one entry for torchaudio without overwriting the current mappings?

niemiaszek commented 6 months ago

I also noticed some interesting warnings from the CLI while doing so.

  1. After I added the conda-pypi-map, I got the following warning: WARN pixi::project::manifest: Defined custom mapping channel https://conda.anaconda.org/pytorch/ is missing from project channels. Please note that I don't use the pytorch channel in the default env, so that might be the cause.

  2. When entering my env with the custom mapping via pixi shell: WARN pixi::install_pypi: These conda-packages will be overridden by pypi: pytorch

     This is consistent with "pytorch" being installed from conda while "torch", with all its requirements, was also installed from PyPI in this env.

  3. After I removed the custom mapping and set up my env again: WARN pixi::install_pypi: These conda-packages will be overridden by pypi: torchaudio. Both torchaudios got installed.

niemiaszek commented 6 months ago

One strange thing is that after I passed {"torchaudio":"torchaudio", "pytorch": "torch"} as my local mapping, pixi list showed the correct output (torchaudio only from conda, same for pytorch), but I couldn't access packages from the pytorch channel at all in Python. Both import torch and import torchaudio failed because the module could not be found.

I think I'm doing something wrong here

nichmor commented 6 months ago

Hey @niemiaszek! Sorry that I wasn't very explicit about the mapping and how it works. If you define a custom mapping for a specific channel, pixi will no longer request our own mapping for that channel. You can use this one as a starting point and add torchaudio to it. Please note that this mapping contains only packages from the conda-forge channel, not pytorch.

After merging our mapping and adding torchaudio to it, I think your import issue should be fixed. (After adding a new mapping, please remove the lock file; we currently don't invalidate the lock file when the mapping changes.) Let me know if it helps.
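The merge described here can be sketched in a few lines of Python. The helpers are hypothetical. The `starter` dict is a stand-in with illustrative entries; the real entries come from the starter mapping nichmor refers to. Extra entries win on conflicts, and the result is written as local_mapping.json. This demo writes to a temporary directory. Afterwards, as noted above, delete pixi.lock so the new mapping is actually picked up.

```python
import json
import tempfile
from pathlib import Path


def merge_mapping(base: dict, extra: dict) -> dict:
    """Return base with extra's entries added; extra wins on conflicts."""
    merged = dict(base)  # copy so the starter mapping itself stays untouched
    merged.update(extra)
    return merged


def write_mapping(path: Path, mapping: dict) -> None:
    # Sorted keys keep the file stable and diff-friendly when regenerated.
    path.write_text(json.dumps(mapping, indent=2, sort_keys=True) + "\n")


# Illustrative stand-in for the starter mapping; not its real contents.
starter = {"pytorch": "torch", "torchvision": "torchvision"}
merged = merge_mapping(starter, {"torchaudio": "torchaudio"})

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "local_mapping.json"
    write_mapping(out, merged)
    print(json.loads(out.read_text()))  # {'pytorch': 'torch', 'torchaudio': 'torchaudio', 'torchvision': 'torchvision'}
```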

niemiaszek commented 5 months ago

@nichmor thanks for the help! Sorry for the late response, but I was busy with other stuff this week :face_in_clouds:

Everything works perfectly now. I expected that some extension of the mapping was needed.

I was just confused about how pytorch and torchvision got installed correctly, even though they also come from the pytorch channel. It turns out these packages are included in the default mapping, and torchaudio was just the one left behind.

Do you think there will be a way to make it work out of the box? I've seen you have already started considering an easier way to patch the mapping.