PyTorch picks up the wrong CUDA version

EdvardsZ commented 1 month ago

Checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pixi, using pixi --version.

Reproducible example

This is my pixi.toml:

[project]
channels = [
    "nvidia/label/cuda-11.8.0",
    "pytorch",
    "conda-forge"
] # conda channels
name = "test"
platforms = ["linux-64"]

[tasks]

[dependencies]
python = "3.10.*"
cuda = "*"
pytorch-cuda = "11.8.*"
pytorch = "2.*"
torchvision = ">=0.17.0,<0.18"
gcc = "11.*"
gxx = ">=11.4.0,<11.5"
setuptools = ">=75.1.0,<76"
numpy = "1.26.*"

[pypi-options]
no-build-isolation = ["simple-knn"]

[pypi-dependencies]
simple-knn = { path = "simple-knn" }

Then, after running

git clone https://gitlab.inria.fr/bkerbl/simple-knn
pixi install

I get the following error:

The detected CUDA version (12.6) mismatches the version that was used to compile
PyTorch (11.8). Please make sure to use the same CUDA versions.

Issue description

Hello, I am trying to build a PyTorch extension against a CUDA version different from the one installed on my system. My system has CUDA 12.6 installed:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

Inside a pixi shell, without the extension installed, nvcc correctly reports 11.8:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
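
To compare the two toolkits without reading the banners by eye, the release number can be pulled out of the `nvcc --version` output. This is a minimal sketch: the helper names `parse_nvcc_release` and `installed_nvcc_release` are made up for illustration, and the regular expression assumes the "release X.Y" wording shown in the outputs above.

```python
import re
import shutil
import subprocess
from typing import Optional

# Sample text copied from the `nvcc --version` output inside the pixi shell.
SAMPLE_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
"""


def parse_nvcc_release(version_output: str) -> Optional[str]:
    """Return the 'X.Y' release from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Run whichever nvcc is first on PATH and return its release, if any."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None  # no nvcc visible in this environment
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


release = parse_nvcc_release(SAMPLE_OUTPUT)
```

Calling `installed_nvcc_release()` once inside the pixi shell and once outside it should show which toolkit each environment resolves to.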

However, when installing the PyTorch extension "simple_knn", I get this error:

The detected CUDA version (12.6) mismatches the version that was used to compile
      PyTorch (11.8). Please make sure to use the same CUDA versions.

The workaround I found was to set the CUDA_HOME variable:

export CUDA_HOME=$CONDA_PREFIX
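
A rough sketch of why this workaround helps, under the assumption that the extension build looks at CUDA_HOME (or CUDA_PATH) first and only falls back to whatever nvcc is on PATH when neither is set. The function `guess_cuda_home` is hypothetical and simplified; it is not PyTorch's actual code, and the paths in the example are illustrative.

```python
import os
from typing import Mapping, Optional


def guess_cuda_home(env: Mapping[str, str], nvcc_path: Optional[str]) -> Optional[str]:
    """Approximate the toolkit lookup order (an assumption, not the real implementation)."""
    # 1. An explicit environment variable wins.
    explicit = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if explicit:
        return explicit
    # 2. Otherwise derive the root from the nvcc location: <root>/bin/nvcc.
    if nvcc_path:
        return os.path.dirname(os.path.dirname(nvcc_path))
    # 3. Nothing found.
    return None


# Without CUDA_HOME, a system nvcc on PATH decides the toolkit root.
system_root = guess_cuda_home({}, "/usr/local/cuda-12.6/bin/nvcc")

# With CUDA_HOME pointing at the environment prefix, the system nvcc is ignored.
env_root = guess_cuda_home(
    {"CUDA_HOME": "/project/.pixi/envs/default"},
    "/usr/local/cuda-12.6/bin/nvcc",
)
```

If the build step runs with the system nvcc ahead of the environment's on PATH, the second branch would explain the 12.6 detection, and setting CUDA_HOME sidesteps it.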

Expected behavior

This should work without setting environment variables by hand, or it should be possible to specify them in pixi.toml.

EDIT: I think this is a bug in PyTorch: https://github.com/pytorch/pytorch/issues/136845. However, it would be nice to be able to set an environment variable like this in pixi.toml.

ruben-arts commented 1 month ago

Thanks for the write-up!

You can set the env var:

# in pyproject.toml add [tool.pixi....
[target.unix.activation.env]
CUDA_HOME="$CONDA_PREFIX"
[target.windows.activation.env]
CUDA_HOME="%CONDA_PREFIX%"

This should expose that env variable on all activations (run, shell, shell-hook) of the environment.
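
To confirm that the activation took effect, a small check can be run inside the environment (for example via `pixi run python`). This is only a sketch: `cuda_home_matches_prefix` is a made-up helper, and it assumes CONDA_PREFIX is set in the activated environment, as the snippet above relies on.

```python
import os
from typing import Mapping


def cuda_home_matches_prefix(env: Mapping[str, str]) -> bool:
    """True if CUDA_HOME is set and points at the same directory as CONDA_PREFIX."""
    cuda_home = env.get("CUDA_HOME")
    prefix = env.get("CONDA_PREFIX")
    if not cuda_home or not prefix:
        return False  # one of the variables is missing or empty
    # normpath ignores differences such as a trailing slash.
    return os.path.normpath(cuda_home) == os.path.normpath(prefix)


if __name__ == "__main__":
    print("CUDA_HOME matches CONDA_PREFIX:", cuda_home_matches_prefix(os.environ))
```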

EdvardsZ commented 1 month ago

Thanks!

ruben-arts commented 1 month ago

Assuming that helped you fix it, I'll close the issue :smile:
