prefix-dev / pixi

Package management made easy
https://pixi.sh
BSD 3-Clause "New" or "Revised" License

Allow overriding detected virtual packages #480

Open twrightsman opened 9 months ago

twrightsman commented 9 months ago

Problem description

I've run into another interesting issue that Conda can partially handle and pixi can't: I have to create an environment to be activated on a machine without internet access. The only machines with CUDA installed are those without internet access, so I can't satisfy the __cuda virtual package requirement on the nodes with internet access.

I was able to get around this using Conda's virtual package overrides, but it would be awesome if pixi had something like this too.

wolfv commented 9 months ago

You should be able to use the system-requirements table in your pixi.toml to control the virtual packages:

https://prefix.dev/docs/pixi/configuration#the-system-requirements-table
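
For example, something along these lines in pixi.toml (the versions here are just placeholders for whatever your target machines provide):

[system-requirements]
linux = "3.10.0"
cuda = "12.0"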

twrightsman commented 9 months ago

This does not work if the internet-connected host does not have CUDA available:

$ pixi install
  × Cannot solve the request because of: pytorch 2.0.* cuda* cannot be installed because there are no viable options:
  │ |-- pytorch 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 |
  │ 2.0.0 | 2.0.0 | 2.0.0 | 2.0.0 would require
  │     |-- __cuda *, for which no candidates were found.
This is with the following in pixi.toml:

[system-requirements]
linux = "3.10.0"

Essentially I'm imagining a way to specify the platform I know the environment will be run in, not necessarily the platform I'm populating the environment in.

wolfv commented 9 months ago

But what if you set cuda = "12.0"?

twrightsman commented 9 months ago

Then I get:

$ pixi install
  × The platform you are running on should at least have the virtual package __cuda on version 12.0, build_string: 0

wolfv commented 9 months ago

OK, that's a problem. We do want to allow specifying a "matrix" of supported platforms, and pixi would select the best one at installation time. So e.g. you could have one config for just linux-64 and one for linux-64 + CUDA 12.0, etc.
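
To make that concrete, here is a purely hypothetical sketch of what such a matrix could look like in pixi.toml; none of this is existing pixi syntax, and the table names and versions are only illustrative:

# hypothetical variant tables; pixi would pick the best match at install time
[system-requirements.cpu]
linux = "3.10.0"

[system-requirements.cuda]
linux = "3.10.0"
cuda = "12.0"  # only chosen on machines that provide __cuda >= 12.0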

twrightsman commented 9 months ago

But what if the platform at installation time is different from the platform at run time? In my case, the cluster I am using has the CUDA machines disconnected from the internet, and the internet-connected machines do not have CUDA.

If I understand correctly, the support matrix still would not allow a non-CUDA machine to install a CUDA environment?

kszlim commented 7 months ago

I'm running into this issue too, would be very helpful to have a solution.

twrightsman commented 7 months ago

For what it's worth, I was imagining a simple first solution could be similar to Conda's, where an environment variable like PIXI_OVERRIDE_CUDA=11.2 would force the __cuda virtual package to version 11.2, skipping the actual version detection.

One could also imagine PIXI_OVERRIDE_BLAH to override any virtual package __blah, but we don't have to get carried away. 🙂

twrightsman commented 6 months ago

The 0.13.0 release prompted me to think further about this. I see three ways to achieve it, in order of decreasing "cleanliness":

  1. Allow specifying the runtime platform rather than assuming the platform that creates the environment is also the runtime one, likely in pixi.toml (see the sketch after this list)
  2. Use PIXI_OVERRIDE_* to manually set virtual packages to versions the user knows are present on the runtime platform
  3. Have a command-line flag that tells Pixi to ignore virtual package version constraints (with a big, red warning)
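
A rough illustration of what option 1 could look like; the [target-system] table below is hypothetical, not existing pixi syntax:

# hypothetical: describe the machine the environment will *run* on,
# so virtual packages are taken from here instead of being detected locally
[target-system]
linux = "3.10.0"
cuda = "12.0"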

Curious about other people's thoughts. This problem seems very similar to the cross-compiling problem, where packages are built for targets different from the build machine.

msegado commented 6 months ago

I have a similar use case, though it's not a hard blocker in my case since the GPU nodes in my cluster do have network access. It would still be nice to create a CUDA-enabled environment from a login node and only use the GPU nodes for actual jobs, though.

Also unsure of the cleanest way to do this. (The idea crossed my mind of suggesting Pixi check for CONDA_OVERRIDE_* in its virtual package detection logic, but that means another part of the Conda API bleeds into the Pixi API, and then it needs to be documented and kept compatible with whatever Conda does for the foreseeable future to avoid breaking people's workflows once they start depending on it...)

183amir commented 2 months ago

Is there any progress on this? I have the same problem. I am using a login node (without a GPU) to create my environment and test my code, and then I send the code for execution on a GPU machine. But I get this error on my login node:

$ pixi run -e cuda ...
  × The platform you are running on should at least have the virtual package __cuda on version 12, build_string: 0

Most GPU packages like pytorch and tensorflow run fine on a machine without a GPU, so I want to be able to use the same environment on both machines.

baszalmstra commented 4 days ago

Progress has been made on this in rattler.