prefix-dev / pixi


Solve group but don't force the same version for specific dependencies / support hierarchical environment specifications #1138

Open pavelzw opened 7 months ago

pavelzw commented 7 months ago

Problem description

In Deep Learning, you sometimes have the use case that you want to keep dependencies consistent across environments using solve groups but also want to support both CPU and GPU configurations. Since there is both a pytorch=2.1.2=cuda... and a pytorch=2.1.2=cpu... package, this is not possible with solve-groups at the moment.

[project]
name = "dl-project"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]
pytorch = "2.1.2"

[feature.gpu.system-requirements]
cuda = "12.1"
[feature.gpu.dependencies]
pytorch = {build = "cuda*"}

[feature.cpu]
platforms = ["linux-64", "osx-arm64"]
[feature.cpu.dependencies]
pytorch = {build = "cpu*"}

[feature.dev.dependencies]
ruff = "*"

[feature.test.dependencies]
pytest = "*"

[environments]
dev-cpu = {features = ["cpu", "dev", "test"], solve-group = "cpu"}
prod-cpu = {features = ["cpu"], solve-group = "cpu"}

dev-gpu = {features = ["gpu", "dev", "test"], solve-group = "gpu"}
prod-gpu = {features = ["gpu"], solve-group = "gpu"}

This example has the problem that the GPU environments might behave differently from the CPU environments. Therefore, we would like them all to be in the same solve-group. Unfortunately, this will fail because in one environment, pytorch 2.1.2 *cpu is requested and in the other, pytorch 2.1.2 *cuda is requested.

One solution I could think of would be the following:

[solve-groups.prod]
ignore-dependencies = ["pytorch"]

[environments]
dev-cpu = {features = ["cpu", "dev", "test"], solve-group = "prod"}
prod-cpu = {features = ["cpu"], solve-group = "prod"}

dev-gpu = {features = ["gpu", "dev", "test"], solve-group = "prod"}
prod-gpu = {features = ["gpu"], solve-group = "prod"}

This forces everything to be in the same solve-group except for pytorch. I'm not 100% sure whether this would be easily implementable with the current "put every requirement into one superset" approach, though.

WDYT? Do you have better ideas? I think solving this would be a great improvement for deep learners.

pavelzw commented 7 months ago

Might be slightly related to this discussion https://github.com/prefix-dev/pixi/pull/584#discussion_r1433812282

baszalmstra commented 7 months ago

We could also experiment with tightly integrating multiple dependent environments in the solver, perhaps. 🤔

We could also use that when solving for multiple platforms.

pavelzw commented 7 months ago

Environments depending on each other could solve the issue like this:

[environments]
_prod = {features = [], solve-group = "prod"}
_dev = {features = ["dev", "test"], solve-group = "prod"}

dev-cpu = {features = ["cpu", "dev", "test"], constraints = "_dev"} # maybe make constraints a list?
prod-cpu = {features = ["cpu"], constraints = "_prod"}

dev-gpu = {features = ["gpu", "dev", "test"], constraints = "_dev"}
prod-gpu = {features = ["gpu"], constraints = "_prod"}

or

[environments]
_prod = {features = [], solve-group = "prod"}
_dev = {features = ["dev", "test"], solve-group = "prod"}

dev-cpu = {features = ["cpu"], depends-on = "_dev"}
prod-cpu = {features = ["cpu"], depends-on = "_prod"}

dev-gpu = {features = ["gpu"], depends-on = "_dev"}
prod-gpu = {features = ["gpu"], depends-on = "_prod"}

pavelzw commented 7 months ago

The advantage of the hierarchies is that it's intuitive how everything is actually solved and whether it is solvable.

Also, ignore-dependencies only works when the number of ignored dependencies is small; when things like libcublas or libtorch... also need to be specified, it's already too much.

pavelzw commented 7 months ago

@baszalmstra do you mean by "tightly integrating multiple dependent environments" ensuring that "premature locking" doesn't happen? (i.e. locking a package in environment "a" that conflicts with another package in an environment that depends on environment "a", resulting in "not solvable")

baszalmstra commented 7 months ago

Currently we solve all environments independently. But I think with some engineering work we could solve them together, e.g. solve for multiple environments in a single solve. Then we could, for example, guarantee that all (or some) packages share the same version but may have different build strings.
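
As a toy illustration of that invariant (the candidate data and environment names below are made up, and this is a brute-force check, not how the solver would do it):

from fnmatch import fnmatch

# Hypothetical (version, build) candidates for a single package.
candidates = [
    ("2.1.2", "cpu_py312_0"),
    ("2.1.2", "cuda120_py312_0"),
    ("2.1.1", "cpu_py312_0"),
]

# Per-environment build-string constraints, as in the manifest above.
env_builds = {"prod-cpu": "cpu*", "prod-gpu": "cuda*"}

def shared_versions(candidates, env_builds):
    """Versions that every environment can satisfy with some build.

    This is the invariant a joint solve could enforce per package:
    one shared version across the solve group, builds free to differ.
    """
    result = []
    for version in sorted({v for v, _ in candidates}):
        builds = [b for v, b in candidates if v == version]
        if all(any(fnmatch(b, pat) for b in builds)
               for pat in env_builds.values()):
            result.append(version)
    return result

print(shared_versions(candidates, env_builds))  # prints ['2.1.2']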

Definitely future talk, but I'm quite sure it's possible.

tdejager commented 7 months ago

@baszalmstra to clarify the first part of your statement: if you have environments a and b that are part of the solve-group foo, then we solve once for foo, and a and b pull out what they need, if I understand correctly? So I would already call that a single solve, or maybe more accurately a single resolution, per environment.

However, the second part of the statement holds true and I understand what you are saying.

baszalmstra commented 7 months ago

Yes correct.

msegado commented 3 months ago

I've been mulling over an idea which might address both this use case (CPU + CUDA) and one I brought up on Discord a while back (wanting to ensure different platforms' environments are as similar as possible with respect to package versions):

When setting up the dependency graph, separate each package into at least two nodes: one 'abstract' node with just a version number (no platform or build string), and one or more 'concrete' nodes which include the platform/build string. An abstract node's only dependencies are concrete versions of itself; concrete nodes in turn depend on abstract versions of other packages.
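
To make that split concrete, here's a tiny sketch over repodata-like records (the record shapes, names, and matchspec handling are simplified placeholders, not pixi internals):

from collections import defaultdict

# Simplified repodata-style records; real matchspecs carry version
# constraints that are ignored here.
records = [
    {"name": "pytorch", "version": "2.1.2", "build": "cpu_h123",
     "depends": ["numpy"]},
    {"name": "pytorch", "version": "2.1.2", "build": "cuda120_h456",
     "depends": ["numpy", "__cuda >=12.1"]},
    {"name": "numpy", "version": "1.26.4", "build": "py312_0",
     "depends": []},
]

def split_nodes(records):
    """Build the abstract/concrete edges described above.

    'pkg=version'        (abstract) -> its own concrete builds only.
    'pkg=version=build'  (concrete) -> abstract nodes of its deps.
    """
    edges = defaultdict(list)
    versions = defaultdict(set)
    for r in records:
        versions[r["name"]].add(r["version"])
    for r in records:
        abstract = f'{r["name"]}={r["version"]}'
        concrete = f'{abstract}={r["build"]}'
        edges[abstract].append(concrete)
        for dep in r["depends"]:
            name = dep.split()[0]
            if name.startswith("__"):
                # Virtual packages (__cuda) have no builds to pick,
                # so they stay as-is.
                edges[concrete].append(dep)
            else:
                edges[concrete] += [f"{name}={v}" for v in sorted(versions[name])]
    return dict(edges)

for node, deps in split_nodes(records).items():
    print(node, "->", deps)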


Here's a contrived solve tree for the CPU/GPU case, ignoring platforms for now; abstract nodes are white, concrete nodes are shaded, and the __cuda virtual package has a dashed outline:

[image: CPU/GPU solve tree with abstract, concrete, and virtual-package nodes]

Once you have the solution for everything together, you traverse the tree starting at each feature or environment, keeping and locking the concrete nodes you encounter:

[image: traversal of the solve tree, keeping and locking concrete nodes per environment]

Here's another contrived example for the multi-platform use case. All three platforms include a dependency on seaborn here (on the abstract node, assuming we're not explicitly specifying a build string as in the CPU/GPU example):

[image: multi-platform solve tree with a shared abstract seaborn node]

To lock each platform, you walk the tree from each root as in the CPU/GPU case, but with one added rule: only visit nodes corresponding to the platform you're locking.
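
A minimal sketch of that walk, on a hand-written toy graph (node names and platform tags are illustrative):

# Toy graph: abstract nodes fan out to per-platform concrete nodes.
edges = {
    "seaborn=0.13": ["seaborn=0.13=linux", "seaborn=0.13=osx"],
    "seaborn=0.13=linux": ["numpy=1.26"],
    "seaborn=0.13=osx": ["numpy=1.26"],
    "numpy=1.26": ["numpy=1.26=linux", "numpy=1.26=osx"],
    "numpy=1.26=linux": [],
    "numpy=1.26=osx": [],
}
# Abstract nodes carry no platform; concrete nodes are tagged.
node_platform = {
    "seaborn=0.13=linux": "linux-64", "numpy=1.26=linux": "linux-64",
    "seaborn=0.13=osx": "osx-arm64", "numpy=1.26=osx": "osx-arm64",
}

def lock_platform(root, platform):
    """Depth-first walk keeping only this platform's concrete nodes."""
    kept, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        plat = node_platform.get(node)
        if plat is not None and plat != platform:
            continue  # the added rule: skip other platforms' nodes
        if plat == platform:
            kept.append(node)  # this concrete record gets locked
        stack.extend(edges.get(node, []))
    return kept

print(lock_platform("seaborn=0.13", "linux-64"))
# -> ['seaborn=0.13=linux', 'numpy=1.26=linux']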


The downside of this approach is that you increase the depth of the solve tree (by up to 2x the way I described it above, though less if you don't bother with abstract nodes for noarch packages), and you also increase the total node count. But you only have one solve tree instead of one per platform, and it may benefit from the fact that multiple concrete nodes can depend on a single abstract one (as is the case with libcxx above).

You might also run into scenarios where different platforms' trees request incompatible versions of the same package. In this case, you could resolve the conflict by pruning the abstract node and depending directly on the platform-specific concrete versions.

Thoughts? Am I missing something that would make this idea infeasible, or does it seem like it could actually work?

baszalmstra commented 3 months ago

I think this could potentially work but would require significant work on the solver. Especially solving everything together might cause some issues. I'll keep thinking about this.

msegado commented 3 months ago

Makes sense, yes 🙂 I took a look at the codebase before deciding my Rust skills weren't up for it yet...

If it helps, a quick way to test feasibility might be to run the existing solver on an artificial repodata.json with packages constructed with the following pseudocode:
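
# Sketch: the record shapes and the "@platform" / "-abstract" naming
# conventions here are illustrative.
repodata = {
    "linux-64": {"pkg-1.0-cpu_0.conda":
                 {"name": "pkg", "version": "1.0", "build": "cpu_0",
                  "depends": ["dep"]}},
    "osx-arm64": {"pkg-1.0-cpu_0.conda":
                  {"name": "pkg", "version": "1.0", "build": "cpu_0",
                   "depends": ["dep"]}},
}

merged = {}  # becomes the single artificial noarch repodata.json
for platform, packages in repodata.items():
    for fn, rec in packages.items():
        # Concrete record: platform baked into the name, marked noarch,
        # depending on the *abstract* names of its dependencies.
        concrete = dict(rec, subdir="noarch",
                        name=f"{rec['name']}@{platform}",
                        depends=[f"{d}-abstract" for d in rec["depends"]])
        merged[f"{platform}::{fn}"] = concrete
        # Abstract record: version only; pulls in one concrete record
        # per platform, so one solve covers all platforms at once.
        key = f"{rec['name']}-abstract-{rec['version']}.conda"
        abstract = merged.setdefault(key, {
            "name": f"{rec['name']}-abstract", "version": rec["version"],
            "build": "0", "subdir": "noarch", "depends": []})
        abstract["depends"].append(f"{concrete['name']} =={rec['version']}")

print(sorted(merged))  # three artificial records: two concrete, one abstract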

Then just specify some dependencies and an arbitrary platform (which won't matter since everything is now pretending to be noarch) and try to solve -- which will implicitly solve all three platforms together, albeit without the solver realizing it's doing so. Then see how badly it blows up and decide whether it's worth digging further based on that.

For what it's worth, my [very uninformed] suspicion is that solving together won't hugely increase the difficulty of the satisfiability problem... in most cases I'd expect the abstract dependency constraints to either be the same between platforms (e.g. for things like numpy) or completely separate subtrees (for things like platform-specific compilers or feature-specific frameworks). The main type of conflict I'd expect is when the transitive dependencies of some subtree end up being incompatible -- though there's still the option of falling back on the old strategy of solving those subtrees separately in that scenario.