prefix-dev / pixi

Package management made easy
https://pixi.sh
BSD 3-Clause "New" or "Revised" License

Proposal: Support for microarchitecture-enabled builds #1804

Open traversaro opened 2 months ago

traversaro commented 2 months ago

Problem description

The conda-forge channel has started providing microarchitecture-enabled builds (i.e. packages that exploit AVX2 capabilities when the machine has them, by installing a different build of the same package), see:

At the moment, even if the host machine supports AVX2, pixi always installs the non-AVX2 version of packages. For example, if you install libblasfeo from https://github.com/conda-forge/blasfeo-feedstock, you always get the non-AVX2 build libblasfeo-0.1.3-hc1b4afe_101 instead of libblasfeo-0.1.3-hfd42a93_301, while conda installs libblasfeo-0.1.3-hfd42a93_301 if the machine supports AVX2.

traversaro commented 2 months ago

Related Discord message: https://discord.com/channels/1082332781146800168/1082338253925003385/1272560017920360488 .

baszalmstra commented 2 months ago

I have been struggling with this for a while. SIMD support is dictated by the microarchitecture of someone's system reported by the archspec library which is exposed as the __archspec virtual package. However, this reports the "exact" architecture of your CPU, e.g. "zen2" or "skylake" or "m1". These don't directly tell you whether a system is capable of certain SIMD operations. Extra meta packages are required to link the architecture name to certain capabilities. In conda-forge this is provided through the microarch package (more info here).
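To illustrate why the raw architecture name is not enough, here is a minimal Python sketch with a toy feature table. The names and feature sets below are illustrative stand-ins, not archspec's real data (which lives in its microarchitectures database): the point is that mapping "zen2" to "supports AVX2" requires a lookup table, not the name itself.

```python
# Toy subset of per-microarchitecture instruction-set data.
# Illustrative only: real feature lists are shipped with archspec.
FEATURES = {
    "x86_64":    {"sse", "sse2"},
    "x86_64_v2": {"sse", "sse2", "sse4_2", "popcnt"},
    "x86_64_v3": {"sse", "sse2", "sse4_2", "popcnt", "avx", "avx2", "fma"},
    "haswell":   {"sse", "sse2", "sse4_2", "popcnt", "avx", "avx2", "fma"},
    "zen2":      {"sse", "sse2", "sse4_2", "popcnt", "avx", "avx2", "fma"},
}

def supports(arch: str, instruction: str) -> bool:
    """True if the named microarchitecture advertises the instruction set."""
    return instruction in FEATURES.get(arch, set())

supports("zen2", "avx2")    # → True, but only the table tells you so
supports("x86_64", "avx2")  # → False, the baseline lacks AVX2
```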

But from a pixi point of view, what would you insert in your pixi.toml file as the minimum system requirement? skylake, zen2? If I now run on haswell, what happens? That makes little sense to me.

But actually, the archspec spec is designed as a graph where certain architectures "inherit" the capabilities of other architectures. So zen4 inherits the capabilities of zen2 and adds some on top. There are also some "virtual" architectures like "x86_64_v3". skylake inherits its capabilities from that architecture (indirectly).

That is also roughly the idea behind the microarch package, which defines versions 1, 2, 3 and 4 to provide at least the capabilities of the virtual architectures x86_64_v1, x86_64_v2, etc. (for the x86 platforms). Mamba, which doesn't implement the full archspec spec (only pixi and conda do), simply looks at the CPU capabilities and reverse-engineers them into one of those generic architecture names. Quite clever!

So what we could do is have a system requirement that defines the lowest possible architecture that your project requires. E.g. this can be x86_64_v3 or skylake which will correspond to the __archspec identifier used for solving. At runtime, we figure out whether your actual architecture inherits from the specified minimum to determine if your machine supports the minimum requirement.
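That runtime check can be sketched in a few lines of Python. The parent edges below are a simplified, assumed subset of the archspec inheritance graph (the real graph is defined in archspec's data files and has more edges); the walk itself is the idea: an architecture satisfies the minimum if the minimum appears among its ancestors.

```python
# Toy subset of the archspec inheritance graph (simplified; real edges differ).
PARENTS = {
    "x86_64": [],
    "x86_64_v2": ["x86_64"],
    "x86_64_v3": ["x86_64_v2"],
    "haswell": ["x86_64_v3"],
    "skylake": ["haswell"],
    "zen2": ["x86_64_v3"],
    "zen4": ["zen2"],
}

def satisfies(actual: str, minimum: str) -> bool:
    """Does `actual` inherit (directly or indirectly) from `minimum`?"""
    stack, seen = [actual], set()
    while stack:
        arch = stack.pop()
        if arch == minimum:
            return True
        if arch not in seen:
            seen.add(arch)
            stack.extend(PARENTS.get(arch, []))
    return False

satisfies("skylake", "x86_64_v3")    # → True, via haswell
satisfies("x86_64_v2", "x86_64_v3")  # → False, machine is below the minimum
```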

Alternatively, we could use the same levels as the microarch packages as minimum requirements. E.g. you specify microarch = 2 as the minimum requirement. But that simply boils down to the same thing internally.

I do feel like defaulting to microarch=3 might be a bit too rigorous atm. Perhaps level 1, which I believe provides a minimum of SSE2, is a more sane default.

There are still some difficulties, for instance, what if your project supports both x86 and Apple silicon? microarch=x86_64_v3 or microarch=3 make no sense in that case! There is no SSE on Apple silicon. Perhaps we should set a minimum requirement per base architecture (also similar to the microarch-level package)? x86, powerpc etc?

@ruben-arts Thoughts?

traversaro commented 2 months ago

I do feel like defaulting to microarch=3 might be a bit too rigorous atm.

Just to understand: which x86_64 systems do you have in mind that do not support AVX/AVX2? What I like about an "aggressive" default system requirement is that people who do not meet it get an error (which they can act on), while with a "conservative" default people just silently get slower packages, and it may not be obvious to them that this is happening.

truatpasteurdotfr commented 2 months ago

some references for the microarchitecture targets (v2 for RHEL9 and clones, v3 for RHEL10)

baszalmstra commented 2 months ago

Thanks for sharing @truatpasteurdotfr

baszalmstra commented 2 months ago

I propose the following:

Step 1: Adding the system requirements to the lock file.

At the moment, if no system requirements are present we assume the defaults. However, this prevents us from changing the defaults in the future without potentially breaking resolving. I propose we record the system requirements that were used for solving in the lock-file.

Proposal

I propose we modify the lock-file format to include the resolved system requirements that were used to solve the packages. If system-requirements are missing from the lock-file, we should assume the defaults from before this change was introduced.

System requirements in the lock file are defined on 6 different levels to reduce the size of the lock-file:

Globally for the entire lock-file

  1. base defines the system requirements that are shared by all platforms.
  2. operating system specific (e.g. linux, osx, etc.)
  3. platform specific (linux-64, linux-aarch64).

And per environment

  1. base defines the system requirements that are shared by all platforms.
  2. operating system specific (e.g. linux, osx, etc.)
  3. platform specific (linux-64, linux-aarch64).

Only the system requirements that apply to a certain set have to be defined, and only if they override a previous level.
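The layered lookup described above can be sketched as a simple fold over the levels, from least to most specific, where later levels override earlier keys. The requirement fragments below are hypothetical examples, not a real lock-file:

```python
def resolve_requirements(levels: list) -> dict:
    """Merge requirement dicts from least to most specific;
    a later level overrides keys set by an earlier one."""
    merged = {}
    for level in levels:
        merged.update(level)
    return merged

# Hypothetical fragments: global base, then an environment's linux-64 override.
effective = resolve_requirements([
    {"linux": {"version": "4.18"}, "libc": {"family": "glibc", "version": "2.17"}},
    {"libc": {"family": "glibc", "version": "2.28"}},
])
effective["libc"]["version"]  # → "2.28", the most specific level wins
```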

Forward-compatibility (i.e. older versions of pixi can still read lock-files created by newer versions) is retained: older versions will keep using their default system requirements. Backward compatibility is maintained because if system requirements are missing, we assume the defaults from older pixi versions.

We did change the default system requirements at some point in a breaking way; this proposal does not address that.

System requirements are defined as follows:

macos:            # Only applies to macos based operating systems
  version: 13.0   # The minimum macos version (note that the default differs based on the architecture)
linux:            # Only applies to linux based operating systems
  version: 5.10   # The minimum linux kernel version
libc:             # Only applies to linux based operating systems. Must be `null` if no libc is available.
  family: glibc   # The libc family
  version: 2.28   # The minimum available version
cuda:             # Minimum cuda available, or `null` if no cuda is present.
  version: 12.4
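Checking a machine against these locked minimums boils down to numeric version comparison. A small sketch (the helper names are made up for illustration; note that components must be compared numerically, not lexicographically, so glibc 2.9 correctly falls below 2.28):

```python
def version_tuple(v: str) -> tuple:
    """Parse a dotted version string into a tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def version_at_least(actual: str, minimum: str) -> bool:
    """True if `actual` satisfies the locked `minimum`, comparing
    components numerically (so 2.9 < 2.28, unlike a string compare)."""
    a, m = version_tuple(actual), version_tuple(minimum)
    n = max(len(a), len(m))
    pad = lambda t: t + (0,) * (n - len(t))  # right-pad short versions with zeros
    return pad(a) >= pad(m)

version_at_least("2.31", "2.28")  # → True: glibc 2.31 meets a 2.28 minimum
version_at_least("2.9", "2.28")   # → False: numeric compare, 9 < 28
```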


Step 2: Adding the archspec system requirement

We do not currently store a minimum available architecture. I propose we add a platform-specific base architecture system requirement based on the archspec spec. At runtime two things must happen:

Following step 1, we can also set an initial default.

Proposal

In the pixi.toml we add:

[system-requirement.archspec]

# Define on architecture family level
x86_64 = "x86_64_v3"    # It's best to use these "virtual architectures" but we could also allow specific architectures?
ppc64le = "power8le"

# Or, overwrite on a per-platform level
osx-64 = "m2"

In the lock-file this would roughly translate to:

system-requirements:
  win-64:
    archspec: x86_64_v3
  linux-64:
    archspec: x86_64_v3
  osx-64: 
    archspec: m2
  linux-ppc64le:
    archspec: power8le
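The translation from family-level pixi.toml keys to per-platform lock entries could look roughly like this sketch. The platform-to-family mapping is an assumption for illustration, and the osx-64 override mirrors the example above:

```python
# Assumed mapping from conda platform strings to archspec families.
PLATFORM_FAMILY = {
    "win-64": "x86_64",
    "linux-64": "x86_64",
    "osx-64": "x86_64",
    "linux-ppc64le": "ppc64le",
}

def expand(family_reqs: dict, platform_overrides: dict, platforms: list) -> dict:
    """Resolve the archspec requirement per platform: a platform-level
    override wins over the family-level default."""
    out = {}
    for p in platforms:
        if p in platform_overrides:
            out[p] = platform_overrides[p]
        elif PLATFORM_FAMILY.get(p) in family_reqs:
            out[p] = family_reqs[PLATFORM_FAMILY[p]]
    return out

locked = expand(
    {"x86_64": "x86_64_v3", "ppc64le": "power8le"},   # family level
    {"osx-64": "m2"},                                  # per-platform override
    ["win-64", "linux-64", "osx-64", "linux-ppc64le"],
)
# locked matches the lock-file fragment above.
```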

I would love to hear your thoughts!

traversaro commented 2 months ago

It seems great, thanks!

ruben-arts commented 2 months ago

Thanks @baszalmstra!

Step 1: Instant yes! This would be great, especially because we need it to quickly verify if the solve was done for exactly that system already or if we could ask the user for a re-solve with their settings.

Step 2: This seems like a good idea. But what I'm worried about is the blow-up of environments: you probably don't have a hard requirement, you just want the best package. For me the user story would be: "As a user of a package, I want pixi to install the fastest available version of that package, so that my system runs optimally."

If pixi were able to lock environments but dynamically choose which archspec variant of a package to install, that would be really powerful, as we would automate all of this.

Here is a solution idea which probably has lots of missing parts, but I'm writing it down to hopefully explain myself better.

      linux-64:
      - conda
        - archspec:
            - x86_64_v3: https://conda.anaconda.org/conda-forge/linux-64/openssl-3.3.1-x86_64_v3.conda
            - x86_64_v2: https://conda.anaconda.org/conda-forge/linux-64/openssl-3.3.1-x86_64_v2.conda

I know this won't work because the packages could have different metadata, but the automation would be important to me. We already see what bad UX the current system-requirements are; I would like to avoid more of the same.

I got this idea while talking to someone at PyCon DE 2024. They had a package that included the binaries for all archspecs, and the main binary was a script that checked the runtime archspec and spawned the right binary for it. This way the package could not have different metadata per variant, but it did dynamically choose the correct binary.

This is something you obviously don't want but it was a way to automate it for them.
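For reference, the wrapper pattern described above could be sketched like this. The binary layout and rank table are hypothetical, chosen only to show the dispatch logic:

```python
# Hypothetical package layout: one binary per microarchitecture level,
# e.g. /opt/pkg/tool-x86_64, tool-x86_64_v2, tool-x86_64_v3.
LEVEL_RANK = {"x86_64": 1, "x86_64_v2": 2, "x86_64_v3": 3}

def pick_variant(detected_level: str, shipped: list) -> str:
    """Of the shipped variants, pick the most capable one that does not
    exceed what the running CPU supports."""
    usable = [s for s in shipped if LEVEL_RANK[s] <= LEVEL_RANK[detected_level]]
    return max(usable, key=LEVEL_RANK.__getitem__)

variant = pick_variant("x86_64_v2", ["x86_64", "x86_64_v2", "x86_64_v3"])
# The wrapper would then exec /opt/pkg/tool-<variant> with the original argv.
```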

baszalmstra commented 2 months ago

That's much more difficult than it seems on the surface. Each of those packages can have different metadata, which could result in a wildly different resolution. It's not just one package that might change, but also all of its transitive dependencies.

This means we would get a lock-file that still requires solving at runtime, albeit with a much smaller set of packages. We called this hydrated vs. non-hydrated locking.

I think this would be interesting, but it's way out of scope for this proposal.

ruben-arts commented 2 months ago

That makes sense. I immediately got worried about the pixi UX, but the technology can of course be built already.

Will keep the dynamic lock-files on a background thread in my brain for now.

truatpasteurdotfr commented 2 months ago

naive question: how would archspec interact with macOS/Rosetta emulation of x86_64 on aarch64?

traversaro commented 3 days ago

naive question: how would archspec interact with macOS/Rosetta emulation of x86_64 on aarch64?

@truatpasteurdotfr I guess it depends on which instructions are exposed by Rosetta. As long as the archspec Python and Rust libraries and the C++ detection code in micromamba work fine, everything should work fine. Indeed, checking the archspec Python repo, there is a related issue: https://github.com/archspec/archspec/issues/133 . The Rust port of archspec has some code to deal with that case, so I think it should be good to go? See https://github.com/prefix-dev/archspec-rs/blob/ff45a9b4a2bc484d5c27f650f4fe9940fd302731/src/cpu/detect.rs#L347-L348 .
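For completeness, a small sketch of how a tool could detect Rosetta translation at runtime. It queries the macOS sysctl key sysctl.proc_translated (1 under Rosetta 2, 0 for a native process) and falls back to False when the key or the sysctl binary is unavailable, e.g. on Linux or on Intel Macs:

```python
import subprocess

def is_rosetta_translated() -> bool:
    """True if the current process appears to run under Rosetta 2 on macOS.

    Reads the macOS sysctl key `sysctl.proc_translated`; any failure
    (non-macOS host, missing key, missing sysctl binary) yields False.
    """
    try:
        result = subprocess.run(
            ["sysctl", "-n", "sysctl.proc_translated"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return False  # sysctl not available on this platform
    return result.stdout.strip() == "1"
```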