Open traversaro opened 3 months ago
Related Discord message: https://discord.com/channels/1082332781146800168/1082338253925003385/1272560017920360488 .
I have been struggling with this for a while. SIMD support is dictated by the microarchitecture of the user's system, as reported by the archspec library and exposed as the `__archspec` virtual package. However, this reports the "exact" architecture of your CPU, e.g. "zen2", "skylake", or "m1". These names don't directly tell you whether a system is capable of certain SIMD operations. Extra meta packages are required to link the architecture name to specific capabilities. In conda-forge this is provided through the `microarch` package (more info here).
But from a pixi point of view, what would you insert in your pixi.toml file as the minimum system requirement? `skylake`, `zen2`? If I now run on `haswell`, what happens? That makes little sense to me.
But actually, the archspec spec is designed as a graph where certain architectures "inherit" the capabilities of other architectures. So `zen4` inherits the capabilities of `zen2` and adds some on top. There are also some "virtual" architectures like "x86_64_v3". `skylake` inherits its capabilities from that architecture (indirectly).
That is also roughly the idea behind the `microarch` package, which defines versions `1,2,3,4` to provide at least the capabilities of the architectures `x86_64_v1`, `x86_64_v2`, etc. (for the x86 platforms). Mamba, which doesn't implement the full archspec spec (only pixi and conda do), simply looks at the CPU capabilities and reverse engineers them to one of those generic architecture names. Quite clever!
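The "reverse engineering" from CPU capabilities to a generic level can be sketched roughly as below. This is not Mamba's actual code: the function name and the `LEVEL_FEATURES` table are made up for illustration, and the feature names are an assumption that loosely follows the x86-64 psABI level definitions as spelled in Linux `/proc/cpuinfo`.

```python
# Illustrative only: a simplified feature table per x86_64 level. Feature
# names are assumed to be spelled roughly as in Linux /proc/cpuinfo.
LEVEL_FEATURES = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe"},
    4: {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def microarch_level(cpu_flags):
    """Return the highest x86_64 level (1-4) whose features are all present."""
    flags = set(cpu_flags)
    level = 1  # the baseline level is assumed for every x86_64 CPU
    for candidate in (2, 3, 4):
        if LEVEL_FEATURES[candidate] <= flags:
            level = candidate
        else:
            break  # levels are cumulative, so stop at the first gap
    return level
```

Because the levels are cumulative, a CPU that reports AVX2 but lacks a level-2 feature would still be classified as level 1 in this sketch.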
So what we could do is have a system requirement that defines the lowest possible architecture that your project requires. E.g. this can be `x86_64_v3` or `skylake`, which will correspond to the `__archspec` identifier used for solving. At runtime, we figure out whether your actual architecture inherits from the specified minimum to determine if your machine supports the minimum requirement.
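A minimal sketch of that runtime check, assuming a parent graph in the spirit of archspec. The `PARENTS` table below is hypothetical and heavily simplified (the real archspec data has many more nodes and edges), and the function names are invented for this example.

```python
# Hypothetical, heavily simplified parent graph; not the real archspec data.
PARENTS = {
    "x86_64": [],
    "x86_64_v2": ["x86_64"],
    "x86_64_v3": ["x86_64_v2"],
    "x86_64_v4": ["x86_64_v3"],
    "haswell": ["x86_64_v3"],
    "skylake": ["haswell"],
    "zen2": ["x86_64_v3"],
    "zen4": ["zen2", "x86_64_v4"],
}


def ancestors(arch):
    """Collect every architecture that `arch` inherits from, transitively."""
    seen = set()
    stack = list(PARENTS[arch])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def satisfies_minimum(host, minimum):
    """True if `host` is `minimum` itself or inherits from it."""
    return host == minimum or minimum in ancestors(host)
```

In this toy graph, a `haswell` host satisfies a minimum of `x86_64_v3` but not a minimum of `skylake`, which is why a virtual architecture is usually the more useful thing to require.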
Alternatively, we could use the same levels as the `microarch` packages as minimum requirements. E.g. you specify `microarch = 2` as the minimum requirement. But that simply boils down to the same thing internally.
I do feel like defaulting to `microarch=3` might be a bit too rigorous atm. Perhaps level 1, which I believe provides a minimum of SSE2, is a more sane default.
There are still some difficulties. For instance, what if your project supports both x86 and Apple silicon? `microarch=x86_64_v3` or `microarch=3` make no sense in that case! There is no SSE on Apple silicon. Perhaps we should set a minimum requirement per base architecture (also similar to the `microarch-level` package)? `x86`, `powerpc`, etc.?
@ruben-arts Thoughts?
> I do feel like defaulting to `microarch=3` might be a bit too rigorous atm.
Just to understand, which x86_64 systems do you have in mind that do not support AVX/AVX2? What I like about an "aggressive" default system requirement is that people who do not meet it get an error (which they can act on), while with a "conservative" default system requirement people will just silently get slower packages, and it may not be obvious to them why.
Some reference for the microarchitecture targets (v2 for RHEL 9 and clones, v3 for RHEL 10)
Thanks for sharing @truatpasteurdotfr
I propose the following:
At the moment, if no system requirements are present we assume the default. However, this prevents us from changing the defaults in the future without potentially breaking resolving. I propose we add the system requirements that were used for solving to the lock-file.
Proposal
I propose we modify the lock-file format to include the resolved system requirements that were used to solve the packages. If system-requirements are missing from the lock-file we should assume the defaults before this change was introduced.
System requirements in the lock-file are defined on 6 different levels to reduce the size of the lock-file:

- Globally for the entire lock-file
  - `base` defines the system requirements that are shared by all platforms.
  - Per operating system (`linux`, `osx`, etc.)
  - Per platform (`linux-64`, `linux-aarch64`).
- And per environment
  - `base` defines the system requirements that are shared by all platforms.
  - Per operating system (`linux`, `osx`, etc.)
  - Per platform (`linux-64`, `linux-aarch64`).

Only the system requirements that apply to a certain set have to be defined, and only if they override a previous level.
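The layering could be resolved along these lines. This is only a sketch: the key names (`system-requirements`, `environments`, `base`) and the precedence order (global before environment, general before specific) are assumptions made for illustration, not the actual lock-file schema.

```python
def resolve_requirements(lock, environment, platform):
    """Merge system requirements from the most general to the most specific level."""
    family = platform.split("-")[0]  # e.g. "linux-64" -> "linux"
    global_scope = lock.get("system-requirements", {})
    env_scope = (
        lock.get("environments", {})
        .get(environment, {})
        .get("system-requirements", {})
    )
    resolved = {}
    for scope in (global_scope, env_scope):
        for level in ("base", family, platform):
            # A later level replaces whole top-level keys set by an earlier one.
            resolved.update(scope.get(level, {}))
    return resolved


# Hypothetical lock-file content, already parsed into dictionaries.
lock = {
    "system-requirements": {
        "base": {"cuda": None},
        "linux": {"libc": {"family": "glibc", "version": "2.17"}},
        "linux-64": {"linux": "5.10"},
    },
    "environments": {
        "gpu": {"system-requirements": {"base": {"cuda": "12.4"}}},
    },
}

print(resolve_requirements(lock, "gpu", "linux-64"))
```

Only the `gpu` environment needs to mention `cuda`; everything else is inherited from the global levels, which is what keeps the lock-file small.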
Forward compatibility (i.e. older versions of pixi should still be able to read lock-files created by newer versions) is retained: older versions will keep using their default system requirements. Backward compatibility is maintained because, if system requirements are missing, we assume the defaults from older pixi versions.
We did change the default system requirements at some point in a breaking way. This change does not address this.
System requirements are defined as follows:
macos: # Only applies to macos based operating systems
  version: "13.0" # The minimum macos version (note that the default differs based on the architecture)
linux: # Only applies to linux based operating systems
  version: "5.10" # The minimum linux kernel version (quoted so YAML does not read it as the number 5.1)
libc: # Only applies to linux based operating systems. Must be `null` if no libc is available.
  family: glibc # The libc family
  version: "2.28" # The minimum available version
cuda: # Minimum cuda available, or `null` if no cuda is present.
  version: "12.4"
Questions
We do not currently store a minimum available architecture. I propose we add a platform-specific base architecture system requirement based on the archspec spec. At runtime two things must happen:
`__archspec` virtual package.
Following step 1, we can also set an initial default.
Proposal
In the pixi.toml we add:
[system-requirements.archspec]
# Define on architecture family level
x86_64 = "x86_64_v3" # It's best to use these "virtual architectures" but we could also allow specific architectures?
ppc64le = "power8le"
# Or, override on a per-platform level
osx-arm64 = "m2"
In the lock-file this would roughly translate to:
system-requirements:
  win-64:
    archspec: x86_64_v3
  linux-64:
    archspec: x86_64_v3
  osx-arm64:
    archspec: m2
  linux-ppc64le:
    archspec: power8le
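The translation from family-level entries to per-platform entries might look like the sketch below. The `PLATFORM_FAMILY` mapping, the function name, and the `linux-64` override in the example are all hypothetical; they only illustrate that a per-platform key wins over the family-level default.

```python
# Assumed mapping from platform to architecture family (illustrative only).
PLATFORM_FAMILY = {
    "linux-64": "x86_64",
    "win-64": "x86_64",
    "osx-64": "x86_64",
    "linux-ppc64le": "ppc64le",
}


def expand_archspec(table, platforms):
    """Resolve the archspec minimum for each platform of a project."""
    result = {}
    for platform in platforms:
        family = PLATFORM_FAMILY[platform]
        if platform in table:
            result[platform] = table[platform]  # per-platform override
        elif family in table:
            result[platform] = table[family]  # family-level default
    return result


# Family-level defaults plus one made-up per-platform override.
table = {"x86_64": "x86_64_v3", "ppc64le": "power8le", "linux-64": "x86_64_v4"}
print(expand_archspec(table, ["win-64", "linux-64", "linux-ppc64le"]))
```

Platforms whose family has no entry are simply left out of the result, so they would fall back to whatever default applies.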
I would love to hear your thoughts!
It seems great, thanks!
Thanks @baszalmstra!
Step 1: Instant yes! This would be great, especially because we need it to quickly verify if the solve was done for exactly that system already or if we could ask the user for a re-solve with their settings.
Step 2: This seems like a good idea. But what I'm worried about is the blowup of environments, as you probably don't have the requirement but just want to get the best package. So for me the user story would be: "As a user of a package, I want pixi to install the fastest available version of said package, so that I can make sure my system runs optimally."
If pixi were able to lock environments but dynamically choose which archspec version of a package to install, that would be really powerful, as we would automate all of this.
Here is a solution idea which probably has lots of missing parts, but I'm writing it down to hopefully explain myself better.
linux-64:
  - conda
    - archspec:
      - x86_64_v3: https://conda.anaconda.org/conda-forge/linux-64/openssl-3.3.1-x86_64_v3.conda
      - x86_64_v2: https://conda.anaconda.org/conda-forge/linux-64/openssl-3.3.1-x86_64_v2.conda
I know this won't work because packages could have different metadata, but the automation would be important to me. We already see what a bad UX the current system-requirements are, and I would like to avoid more of the same.
I got this idea while talking to someone at PyCon DE 2024. They had a package that included the binaries for all archspecs, and the main binary would be a script that checked the runtime archspec and spawned the right binary for it. This way the package could not have different metadata per variant, but it did dynamically choose the correct binary.
This is something you obviously don't want but it was a way to automate it for them.
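For reference, the selection step of such a launcher script could be sketched as follows. The function name, the binary paths, and the assumption that variants are keyed by the generic x86_64 levels are all hypothetical; detecting the host level is left out.

```python
# Generic x86_64 levels, ordered from least to most capable.
LEVELS = ["x86_64", "x86_64_v2", "x86_64_v3", "x86_64_v4"]


def pick_variant(host_level, shipped):
    """Return the most capable shipped binary that the host can run."""
    host_index = LEVELS.index(host_level)
    # Walk down from the host's level until a shipped variant is found.
    for level in reversed(LEVELS[: host_index + 1]):
        if level in shipped:
            return shipped[level]
    raise RuntimeError(f"no shipped variant runs on {host_level}")


# Hypothetical layout of a package that bundles two variants.
shipped = {"x86_64": "bin/tool-generic", "x86_64_v3": "bin/tool-v3"}
print(pick_variant("x86_64_v4", shipped))
```

The launcher would then execute the returned path, so the choice happens at run time rather than at solve time.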
That's much more difficult than it seems on the surface. Each of those packages can have different metadata, which could result in a wildly different resolution. It's not just one package that might change but also all transitive dependencies.
This means we would get a lock-file that still requires solving at runtime, albeit with a much smaller set of packages. We called this hydrated vs non-hydrated locking.
I think this would be interesting, but I think it's way out of scope for this proposal.
That makes sense. I immediately got worried about the pixi UX, but the technology can of course be built already.
Will keep the dynamic lock files on a background thread in my brain for now.
Naive question: how would archspec and macOS/Rosetta emulation of x64 on aarch64 interact?
> Naive question: how would archspec and macOS/Rosetta emulation of x64 on aarch64 interact?
@truatpasteurdotfr I guess it depends on which instructions are exposed by rosetta. As long as the archspec Python and Rust library and the C++ detection code in micromamba work fine, everything should work fine. Indeed by checking the archspec python repo there is a related issue: https://github.com/archspec/archspec/issues/133 . The Rust port of archspec has some code to deal with that case, so I think it should be good to go? See https://github.com/prefix-dev/archspec-rs/blob/ff45a9b4a2bc484d5c27f650f4fe9940fd302731/src/cpu/detect.rs#L347-L348 .
Problem description
The `conda-forge` channel has started providing microarchitecture-enabled builds (i.e. packages that exploit AVX2 capabilities if they are available on the machine by installing a different version of the package), see:
At the moment, even if the host machine supports AVX2, pixi always installs the non-AVX2 version of packages. For example, if you install libblasfeo from https://github.com/conda-forge/blasfeo-feedstock, you always get the non-AVX2 build `libblasfeo-0.1.3-hc1b4afe_101` instead of `libblasfeo-0.1.3-hfd42a93_301`, while conda installs `libblasfeo-0.1.3-hfd42a93_301` if the machine supports AVX2.