`-C target_cpu=cortex-a72` (and `-target-cpu=native` on Raspberry Pi) wrongly enables crypto features that are optional on Cortex-A72

briansmith commented 6 months ago

Building with RUSTFLAGS="-C target_cpu=cortex-a72" statically enables the target_feature="aes", target_feature="crc", target_feature="pmuv3", and target_feature="sha2". However, at least the crypto features AES, CRC, and SHA2 are optional on this CPU. The definition for this target is wrong. See the upstream LLVM bug: https://github.com/llvm/llvm-project/issues/90365.

The main consequence is that crypto libraries that use cfg(target_feature = ...) feature detection for these hardware instructions are getting miscompiled, causing the programs to, at best, crash with an illegal instruction exception. This particular affects Raspberry Pi users compiling with RUSTLFAGS=-target-cpu=native. From https://github.com/briansmith/ring/issues/1858#issuecomment-2093675988:

$ rustc --print cfg --target=aarch64-unknown-linux-gnu -C target_cpu=cortex-a72 | grep feature
target_feature="aes"
target_feature="crc"
target_feature="neon"
target_feature="pmuv3"
target_feature="sha2"

Without -C target_cpu=cortex-a72 we get the correct feature flags:

$ rustc --print cfg --target=aarch64-unknown-linux-gnu | grep feature
target_feature="neon"

I verified this is an issue on Rust 1.61 stable, 1.78 stable, and rustc 1.80.0-nightly (6e1d94708 2024-05-10).

Although some crypto libraries may work around this issue, these workarounds have negative consequences. In the case of ring's workaround, the result of the workaround is bloat and worse performance on all AArch64 CPUs that actually are guaranteed to have the crypto extensions (except on Fuchsia, Windows, and macOS).

apiraino commented 6 months ago

WG-prioritization assigning priority (Zulip discussion).

Since in our support list this arch has the most support:

@rustbot label -I-prioritize +P-critical

briansmith commented 6 months ago

From today's comment in the upstream LLVM issue:

For -mcpu=xyz, we enable the maximal set of features for the cpu (so long as they are relatively common), which can be disabled with +nofeat. [....] The idea is that users get decent performance by default, and if they have less features can turn down the default.

Unfortunately GCC didn't follow that scheme at the time for crypto instructions, and had them disabled by default. Clang did, so there was a difference in whether crypto was enabled. We did not decide to retro-actively change old CPU definitions (it could be a breaking change), but going forward "Armv-9" cpus have been changed to not include crypto by default.

Assuming that is accurate, there are a few interesting things:

Potentially the issue is much broader than Cortex-A* CPUs, as it seems like LLVM doesn't actually have a policy of only-required-features-by-default. I.e. the current behavior seems to be by design.
Assuming we want -C target_cpu to be safe by default, rustc cannot delegate its defaults to LLVM. rustc should turn off feature flags for optional features by default. Basically -C target_cpu is generally not a memory-safe option. This should be documented retroactively for old versions and fixed for newer versions.
Assuming cc-rs also wants to be "safe by default," it also needs to manually turn off feature flags for optional features by default when the compiler is clang, by passing additional flags.
This all would be a potentially-compatibility-breaking change.

briansmith commented 6 months ago

Also, from the LLVM issue:

For -march=armv8.x-a we enable the minimal set of features (so all required extensions). Optional features can be added with +feat.

So, one workaround would be to use that. But, I don't see rustc providing a mechanism to choose the ARM architecture level instead of a specific CPU.

wesleywiser commented 6 months ago

@rustbot label -P-critical +P-high

We aren't going to block the next stable release because of this issue so tagging this as P-high instead.

workingjubilee commented 5 months ago

Assuming we want -C target_cpu to be safe by default, rustc cannot delegate its defaults to LLVM. rustc should turn off feature flags for optional features by default. Basically -C target_cpu is generally not a memory-safe option. This should be documented retroactively for old versions and fixed for newer versions.

Mmh. If I run the Rust compiler with -Ctarget-cpu=some-intel-cpu and then execute it on my AMD machine, the result is (often) memory unsafe.

I do not believe it is realistic for rustc to "correct" LLVM on the very long list of AArch64 CPUs, so it would be best to fix this upstream.

briansmith commented 5 months ago

Mmh. If I run the Rust compiler with -Ctarget-cpu=some-intel-cpu and then execute it on my AMD machine, the result is (often) memory unsafe.

If you do -C target-cpu=skylake, and you run it on a skylake machine, do you get memory unsafety? That's the situation with -C target_cpu=cortex-a72 today.

mripard commented 5 months ago

Even worse, if you compile on a Pi4 with -C target_cpu=native, and execute it straight-away, it's unsafe.

workingjubilee commented 5 months ago

If you do -C target-cpu=skylake, and you run it on a skylake machine, do you get memory unsafety? That's the situation with -C target_cpu=cortex-a72 today.

If you build a binary with --target i686-unknown-linux-gnu and then run it on an actual Pentium 6, yes, you do.

briansmith commented 5 months ago

if you do -C target-cpu=skylake, and you run it on a skylake machine, do you get memory unsafety? That's the situation with -C target_cpu=cortex-a72 today.

If you build a binary with --target i686-unknown-linux-gnu and then run it on an actual Pentium 6, yes, you do.

That's a different situation, because you're building for a CPU with more capabilities than what you're running on. In the case of -C target_cpu=cortex-a72, when you run it on some cortex-a72 CPUs, i.e. the same model that you asked for, it will execute instructions that aren't implemented on that cortex-a72 CPU.

briansmith commented 5 months ago

I do not believe it is realistic for rustc to "correct" LLVM on the very long list of AArch64 CPUs, so it would be best to fix this upstream.

Understandable. However, I think it is worth making an exception for this particular case:

This is a very common issue because of the popularity of the Cotex-A72-based Raspberry Pi.
Someone in the upstream issue indicated that they've changed their policy for later CPUs, so this seems to be an issue that affects a short and not-growing list of CPUs. I.e. we probably wouldn't be signing up for doing this kind of thing over and over again.
Upstream indicates that it is hard for them to "fix" it upstream. Probably because it would affect how Android applications and Android OS itself is built, which makes it high-risk for them when those devices are in maintenance mode and they're supposed to be minimizing risk to them. So we shouldn't wait on a fix upstream.

workingjubilee commented 5 months ago

That's a different situation, because you're building for a CPU with more capabilities than what you're running on.

No, it is not.

When you build code for the P6, you get code for a P68, a CPU model that was revised and upgraded from the P6.

When you build code for the Cortex-A72, you get code for a revised and upgraded Cortex-A72 model.

This is a very common issue because of the popularity of the Cotex-A72-based Raspberry Pi.

I assure you, there will be other SBCs that people will say are very popular and demand that we fix this sort of thing for them.

Someone in the upstream issue indicated that they've changed their policy for later CPUs, so this seems to be an issue that affects a short and not-growing list of CPUs. I.e. we probably wouldn't be signing up for doing this kind of thing over and over again.

LLVM has been willing to break compatibility with target features. I don't see why why we would believe their target CPUs are any more stable.

mripard commented 5 months ago

That's a different situation, because you're building for a CPU with more capabilities than what you're running on.

No, it is not.

When you build code for the P6, you get code for a P68, a CPU model that was revised and upgraded from the P6.

I mean, it's not exactly the same thing. P6 and P68 are architectures. It covers CPU designs that have been released over the course of, what, 10 years?

When you build code for the Cortex-A72, you get code for a revised and upgraded Cortex-A72 model.

Meanwhile, the Cortex-A72 is a CPU design. ARM doesn't produce any, and licensees are free to implement optional features, at their discretion. And note that it's a single design.

And we can talk about whether the Cortex-A72 definition should include various options or not all we want, but the fact that native will produce binaries that can't run on the machine that compiled them, despite being exactly what it is used for and documented as, is clearly a bug.

This is a very common issue because of the popularity of the Cotex-A72-based Raspberry Pi.

I assure you, there will be other SBCs that people will say are very popular and demand that we fix this sort of thing for them.

If we stick to using only the required, by design, features of a given CPU design, then we won't have more demands.

workingjubilee commented 5 months ago

If we stick to using only the required, by design, features of a given CPU design, then we won't have more demands.

Splendid! Then if that is all we need to do, would you care to PR your alternative database of CPU features for aarch64 CPU models to rustc, since LLVM's cannot be relied on and they apparently are not interested in changing that?

nikic commented 5 months ago

It seems like the key problem here is that -C target-cpu=native does not work correctly -- regardless of any disagreements on naming, that part is certainly a bug in LLVM for which a fix would be accepted. This is probably just a matter of changing https://github.com/llvm/llvm-project/blob/22530e7985083032fe708848abb88b77be78e5ce/llvm/lib/TargetParser/Host.cpp#L1972 to always assign the crypto feature (instead of only assigning crypto = true).

workingjubilee commented 5 months ago

@nikic Hm. Shouldn't that be done for the Windows cases as well? Indeed, all of the cases like that, with a static name?

if (condition)
    Feature["name"] = true

workingjubilee commented 5 months ago

Hm, I suppose there's only one in that file outside the AArch64 set.

workingjubilee commented 2 months ago

heads up crabs, https://github.com/llvm/llvm-project/pull/95694 got merged, I'll let y'all figure out how to backport that or what.

briansmith commented 2 months ago

workingjubilee added llvm-fixed-upstream

This isn't fixed upstream though? Maybe the target-cpu=native case is fixed thanks to your effort but the target_cpu=cortex-a72 case is still an issue.

nikic commented 2 months ago

The -C target-cpu=native behavior was a plain bug. The -C target-cpu=cortext-a72 behavior is just a difference in opinion.

workingjubilee commented 1 month ago

@briansmith It's a separate issue because it applies to almost every single -Ctarget-cpu that isn't -Ctarget-cpu=native for AArch64. It requires a different solution, even if it is fixed upstream by successfully armwrestling LLVM into changing every single CPU def.

rust-lang / rust

`-C target_cpu=cortex-a72` (and `-target-cpu=native` on Raspberry Pi) wrongly enables crypto features that are optional on Cortex-A72 #125033