llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.01k stars 11.95k forks source link

acle-fmv-features.c fails on Fedora #109304

Closed nikic closed 1 month ago

nikic commented 1 month ago

This test from llvm-test-suite (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/UnitTests/AArch64/acle-fmv-features.c) fails on Fedora with the following output:


flagm
flagm2
dotprod
sha3
rdm
lse
sha2
sha1
aes
pmull
rcpc
rcpc2
fcma
jscvt
dpb
dpb2
bf16
i8mm
dit
fp16
ssbs2
    UPASS
bti
simd
fp
crc
sme
sme2

Note the UPASS on ssbs2. This indicates that the ssbs feature was not detected as available, but executing the assembly code for it did not crash. The assembly is (https://github.com/llvm/llvm-test-suite/blob/e77ce1bfc8dbe38b9bf5a3098cb954e58d125020/SingleSource/UnitTests/AArch64/acle-fmv-features.c#L189-L195):

    asm volatile (
        "mrs x0, SSBS" "\n"
        "msr SSBS, x0" "\n"
        : : : "x0"
    );

I'm not sure whether this indicates some kind of kernel bug where a trap is incorrectly suppressed, or an incorrect assumption in the test that this necessarily has to trap (rather than, say, being silently ignored).

Failure was observed on at least 6.10.10-100.fc39.aarch64 and 6.10.9-200.fc40.aarch64 kernels.

llvmbot commented 1 month ago

@llvm/issue-subscribers-backend-aarch64

Author: Nikita Popov (nikic)

This test from llvm-test-suite (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/UnitTests/AArch64/acle-fmv-features.c) fails on Fedora with the following output: ``` flagm flagm2 dotprod sha3 rdm lse sha2 sha1 aes pmull rcpc rcpc2 fcma jscvt dpb dpb2 bf16 i8mm dit fp16 ssbs2 UPASS bti simd fp crc sme sme2 ``` Note the `UPASS` on ssbs2. This indicates that the `ssbs` feature was not detected as available, but executing the assembly code for it did not crash. The assembly is (https://github.com/llvm/llvm-test-suite/blob/e77ce1bfc8dbe38b9bf5a3098cb954e58d125020/SingleSource/UnitTests/AArch64/acle-fmv-features.c#L189-L195): ```c asm volatile ( "mrs x0, SSBS" "\n" "msr SSBS, x0" "\n" : : : "x0" ); ``` I'm not sure whether this indicates some kind of kernel bug where a trap is incorrectly suppressed, or an incorrect assumption in the test that this necessarily has to trap (rather than, say, being silently ignored).
efriedma-quic commented 1 month ago

From the manual: "This register is present only when FEAT_SSBS is implemented. Otherwise, direct accesses to SSBS are UNDEFINED." So there's probably something wrong with the feature detection (maybe a bug in the code that parses cpuinfo).

labrinea commented 1 month ago

The feature detection uses hwcaps (see https://github.com/llvm/llvm-project/blob/9e7315912656628b606e884e39cdeb261b476f16/compiler-rt/lib/builtins/cpu_model/aarch64/fmv/mrs.inc#L48), which supposedly checks that ID_AA64PFR1_EL1.SSBS == 0b0010. It could be that the hardware it run on had ID_AA64PFR1_EL1.SSBS == 0b0001, where the register is still available.

labrinea commented 1 month ago

It's perhaps my bad because of this patch https://github.com/llvm/llvm-project/pull/95149 (FYI @Wilco1)

labrinea commented 1 month ago

Actually even if we detected ssbs the test would still fail I think? I am not sure. Perhaps we need a different test for ssbs2 and use the existing one for ssbs.

andrewcarlotti commented 1 month ago

It could be that the hardware it run on had ID_AA64PFR1_EL1.SSBS == 0b0001, where the register is still available.

Accesses to the register with the MSR/MRS instructions are not possible without FEAT_SSBS2. The latest Arm ARM still says FEAT_SSBS, but the correction appears in the published list of known issues.

I suspect that the test failures might result from the kernel masking out support for the feature bit, without trapping the instructions. In particular, I notice that the feature is masked out when ARM64_WORKAROUND_SPECULATIVE_SSBS is set (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c#n2314). This workaround affects Cortex-X4 and Neoverse-V3 cores.

nikic commented 1 month ago

I suspect that the test failures might result from the kernel masking out support for the feature bit, without trapping the instructions. In particular, I notice that the feature is masked out when ARM64_WORKAROUND_SPECULATIVE_SSBS is set (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c#n2314). This workaround affects Cortex-X4 and Neoverse-V3 cores.

In the log I see "CPU features: detected: SSBS not fully self-synchronizing", so it looks like ARM64_WORKAROUND_SPECULATIVE_SSBS is indeed active here.

labrinea commented 1 month ago

Could the value of the PSTATE.SSBS bit read by mrs be of any use here?

Wilco1 commented 1 month ago

Could the value of the PSTATE.SSBS bit read by mrs be of any use here?

I don't think that makes a difference. You could try ID_AA64PFR1_EL1.SSBS but the kernel might have set it back to 1 as well.

Either way, this proves you can't write these negative tests since the kernel can disable features. As this example shows, it doesn't mean underlying instructions will always trap - it just means the feature is turned off by the kernel. As long as we correctly detect that, the test should pass.

nikic commented 1 month ago

So is the conclusion here that we should just not check the UPASS case for the ssbs2 feature?

labrinea commented 1 month ago

Yes, something like that. I see value in keeping the negative tests for other features so perhaps create an exception for SSBS?