ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

aarch64 bare metal uses FP&SIMD registers when neon and fp are disabled #21473

Open gaosui opened 5 hours ago

gaosui commented 5 hours ago

Zig Version

0.13.0

Steps to Reproduce and Observed Behavior

Code generated for aarch64-freestanding-none uses the FP register Q0 when neon, fullfp16 and fp_armv8 are disabled.

Build command:

 zig build-exe -target aarch64-freestanding-none -mcpu cortex_a76-neon-fp_armv8-fullfp16 -fentry=main -femit-llvm-ir example.zig
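For anyone reproducing this through the build system, the same target can presumably be expressed in a `build.zig`. The sketch below is not part of the original report and assumes the Zig 0.13 `std.Build` API. The artifact names (`example`, `example.ll`) are illustrative.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const aarch64 = std.Target.aarch64;

    // Equivalent of: -target aarch64-freestanding-none
    //                -mcpu cortex_a76-neon-fp_armv8-fullfp16
    const target = b.resolveTargetQuery(.{
        .cpu_arch = .aarch64,
        .os_tag = .freestanding,
        .abi = .none,
        .cpu_model = .{ .explicit = &aarch64.cpu.cortex_a76 },
        .cpu_features_sub = aarch64.featureSet(&[_]aarch64.Feature{ .neon, .fp_armv8, .fullfp16 }),
    });

    const exe = b.addExecutable(.{
        .name = "example",
        .root_source_file = b.path("example.zig"),
        .target = target,
        .optimize = b.standardOptimizeOption(.{}),
    });

    // Equivalent of: -fentry=main
    exe.entry = .{ .symbol_name = "main" };
    b.installArtifact(exe);

    // Equivalent of: -femit-llvm-ir (installed as zig-out/example.ll)
    const install_ir = b.addInstallFile(exe.getEmittedLlvmIr(), "example.ll");
    b.getInstallStep().dependOn(&install_ir.step);
}
```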

Example code:

const Object = struct {
    // At least two 64-bit fields are required to trigger this.
    a: u64,
    b: u64,

    // First function lets the compiler decide whether to pass self by copy or by pointer.
    fn func1(self: Object) void {
        // Just to make use of all fields on self.
        const sum = self.a + self.b;

        // !!!!! Assembly before calling func2 uses fp register q0.
        _ = self.func2(sum);
    }

    // Second function specifies passing self by pointer.
    fn func2(self: *const Object, val: u64) u64 {
        // Make use of self and the input val.
        const result = val + self.a + self.b;
        return result;
    }
};

// Export so the linker discovers it as the bare-metal entry point.
export fn main() void {
    const obj = Object{
        .a = 1,
        .b = 2,
    };

    obj.func1();
}

Assembly before the call to func2 uses q0:

ldr    q0, [x8]
sub    x0, x29, #0x10
stur   q0, [x29, #-16]
bl     10103c0 <example.Object.func2>

The use of the FP register might originate from the LLVM memcpy intrinsic in the emitted IR:

call void @llvm.memcpy.p0.p0.i64(ptr align 8 %1, ptr align 8 %0, i64 16, i1 false), !dbg !280
%10 = call fastcc i64 @example.Object.func2(ptr nonnull readonly align 8 %1, i64 %9), !dbg !281
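If the memcpy comes from materializing a by-value copy of `self` before taking its address for `func2`, one possible workaround is to take `self` by const pointer in `func1` as well, so the existing pointer can be forwarded without a copy. This is a sketch under that assumption. It has not been checked against the reported target, so whether it actually removes the `q0` access is unverified. `func1` returns the result here only so the variant can be tested.

```zig
const std = @import("std");

// Hypothetical variant of the reproducer: func1 takes `self` by const
// pointer, so it can pass the same pointer straight to func2 instead of
// first building a local 16-byte copy of the struct.
const Object = struct {
    a: u64,
    b: u64,

    fn func1(self: *const Object) u64 {
        const sum = self.a + self.b;
        // `self` is already a *const Object; no local copy is needed.
        return self.func2(sum);
    }

    fn func2(self: *const Object, val: u64) u64 {
        return val + self.a + self.b;
    }
};
```

Calling `obj.func1()` on a `const obj = Object{ .a = 1, .b = 2 }` still works unchanged, because Zig takes the address of `obj` automatically for the pointer receiver.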

Expected Behavior

I expected the Arm FP/SIMD registers (Qn, Dn, Sn, Hn, Bn) to be avoided when the FP and SIMD features are excluded from the target CPU.

However, I cannot claim that the availability of the FP registers is governed by these features. For example, in #16957 a NEON instruction is emitted regardless of the feature specification. In my case only the FP registers are used (by loads and stores), not FP arithmetic instructions. It could be the intended behavior after all that excluding these features doesn't forbid using the FP registers in non-arithmetic instructions such as loads and stores.
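As a related sketch (an assumption, not something proposed in this thread), bare-metal code that needs to know at compile time whether the target was configured with FP/SIMD can query the CPU feature set. The helper `enablesFpOrSimd` is hypothetical; `featureSet` and `featureSetHasAny` are assumed to be the `std.Target.aarch64` helpers in the Zig 0.13 standard library. As this report shows, a negative result only reflects the feature flags and does not by itself guarantee that the compiler never touches the FP/SIMD registers.

```zig
const std = @import("std");
const aarch64 = std.Target.aarch64;

/// Hypothetical helper: reports whether an aarch64 feature set enables
/// scalar FP (fp_armv8), Advanced SIMD (neon), or FP16 arithmetic (fullfp16),
/// i.e. the three features removed by `-mcpu cortex_a76-neon-fp_armv8-fullfp16`.
fn enablesFpOrSimd(features: std.Target.Cpu.Feature.Set) bool {
    return aarch64.featureSetHasAny(features, .{ .fp_armv8, .neon, .fullfp16 });
}
```

In target code the argument would typically be `@import("builtin").cpu.features`, which is comptime-known, so the check can select code paths or trigger a `@compileError`.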

alexrp commented 3 hours ago

Just to be clear: Is this causing an actual problem?

The way I'm interpreting the Arm v8-A manual, the FP/SIMD registers are architecturally required even if FEAT_FP/FEAT_AdvSIMD aren't available.