ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
33.82k stars 2.47k forks source link

powf not inlining or optimizing in builtin implementation (cf. Rust, clang) #12032

Open wallabra opened 2 years ago

wallabra commented 2 years ago

Zig Version

0.9.0 (Godbolt)

Steps to Reproduce

This behaviour can be verified e.g. by implementing the generic Minkowski distance and compiling the instances with order 1 (equivalent to Manhattan distance) and order 2 (equivalent to Euclidean distance). Make sure to enable optimizations with fast math.

It can be done in Godbolt:

In this case, compare the deassembled outputs.

Expected Behavior

std.math.powf should be inlined and optimized out where applicable (i.e. where the compiler can guarantee safe behaviour, and thus simply use the equivalent of libm's powf).

The first- and second-order instances of the generic Minkowski function should be automatically optimized into specific algorithms by the compiler. This can be most easily verified by checking that the assembly versions of both cases are short and don't contain call, and that only the 2nd-order case contains vsqrtss (since in the first-order case, all powf calls are passed an exponent of 1 and can be safely optimized out).

Actual Behavior

In Zig 0.9.0, the Minkowski function is neither inlined, much less optimized, in the specific cases of orders 1 and 2, as a result of the compiler failing to inline and optimize out std.math.pow.

wallabra commented 2 years ago

Should we expect this kind of optimization to happen in the default Zig linker backend as well?

And, if not, then should we expect it to be implemented there in the future?

andrewrk commented 2 years ago

Pasting the Zig code here for posterity:

extern fn @"llvm.pow.f32"(f32, f32) f32;
extern fn @"llvm.fabs.f32"(f32) f32;

const vec3 = struct {
    x: f32,
    y: f32,
    z: f32,
};

inline fn minkowski(a: vec3, b: vec3, order: f32) f32 {
    @setFloatMode(.Optimized);
    const accum =
        @"llvm.pow.f32"(@"llvm.fabs.f32"(a.x - b.x), order) +
        @"llvm.pow.f32"(@"llvm.fabs.f32"(a.y - b.y), order) +
        @"llvm.pow.f32"(@"llvm.fabs.f32"(a.z - b.z), order);

    return @"llvm.pow.f32"(accum, 1.0 / order);
}

export fn minkowski1(a: *vec3, b: *vec3) f32 {
    return minkowski(a.*, b.*, 1.0);
}

export fn minkowski2(a: *vec3, b: *vec3) f32 {
    return minkowski(a.*, b.*, 2.0);
}

The LLVM intrinsics via extern function is a bug (#2291) which will be fixed, breaking code like this. The proper fix here is to implement #767 which is already an accepted proposal.

The problem here is that the pow/abs functions are not getting the "fast-math" flag set, which would happen if they were exposed as proper math functions. This is a useful issue to leave open however, because we should make sure that this snippet optimizes properly before closing the issue, and make sure that other math intrinsics also have the fast-math flag properly set.

wallabra commented 2 years ago

Presumably Zig's default backend could also find use cases for, and optimize, libm calls...Food for thought, but a bit long term. :)