Open wallabra opened 2 years ago
Should we expect this kind of optimization to happen in the default Zig linker backend as well?
And, if not, then should we expect it to be implemented there in the future?
Pasting the Zig code here for posterity:
extern fn @"llvm.pow.f32"(f32, f32) f32;
extern fn @"llvm.fabs.f32"(f32) f32;
const vec3 = struct {
x: f32,
y: f32,
z: f32,
};
inline fn minkowski(a: vec3, b: vec3, order: f32) f32 {
@setFloatMode(.Optimized);
const accum =
@"llvm.pow.f32"(@"llvm.fabs.f32"(a.x - b.x), order) +
@"llvm.pow.f32"(@"llvm.fabs.f32"(a.y - b.y), order) +
@"llvm.pow.f32"(@"llvm.fabs.f32"(a.z - b.z), order);
return @"llvm.pow.f32"(accum, 1.0 / order);
}
export fn minkowski1(a: *vec3, b: *vec3) f32 {
return minkowski(a.*, b.*, 1.0);
}
export fn minkowski2(a: *vec3, b: *vec3) f32 {
return minkowski(a.*, b.*, 2.0);
}
The LLVM intrinsics via extern function is a bug (#2291) which will be fixed, breaking code like this. The proper fix here is to implement #767 which is already an accepted proposal.
The problem here is that the pow/abs functions are not getting the "fast-math" flag set, which would happen if they were exposed as proper math functions. This is a useful issue to leave open however, because we should make sure that this snippet optimizes properly before closing the issue, and make sure that other math intrinsics also have the fast-math flag properly set.
Presumably Zig's default backend could also find use cases for, and optimize, libm
calls...Food for thought, but a bit long term. :)
Zig Version
0.9.0 (Godbolt)
Steps to Reproduce
This behaviour can be verified e.g. by implementing the generic Minkowski distance and compiling the instances with order 1 (equivalent to Manhattan distance) and order 2 (equivalent to Euclidean distance). Make sure to enable optimizations with fast math.
It can be done in Godbolt:
-O ReleaseFast
)-O3 -ffast-math
)--release
arg)In this case, compare the deassembled outputs.
Expected Behavior
std.math.powf
should be inlined and optimized out where applicable (i.e. where the compiler can guarantee safe behaviour, and thus simply use the equivalent oflibm
'spowf
).The first- and second-order instances of the generic Minkowski function should be automatically optimized into specific algorithms by the compiler. This can be most easily verified by checking that the assembly versions of both cases are short and don't contain
call
, and that only the 2nd-order case containsvsqrtss
(since in the first-order case, allpowf
calls are passed an exponent of 1 and can be safely optimized out).Actual Behavior
In Zig 0.9.0, the Minkowski function is neither inlined, much less optimized, in the specific cases of orders 1 and 2, as a result of the compiler failing to inline and optimize out
std.math.pow
.