Proposal: Add `never_intrinsify` to `std.builtin.CallModifier`

alexrp commented 3 weeks ago

This modifier prevents the compiler from turning a function call into an intrinsic/builtin. In other words, when you call a function using this modifier, you're guaranteed to actually get a call to that function in the generated code (note: inlining is still permitted). Concretely, it would map to the nobuiltin LLVM attribute at the call site, and whatever equivalent exists for other backends.

Motivated by #21831 (and I suspect numerous other cases in compiler-rt if I went digging).

FWIW, Clang has this in the form of the no_builtin attribute, so I think it's important that Zig also be able to express this.

rohlem commented 3 weeks ago

Is this phenomenon intrinsically linked to callsites and/or the function boundary? I imagine the compiler might replace any code section with an equivalent builtin. For the use case of implementing compiler_rt, where the functions map 1:1, a call modifier would be a natural fit, but maybe a nobuiltin/nointrinsify block (similar to nosuspend blocks) would be more flexible / useful in general?

What also looks strange to me is that the linked no_builtin attribute is declared for the callee function, while a CallModifier is currently supplied at the caller's @call. EDIT: Ah, so LLVM puts it on the caller but clang uses a callee attribute? That's weird, but maybe normal for LLVM's inconsistencies. It would still be worth picking the more sensible choice for Zig though. For builtins only called via the compiler it might not matter, but it seems strange to me to generate a single function twice - once for never_intrinsify callers and once for callers that allow builtins.

Rexicon226 commented 3 weeks ago

> For the use case of implementing compiler_rt, where the functions map 1:1, a call modifier would be a natural fit, but maybe a nobuiltin/nointrinsify block (similar to nosuspend blocks) would be more flexible / useful in general?

In no particular order, here are my counterpoints:

- Just looking at the technical aspect, how would you even implement something like this?
- never_intrinsify is necessary at specific callsites, so I don't think it makes much sense for it to be inside of the callee.
- never_intrinsify would be quite rarely used, so making it a "language semantic" seems superfluous and unnecessary to me.

rohlem commented 3 weeks ago

> never_intrinsify is necessary at specific callsites, so I don't think it makes much sense for it to be inside of the callee.

As I understand it (and I may be wrong), take our own memcpy implementation as an example: it is a function (callee) we provide in compiler_rt, and we never want a call to it to be translated into a call to a builtin, no matter the caller or callsite.

The fact that the compiler is the one emitting the calls (generating callsites) makes it feasible to always specify this as a callsite attribute, but I don't see the point of ever allowing a call to a compiler_rt function to be replaced by a compiler intrinsic. In that case I'd expect callers to use builtins like @memcpy instead.

> never_intrinsify would be quite rarely used, so making it a "language semantic" seems superfluous and unnecessary to me.

Probably true, afaiu hand-written assembly already isn't affected by these builtin-replacements? It might still be useful for some micro-optimization use cases, but those probably shouldn't be prioritized.

alexrp commented 3 weeks ago

> Is this phenomenon intrinsically linked to callsites and/or the function boundary? I imagine the compiler might replace any code section with an equivalent builtin. For the use case of implementing compiler_rt, where the functions map 1:1, a call modifier would be a natural fit, but maybe a nobuiltin/nointrinsify block (similar to nosuspend blocks) would be more flexible / useful in general?

That may be true, but:

1. I don't currently have a motivating use case for such a broad feature.
2. It's hard to justify the added language complexity for such a niche problem.
3. The feature can always be broadened later if needed.

I think a call modifier strikes a good balance with regards to complexity and usefulness.

> What also looks strange to me is that the linked no_builtin attribute is declared for the callee function, while a CallModifier is currently supplied at the caller's @call.

I don't quite follow here. The Clang attribute is put on a function, and any code or calls within that function won't be transformed to builtins. It's like if you'd manually written every call in the function as @call(.never_intrinsify, ...) in Zig.
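As a rough illustration of the function-scoped Clang attribute described above, here is a minimal C sketch. The macro `NO_BUILTIN` and the function `forward_copy` are hypothetical names invented for this example. The `__has_attribute` guard is an assumption added so that compilers without the attribute still build the code as an ordinary function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: expands to Clang's no_builtin attribute when
 * the compiler reports support for it, and to nothing otherwise. */
#if defined(__has_attribute)
#  if __has_attribute(no_builtin)
#    define NO_BUILTIN __attribute__((no_builtin))
#  endif
#endif
#ifndef NO_BUILTIN
#  define NO_BUILTIN
#endif

/* Hypothetical forwarder: with the attribute in effect, the memcpy call in
 * this body is meant to stay an ordinary call rather than being treated as
 * a builtin. The attribute covers the whole function body, not one call. */
NO_BUILTIN
void *forward_copy(void *dest, const void *src, size_t n) {
    return memcpy(dest, src, n);
}
```

The scope is the difference from the proposal: the attribute here applies to everything inside `forward_copy`, whereas the proposed modifier would be chosen per call.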

alexrp commented 3 weeks ago

Updated the description to note that this modifier would still permit inlining at the compiler's discretion, as with the auto modifier.

dweiller commented 2 weeks ago

> > Is this phenomenon intrinsically linked to callsites and/or the function boundary? I imagine the compiler might replace any code section with an equivalent builtin. For the use case of implementing compiler_rt, where the functions map 1:1, a call modifier would be a natural fit, but maybe a nobuiltin/nointrinsify block (similar to nosuspend blocks) would be more flexible / useful in general?
>
> That may be true, but:
> 1. I don't currently have a motivating use case for such a broad feature.

I don't quite understand how a call modifier for use with @call would help in the linked issue, though I may just be misunderstanding that issue. I assumed the linked issue is something like what I encountered (detailed below), but I don't know what the Zig source for that issue is, so I'm not sure.

One place where being able to mark a function/block with nobuiltin would have been nice for me is in implementing things like memcpy in compiler-rt. At the moment, you need to be careful not to have LLVM codegen recursive calls to memcpy. Basically, you have to make sure LLVM doesn't recognise code paths (not just function calls) as something it thinks it can replace with a call to memcpy. Sometimes utility functions needed to be made noinline to beat LLVM's recognition, and in those cases a nointrinsify call modifier might have been preferable, but I also needed to rewrite some simple copying loops, as they could be turned into memcpy calls in some cases. Without being able to mark a loop (or the function containing it) with a nobuiltin annotation, the code might break whenever LLVM changes how its memcpy detection works, and some things can end up looking more complicated than needed for no apparent reason.
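To make the hazard concrete, here is a minimal C sketch of the kind of simple copying loop meant above. The name `copy_bytes` is hypothetical. Whether a particular LLVM version rewrites this loop into a `memcpy` call depends on the optimization level and flags, so treat that behavior as an assumption, not something verified here.

```c
#include <stddef.h>

/* Hypothetical byte-copy helper. It contains no call at all, only a loop,
 * yet an optimizer that recognizes the copy idiom may replace the loop
 * with a call to memcpy. If this body were itself the implementation of
 * memcpy, that replacement would make the function call itself. */
void *copy_bytes(void *restrict dest, const void *restrict src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i]; /* one byte per iteration, front to back */
    }
    return dest;
}
```

Since the loop contains no call site, a per-call modifier has nothing to attach to here, which is why a function-level or block-level annotation is being asked for in this case.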

alexrp commented 2 weeks ago

It would help because this is the implementation of the problematic function(s):

https://github.com/ziglang/zig/blob/a916bc7fdd3975a9e2ef13c44f814c71ce017193/lib/compiler_rt/arm.zig#L60-L63

The issue is that LLVM is recognizing the call to memcpy by name and turning it into an intrinsic call instead, which then gets turned into a (recursive) call to __aeabi_memcpy in the target backend.

Putting nobuiltin on the call site will prevent LLVM from doing this.

alexrp commented 2 weeks ago

With regards to the compiler-rt problem you're having, I'm afraid you're a victim of this code:

https://github.com/ziglang/zig/blob/a916bc7fdd3975a9e2ef13c44f814c71ce017193/src/codegen/llvm.zig#L3210-L3217

The attribute needed to get the desired effect here is no-builtins, not nobuiltin. I mean, obviously!

ParfenovIgor commented 2 weeks ago

Does no-builtins actually exist? I can't find anything relevant about it in the LLVM repo. https://github.com/llvm/llvm-project/blob/11df0ce1405ec3e3721b43764dc53250aa9e08a1/llvm/include/llvm/IR/Attributes.h#L86 https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/Attributes.td

alexrp commented 2 weeks ago

https://github.com/llvm/llvm-project/blob/ffe04e0351203524b212f850b48edf54dc5dbeb5/llvm/include/llvm/Analysis/TargetLibraryInfo.h#L290-L311

LLVM doesn't have a fixed set of predefined attributes.

Proposal: Add `never_intrinsify` to `std.builtin.CallModifier` #21833