Open nickdesaulniers opened 9 months ago
I expect we'll want to convert all of these to the equivalent LLVM intrinsics? (Not sure if we want to do that conversion in clang CGBuiltin, or in SimplifyLibCalls.)
It might be decent to follow GCC's lead instead of creating new ways of doing it. GCC 14 added builtins to support these and decided NOT do the explosion of the types but rather have type generic builtins. See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
GCC added the following:
__builtin_stdc_bit_ceil
__builtin_stdc_bit_floor
__builtin_stdc_bit_width
__builtin_stdc_count_ones
__builtin_stdc_count_zeros
__builtin_stdc_first_leading_one
__builtin_stdc_first_leading_zero
__builtin_stdc_first_trailing_one
__builtin_stdc_first_trailing_zero
__builtin_stdc_has_single_bit
__builtin_stdc_leading_ones
__builtin_stdc_leading_zeros
__builtin_stdc_trailing_ones
__builtin_stdc_trailing_zeros
to support those and also added a few extra ones to suport the original set builtins better (see description on them because some take an optional 2nd argument for the original undefined case of 0):
__builtin_ffsg
__builtin_clzg
__builtin_ctzg
__builtin_clrsbg
__builtin_popcountg
__builtin_parityg
I should also add GCC's builtins work with _BitInt types too and not just the integer types.
That... doesn't really address the issue here? I mean, sure, we should probably implement the gcc builtins; they allow a simpler implementation in the library headers, which means better diagnostics. But in the end, people can still write code using the explicit suffixes, and we still need to support optimizing such code.
I guess I should amend the feature request to say:
"and if clang implements similar builtins to GCC and thus LLVM intrinsic functions, these same libcall optimizations should apply there as well, too."
thanks for the additional insight @pinskia !
I expect we'll want to convert all of these to the equivalent LLVM intrinsics?
Why? We don't do that for most libcalls that SimplifyLibCalls can transform, right? (I don't have an argument for why not, just genuinely curious about the suggestion).
I should note GCC didn't add any extra libcalls for these functions but turned them into calling the others already. e.g.:
_BitInt(256) f(unsigned _BitInt(256) a)
{
return __builtin_popcountg(a);
}
is lowered into calling __builtin_popcountl
4 times (and then lowered again to calling __popcountdi2 if the target does not support popcount instruction).
Note GCC also some internal functions (not exposed to user) for popcount, ctz, clz which map more directly to the what the target supports and gets expanded when converting from the high level IR to the lowlevel by the backends (via what GCC calls optabs).
I expect we'll want to convert all of these to the equivalent LLVM intrinsics?
Why? We don't do that for most libcalls that SimplifyLibCalls can transform, right? (I don't have an argument for why not, just genuinely curious about the suggestion).
For calls which don't have a corresponding intrinsic, we don't convert them; we don't introduce intrinsics for everything that has a C function. Implementing intrinsics is a non-trivial amount of code, and converting llvm intrinsics to libc calls is complicated in ways we don't want to deal with if we can avoid it.
In this case, we already have llvm.popcount/llvm.ctlz/llvm.cttz, which optimizations know how to handle, and the backend knows how to lower for every target, so it makes sense to use them.
Ah, so clang could lower these builtins to the existing llvm ir intrinsics for ctz/clz/popcount.
C23 added a bunch of type generic bitwise functions like:
stdc_leading_zeros
stdc_leading_ones
stdc_trailing_zeros
stdc_trailing_ones
stdc_first_leading_zero
stdc_first_leading_one
(smells likeffs
but using proper signedness)stdc_first_trailing_zero
stdc_first_trailing_one
stdc_count_zeros
stdc_count_ones
stdc_has_single_bit
stdc_bit_width
stdc_bit_floor
stdc_bit_ceil
Refer to section 7.18 of the latest C23 draft (n3096 is what I'm looking at).
All of the above are type generic, so I expect libc implementations to use
_Generic
to dispatch to the proper variant. There are type specific variants of all of the above with suffixesuc
,us
,ui
,ul
,ull
forunsigned char
,unsigned short
,unsigned int
,unsigned long
, andunsigned long long
respectively.We should recognize calls to all of the above (14 type-generic functions cross 5 specific types == 70 functions) and ensure that calling them with a constant value is recognized (in hosted mode) and folded properly, thereby eliminating the library dispatch.
llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp should recognize these calls in IR and ensure we can elide the libcalls.
cc @efriedma-quic @davidbolvansky