llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.69k stars 11.87k forks source link

[llvm] add libcall optimizations for C23 stdbit.h routines #79630

Open nickdesaulniers opened 9 months ago

nickdesaulniers commented 9 months ago

C23 added a bunch of type generic bitwise functions like:

Refer to section 7.18 of the latest C23 draft (n3096 is what I'm looking at).

All of the above are type generic, so I expect libc implementations to use _Generic to dispatch to the proper variant. There are type specific variants of all of the above with suffixes uc, us, ui, ul, ull for unsigned char, unsigned short, unsigned int, unsigned long, and unsigned long long respectively.

We should recognize calls to all of the above (14 type-generic functions cross 5 specific types == 70 functions) and ensure that calling them with a constant value is recognized (in hosted mode) and folded properly, thereby eliminating the library dispatch.

llvm/lib/Transforms/Utils/SimplifyLibCalls.cpp should recognize these calls in IR and ensure we can elide the libcalls.

cc @efriedma-quic @davidbolvansky

efriedma-quic commented 9 months ago

I expect we'll want to convert all of these to the equivalent LLVM intrinsics? (Not sure if we want to do that conversion in clang CGBuiltin, or in SimplifyLibCalls.)

pinskia commented 9 months ago

It might be decent to follow GCC's lead instead of creating new ways of doing it. GCC 14 added builtins to support these and decided NOT do the explosion of the types but rather have type generic builtins. See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

GCC added the following:

__builtin_stdc_bit_ceil 
__builtin_stdc_bit_floor 
__builtin_stdc_bit_width 
__builtin_stdc_count_ones 
__builtin_stdc_count_zeros 
__builtin_stdc_first_leading_one 
__builtin_stdc_first_leading_zero 
__builtin_stdc_first_trailing_one 
__builtin_stdc_first_trailing_zero 
__builtin_stdc_has_single_bit 
__builtin_stdc_leading_ones 
__builtin_stdc_leading_zeros 
__builtin_stdc_trailing_ones
__builtin_stdc_trailing_zeros 

to support those and also added a few extra ones to suport the original set builtins better (see description on them because some take an optional 2nd argument for the original undefined case of 0):

__builtin_ffsg
__builtin_clzg 
__builtin_ctzg 
__builtin_clrsbg 
__builtin_popcountg 
__builtin_parityg 
pinskia commented 9 months ago

I should also add GCC's builtins work with _BitInt types too and not just the integer types.

efriedma-quic commented 9 months ago

That... doesn't really address the issue here? I mean, sure, we should probably implement the gcc builtins; they allow a simpler implementation in the library headers, which means better diagnostics. But in the end, people can still write code using the explicit suffixes, and we still need to support optimizing such code.

nickdesaulniers commented 9 months ago

I guess I should amend the feature request to say:

"and if clang implements similar builtins to GCC and thus LLVM intrinsic functions, these same libcall optimizations should apply there as well, too."

thanks for the additional insight @pinskia !

I expect we'll want to convert all of these to the equivalent LLVM intrinsics?

Why? We don't do that for most libcalls that SimplifyLibCalls can transform, right? (I don't have an argument for why not, just genuinely curious about the suggestion).

pinskia commented 9 months ago

I should note GCC didn't add any extra libcalls for these functions but turned them into calling the others already. e.g.:

_BitInt(256) f(unsigned _BitInt(256) a)
{
  return __builtin_popcountg(a);
}

is lowered into calling __builtin_popcountl 4 times (and then lowered again to calling __popcountdi2 if the target does not support popcount instruction). Note GCC also some internal functions (not exposed to user) for popcount, ctz, clz which map more directly to the what the target supports and gets expanded when converting from the high level IR to the lowlevel by the backends (via what GCC calls optabs).

efriedma-quic commented 9 months ago

I expect we'll want to convert all of these to the equivalent LLVM intrinsics?

Why? We don't do that for most libcalls that SimplifyLibCalls can transform, right? (I don't have an argument for why not, just genuinely curious about the suggestion).

For calls which don't have a corresponding intrinsic, we don't convert them; we don't introduce intrinsics for everything that has a C function. Implementing intrinsics is a non-trivial amount of code, and converting llvm intrinsics to libc calls is complicated in ways we don't want to deal with if we can avoid it.

In this case, we already have llvm.popcount/llvm.ctlz/llvm.cttz, which optimizations know how to handle, and the backend knows how to lower for every target, so it makes sense to use them.

nickdesaulniers commented 8 months ago

Ah, so clang could lower these builtins to the existing llvm ir intrinsics for ctz/clz/popcount.