Ternary define/macro with a dead branch is always evaluated

llvmbot commented 6 years ago


Bugzilla Link	39087
Version	unspecified
OS	Linux
Reporter	LLVM Bugzilla Contributor
CC	@topperc,@erichkeane,@grimler91,@sjoerdmeijer,@TNorthover

Extended Description

The following sample program refuses to compile; yet works fine on other compilers.

#include <arm_neon.h>

#define vroti_epi32(x, i)                                         \
        (i < 0 ? vsliq_n_u32(vshrq_n_u32(x, 32 - i), x, i)        \
               : vsriq_n_u32(vshlq_n_u32(x, 32 + i), x, -i))

int main()
{
    uint32x4_t x = vdupq_n_u32(42);
    uint32x4_t value = vroti_epi32(x, 12);

    return 0;
}

When compiling, the following output is shown:

example.c:10:21: error: argument should be a value from 0 to 31
        uint32x4_t value = vroti_epi32(x, 12);
                           ^
example.c:5:30: note: expanded from macro 'vroti_epi32'
               : vsriq_n_u32(vshlq_n_u32(x, 32 + i), x, -i))
                             ^
/usr/lib/llvm-3.6/bin/../lib/clang/3.6.0/include/arm_neon.h:23019:24: note: expanded from macro 'vshlq_n_u32'
  __ret = (uint32x4_t) __builtin_neon_vshlq_n_v((int8x16_t)__s0, __p1, 50); \
                       ^
/usr/lib/llvm-3.6/bin/../lib/clang/3.6.0/include/arm_neon.h:24704:21: note: expanded from macro 'vsriq_n_u32'
  uint32x4_t __s0 = __p0; \
                    ^
example.c:10:21: error: argument should be a value from 1 to 32
        uint32x4_t value = vroti_epi32(x, 12);
                           ^
example.c:5:18: note: expanded from macro 'vroti_epi32'
               : vsriq_n_u32(vshlq_n_u32(x, 32 + i), x, -i))
                 ^
/usr/lib/llvm-3.6/bin/../lib/clang/3.6.0/include/arm_neon.h:24707:24: note: expanded from macro 'vsriq_n_u32'
  __ret = (uint32x4_t) __builtin_neon_vsriq_n_v((int8x16_t)__s0, (int8x16_t)__s1, __p2, 50); \
                       ^
2 errors generated.

I expected that the program compile and execute, successfully.

fe4e41ac-b776-43cb-af7e-f7db67d81d28 commented 4 years ago

I gave the false to the RangeIsError argument to SemaBuiltinConstantArgRange a shot by re-compiling clang with this patch:

--- ../SemaChecking.cpp.orig    2020-04-12 21:35:25.294095128 +0200
+++ ./tools/clang/lib/Sema/SemaChecking.cpp     2020-04-12 21:36:29.890658476 +0200
@@ -1703,7 +1703,7 @@
   #undef GET_NEON_IMMEDIATE_CHECK
   }

-  return SemaBuiltinConstantArgRange(TheCall, i, l, u + l);
+  return SemaBuiltinConstantArgRange(TheCall, i, l, u + l, /*RangeIsError*/ false);
 }

 bool Sema::CheckARMBuiltinExclusiveCall(unsigned BuiltinID, CallExpr *TheCall,

and it seems to have helped. With it I am able to compile the example code by Joesph (after fixing the typo) without any error (or even warnings...).

I am not very familiar with the llvm code though so this might break other things for all I know.

topperc commented 4 years ago

On X86, we pass false to the RangeIsError argument to SemaBuiltinConstantArgRange. This delays the diagnostic being emitted until we know for sure the builtin will realy be emitted. Not sure if that will help this case or not. I think the frontend might still needs some encouragement to constant fold the ternary condition itself rather than delaying to middle end optimizations.

erichkeane commented 4 years ago

The example has a typo (should be "i > 0" in the conditional), so it would actually choose the out of range variant. But idea is that only one of the shifts is actually needed and the code is trying to use the conditional to get rid of the wrong one.

I'll copy/paste my comments from the thread to keep things in one place:

It's an incompatibility, but it's unclear that it's a bug. Clang intentionally range-checks these intrinsics because they map to instructions that only have a valid form with the permitted (positive) shift amounts.

If we relaxed that requirement then -O0 compilation would still be broken but with a significantly worse error message, which I don't think many Clang devs would consider acceptable (both because of the different behaviour depending on -O* and the message) -- I certainly wouldn't. GCC appears to do some kind of basic optimization that eliminates the invalid instructions even at -O0, but that's not how Clang works and is also unlikely to change in the short-medium term.

I have seen someone propose that intrinsics taking immediates should also accept variables and lower to multiple-instruction equivalents if necessary, but I wasn't terribly keen on that either personally. It would add a reciprocal incompatibility with GCC in at least the short term (possiblty forever if they nope out), and I tend to think most people writing these would prefer to know if their call was inefficient anyway.

Someone keen might be able to get a patch through cfe-commits that allows the user to demote these errors to a warning as a compromise, but I'm afraid otherwise the only suggestion is to rewrite code so that invalid intrinsics are never called statically.

I believe Craig Topper ran into something like this with an X86 intrinsic and was able to work around this problem by only making the error in contexts where it would be actually executed (since these require immediates anyway). A properly motivated individual could perhaps ask Craig how he did it/an example of an implementation.

TNorthover commented 4 years ago

The example has a typo (should be "i > 0" in the conditional), so it would actually choose the out of range variant. But idea is that only one of the shifts is actually needed and the code is trying to use the conditional to get rid of the wrong one.

I'll copy/paste my comments from the thread to keep things in one place:

It's an incompatibility, but it's unclear that it's a bug. Clang intentionally range-checks these intrinsics because they map to instructions that only have a valid form with the permitted (positive) shift amounts.

If we relaxed that requirement then -O0 compilation would still be broken but with a significantly worse error message, which I don't think many Clang devs would consider acceptable (both because of the different behaviour depending on -O* and the message) -- I certainly wouldn't. GCC appears to do some kind of basic optimization that eliminates the invalid instructions even at -O0, but that's not how Clang works and is also unlikely to change in the short-medium term.

I have seen someone propose that intrinsics taking immediates should also accept variables and lower to multiple-instruction equivalents if necessary, but I wasn't terribly keen on that either personally. It would add a reciprocal incompatibility with GCC in at least the short term (possiblty forever if they nope out), and I tend to think most people writing these would prefer to know if their call was inefficient anyway.

Someone keen might be able to get a patch through cfe-commits that allows the user to demote these errors to a warning as a compromise, but I'm afraid otherwise the only suggestion is to rewrite code so that invalid intrinsics are never called statically.

sjoerdmeijer commented 4 years ago

I had a real quick look, and noticed e.g. for:

vshlq_n_u32 (uint32x4_t a, const int n)

it is required that: 0 << n << 31

and when with

vshlq_n_u32(x, 32 + i)

and

vroti_epi32(x, 12);

it looks like we get 32 + 12, so the error message looks correct to me, or did I miss something?

llvmbot commented 6 years ago

Is there any additional information needed or required, or that would simply help speed along the joint goal of resolving this compilation problem?

Thanks, -Joe

llvm / llvm-project

Ternary define/macro with a dead branch is always evaluated #38435

Extended Description