Open llvmbot opened 6 years ago
I gave passing false for the RangeIsError argument of SemaBuiltinConstantArgRange
a shot by recompiling clang with this patch:
--- ../SemaChecking.cpp.orig 2020-04-12 21:35:25.294095128 +0200
+++ ./tools/clang/lib/Sema/SemaChecking.cpp 2020-04-12 21:36:29.890658476 +0200
@@ -1703,7 +1703,7 @@
#undef GET_NEON_IMMEDIATE_CHECK
}
- return SemaBuiltinConstantArgRange(TheCall, i, l, u + l);
+ return SemaBuiltinConstantArgRange(TheCall, i, l, u + l, /*RangeIsError*/ false);
}
bool Sema::CheckARMBuiltinExclusiveCall(unsigned BuiltinID, CallExpr *TheCall,
and it seems to have helped. With it I am able to compile the example code by Joseph (after fixing the typo) without any errors (or even warnings...).
I am not very familiar with the llvm code though so this might break other things for all I know.
On X86, we pass false to the RangeIsError argument to SemaBuiltinConstantArgRange. This delays the diagnostic being emitted until we know for sure the builtin will really be emitted. Not sure if that will help this case or not. I think the frontend might still need some encouragement to constant fold the ternary condition itself rather than delaying to middle end optimizations.
The example has a typo (should be "i > 0" in the conditional), so it would actually choose the out of range variant. But the idea is that only one of the shifts is actually needed and the code is trying to use the conditional to get rid of the wrong one.
I'll copy/paste my comments from the thread to keep things in one place:
It's an incompatibility, but it's unclear that it's a bug. Clang intentionally range-checks these intrinsics because they map to instructions that only have a valid form with the permitted (positive) shift amounts.
If we relaxed that requirement then -O0 compilation would still be broken but with a significantly worse error message, which I don't think many Clang devs would consider acceptable (both because of the different behaviour depending on -O* and the message) -- I certainly wouldn't. GCC appears to do some kind of basic optimization that eliminates the invalid instructions even at -O0, but that's not how Clang works and is also unlikely to change in the short-medium term.
I have seen someone propose that intrinsics taking immediates should also accept variables and lower to multiple-instruction equivalents if necessary, but I wasn't terribly keen on that either personally. It would add a reciprocal incompatibility with GCC in at least the short term (possibly forever if they nope out), and I tend to think most people writing these would prefer to know if their call was inefficient anyway.
Someone keen might be able to get a patch through cfe-commits that allows the user to demote these errors to a warning as a compromise, but I'm afraid otherwise the only suggestion is to rewrite code so that invalid intrinsics are never called statically.
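To illustrate the "never called statically" suggestion, here is a rough sketch in plain C. It is a scalar model, not the reporter's code: IMM_CHECK, SHL_N, SHR_N and ROTI are hypothetical names, uint32_t stands in for uint32x4_t, and the negative-array-size trick only loosely imitates a frontend range check that looks at both arms of a conditional. The modelled ranges (0..31 for the left shift, 1..31 for the right shift) are assumptions chosen to keep the scalar shifts well defined. The idea is to clamp each immediate so that the arm which is never taken still sees an in-range constant.

```c
#include <stdint.h>

/* Hypothetical compile-time range check: an out-of-range constant n gives a
   negative array size, which is a hard error even in a dead ternary arm. */
#define IMM_CHECK(n, lo, hi) \
    (0 * (int)sizeof(char[((n) >= (lo) && (n) <= (hi)) ? 1 : -1]))

/* Scalar stand-ins for immediate-only shifts (assumed ranges, see above). */
#define SHL_N(x, n) ((uint32_t)((uint32_t)(x) << ((n) + IMM_CHECK(n, 0, 31))))
#define SHR_N(x, n) ((uint32_t)((uint32_t)(x) >> ((n) + IMM_CHECK(n, 1, 31))))

/* Rotate left by i when i > 0, rotate right by -i when i < 0.
   Each immediate is clamped to 1 in the arm that is not taken, so both arms
   pass the range check whatever the sign of the constant i is. */
#define ROTI(x, i)                                   \
    ((i) > 0                                         \
         ? (SHL_N(x, (i) > 0 ? (i) : 1) |            \
            SHR_N(x, (i) > 0 ? 32 - (i) : 1))        \
         : (SHR_N(x, (i) < 0 ? -(i) : 1) |           \
            SHL_N(x, (i) < 0 ? 32 + (i) : 1)))

/* Small wrappers so the macro is exercised with constant immediates. */
static uint32_t rot_left_12(uint32_t x) { return ROTI(x, 12); }
static uint32_t rot_right_8(uint32_t x) { return ROTI(x, -8); }
```

Writing the macro without the inner clamps (for example SHL_N(x, 32 + (i)) with i = 12) trips the check in this model, because the unused arm would see 44. Whether the same clamping is enough for the real NEON intrinsics under Clang is something to verify on an ARM target.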
I believe Craig Topper ran into something like this with an X86 intrinsic and was able to work around this problem by only making the error in contexts where it would be actually executed (since these require immediates anyway). A properly motivated individual could perhaps ask Craig how he did it/an example of an implementation.
I had a real quick look, and noticed e.g. for:
vshlq_n_u32 (uint32x4_t a, const int n)
it is required that: 0 <= n <= 31
and with
vshlq_n_u32(x, 32 + i)
and
vroti_epi32(x, 12);
it looks like we get 32 + 12, so the error message looks correct to me, or did I miss something?
Is there any additional information needed or required, or that would simply help speed along the joint goal of resolving this compilation problem?
Thanks, -Joe
Extended Description
The following sample program refuses to compile, yet works fine on other compilers.
When compiling, the following output is shown:
I expected the program to compile and execute successfully.