Open llvmbot opened 4 years ago
Actually, it would make a lot of sense for C++ to have this option, but I don't believe it exists. Would be great to have it anyway.
This already exists; it's called a non-type template parameter.
gcc probably isn't a great example to follow. It doesn't actually check the parameters to NEON builtins in the frontend at all. It delays emitting the diagnostic messages until somewhere in the backend, which has other nasty side-effects. (In particular, whether an error is emitted depends on the optimization level, at least in some cases.)
A much simpler way to fix this shadowing of parameters issue would be to prefix all temporary variables with the name of the macro, like
That doesn't fix the issue; it only makes it harder to hit. Consider, for example, a call to vcopyq_laneq_f32
where one of the arguments is itself a call to vcopyq_laneq_f32
. But sure, that would be relatively simple to implement.
A much simpler way to fix this shadowing of parameters issue would be to prefix all temporary variables with the name of the macro, like
__extension__ ({ float32_t __s0 = __extension__ ({ float32x4_t __s0 =
would become
__extension__ ({ float32_t __vcopy_lane_s0 = __extension__ ({ float32x4_t __vset_lane_s0 =
which I believe would be a much smaller change since I'm quite sure neon_arm.h is generated and would not require too much code change. But you are the experts
There isn't really any reason it has to be a macro specifically, though; we could teach the compiler to give special treatment to calls to vcopyq_laneq_f32, instead of using a macro to expand it out to a call to __builtin_neon_vsetq_lane_f32 and giving that call special treatment.
I was about to suggest that as well. If we teach the compiler to bypass warnings and errors, why not teach it to handle this case (and make sure the lane number is a constant). BTW, looks like gcc is able to do this:
__extension__ extern __inline float32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vcopyq_lane_f32 (float32x4_t __a, const int __lane1,
float32x2_t __b, const int __lane2)
{
return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
__a, __lane1);
}
I understand this would make it a bigger change than anticipated, but wonder if something like a constexpr
parameter extension would be a more "global" solution to this. Actually, it would make a lot of sense for C++ to have this option, but I don't believe it exists. Would be great to have it anyway.
A bit like __builtin_constant_p
but instead of just evaluating if the argument is a constant, make it a constraint on the parameter.
Some googling shows recent versions of ccache do this.
Ignore this; the situation appears to be more complicated. Not sure what the current state is.
clang has code to detect code that comes from system headers, and suppresses warnings. This can include macros... but of course we lose the original source location for macros after generating preprocessed source with -E
. If you control your own workflow, you can probably avoid this using -frewrite-includes
. Some googling shows recent versions of ccache
do this.
The reason arm_neon.h has a bunch of macros has to do with restrictions the operands to certain instructions: certain operands are required to be integer constants in a specific range. We can't properly enforce that in a normal function; it has to be a direct call to a builtin with appropriate handling.
There isn't really any reason it has to be a macro specifically, though; we could teach the compiler to give special treatment to calls to vcopyq_laneq_f32
, instead of using a macro to expand it out to a call to __builtin_neon_vsetq_lane_f32
and giving that call special treatment.
Extended Description
Here the code:
Notice that code:
Should report a shadowing issue, but does fails to do so:
does not report any issue at all. But this code here (as the result of the
clang -E > t1.c
and reedited for clarity)[Note: though I use clang 6.0, this can be reproduced with trunk clang]
Now. multiple questions obviously come to mind: 1- By what magic does clang remove the error in the first compilation (the one of vcopy.c), but fails in the second case. Are those shadow errors inconsistent or just deliberately trapped for some pattern and silently ignored?
2- Why so many neon intrinsics not functions? Does the compiler fails at inlining them?
The reason for filing this bug is that I believe
ccache
(heavily used in the context of building chromium) reuses the preprocessed C/C++ code and reinjects them into the compilation process. And it fails whenvcopy
is used in the code.There is an obvious workaround though (but only for clang), so make it a low priority.