llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.65k stars 11.84k forks source link

[Aarch64] `Wshadow` and vcopy intrinsic #43736

Open llvmbot opened 4 years ago

llvmbot commented 4 years ago
Bugzilla Link 44391
Version trunk
OS Linux
Reporter LLVM Bugzilla Contributor
CC @DougGregor,@efriedma-quic,@kbeyls,@zygoloid

Extended Description

Here the code:

/// vcopy.c 
#include <arm_neon.h>

float32x4_t insert00(float32x4_t v1, float32x4_t v2) {
   return vcopyq_laneq_f32(v1, 0, v2, 0);
}
$ clang --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

$ clang --target=aarch64-linux-gnu vcopy.c -c -I /usr/aarch64-linux-gnu/include/ -E

# 2 "vcopy.c" 2

float32x4_t insert00(float32x4_t v1, float32x4_t v2) {
   return __extension__ ({ float32x4_t __s0_70 = v1; float32x4_t __s2_70 = v2; float32x4_t __ret_70; __ret_70 = __extension__ ({ float32_t __s0 = __extension__ ({ float32x4_t __s0 = __s2_70; float32_t __ret; __ret = (float32_t) __builtin_neon_vgetq_lane_f32((int8x16_t)__s0, 0); __ret; }); float32x4_t __s1 = __s0_70; float32x4_t __ret; __ret = (float32x4_t) __builtin_neon_vsetq_lane_f32(__s0, (int8x16_t)__s1, 0); __ret; }); __ret_70; });

Notice that code:

{ float32_t __s0 = __extension__ ({ float32x4_t __s0 = __s2_70; .....

Should report a shadowing issue, but does fails to do so:

$ clang --target=aarch64-linux-gnu vcopy.c -c -I /usr/aarch64-linux-gnu/include/ -Werror -Wshadow

does not report any issue at all. But this code here (as the result of the clang -E > t1.c and reedited for clarity)

//// t1.c
#include <arm_neon.h>

float32x4_t insert00(float32x4_t v1, float32x4_t v2) {
   return __extension__ ({ float32x4_t __s0_70 = v1; float32x4_t __s2_70 = v2; float32x4_t __ret_70; __ret_70 = __extension__ ({ float32_t __s0 = __extension__ ({ float32x4_t __s0 = __s2_70; float32_t __ret; __ret = (float32_t) __builtin_neon_vgetq_lane_f32((int8x16_t)__s0, 0); __ret; }); float32x4_t __s1 = __s0_70; float32x4_t __ret; __ret = (float32x4_t) __builtin_neon_vsetq_lane_f32(__s0, (int8x16_t)__s1, 0); __ret; }); __ret_70; });
}
$ clang --target=aarch64-linux-gnu t1.c -c -I /usr/aarch64-linux-gnu/include/ -Werror -Wshadow 
t1.c:4:176: error: declaration shadows a local variable [-Werror,-Wshadow]
  ...({ float32_t __s0 = __extension__ ({ float32x4_t __s0 = __s2_70; float32...
                                                      ^
t1.c:4:140: note: previous declaration is here
  ...float32x4_t __ret_70; __ret_70 = __extension__ ({ float32_t __s0 = __ext...
                                                                 ^
1 error generated.

[Note: though I use clang 6.0, this can be reproduced with trunk clang]

Now. multiple questions obviously come to mind: 1- By what magic does clang remove the error in the first compilation (the one of vcopy.c), but fails in the second case. Are those shadow errors inconsistent or just deliberately trapped for some pattern and silently ignored?

2- Why so many neon intrinsics not functions? Does the compiler fails at inlining them?

The reason for filing this bug is that I believe ccache (heavily used in the context of building chromium) reuses the preprocessed C/C++ code and reinjects them into the compilation process. And it fails when vcopy is used in the code.

There is an obvious workaround though (but only for clang), so make it a low priority.

efriedma-quic commented 4 years ago

Actually, it would make a lot of sense for C++ to have this option, but I don't believe it exists. Would be great to have it anyway.

This already exists; it's called a non-type template parameter.

gcc probably isn't a great example to follow. It doesn't actually check the parameters to NEON builtins in the frontend at all. It delays emitting the diagnostic messages until somewhere in the backend, which has other nasty side-effects. (In particular, whether an error is emitted depends on the optimization level, at least in some cases.)

A much simpler way to fix this shadowing of parameters issue would be to prefix all temporary variables with the name of the macro, like

That doesn't fix the issue; it only makes it harder to hit. Consider, for example, a call to vcopyq_laneq_f32 where one of the arguments is itself a call to vcopyq_laneq_f32. But sure, that would be relatively simple to implement.

llvmbot commented 4 years ago

A much simpler way to fix this shadowing of parameters issue would be to prefix all temporary variables with the name of the macro, like

__extension__ ({ float32_t __s0 = __extension__ ({ float32x4_t __s0 =

would become

__extension__ ({ float32_t __vcopy_lane_s0 = __extension__ ({ float32x4_t __vset_lane_s0 =

which I believe would be a much smaller change since I'm quite sure neon_arm.h is generated and would not require too much code change. But you are the experts

llvmbot commented 4 years ago

There isn't really any reason it has to be a macro specifically, though; we could teach the compiler to give special treatment to calls to vcopyq_laneq_f32, instead of using a macro to expand it out to a call to __builtin_neon_vsetq_lane_f32 and giving that call special treatment.

I was about to suggest that as well. If we teach the compiler to bypass warnings and errors, why not teach it to handle this case (and make sure the lane number is a constant). BTW, looks like gcc is able to do this:

__extension__ extern __inline float32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vcopyq_lane_f32 (float32x4_t __a, const int __lane1,
                 float32x2_t __b, const int __lane2)
{
  return __aarch64_vset_lane_any (__aarch64_vget_lane_any (__b, __lane2),
                                   __a, __lane1);
}

I understand this would make it a bigger change than anticipated, but wonder if something like a constexpr parameter extension would be a more "global" solution to this. Actually, it would make a lot of sense for C++ to have this option, but I don't believe it exists. Would be great to have it anyway. A bit like __builtin_constant_p but instead of just evaluating if the argument is a constant, make it a constraint on the parameter.

efriedma-quic commented 4 years ago

Some googling shows recent versions of ccache do this.

Ignore this; the situation appears to be more complicated. Not sure what the current state is.

efriedma-quic commented 4 years ago

clang has code to detect code that comes from system headers, and suppresses warnings. This can include macros... but of course we lose the original source location for macros after generating preprocessed source with -E. If you control your own workflow, you can probably avoid this using -frewrite-includes. Some googling shows recent versions of ccache do this.

The reason arm_neon.h has a bunch of macros has to do with restrictions the operands to certain instructions: certain operands are required to be integer constants in a specific range. We can't properly enforce that in a normal function; it has to be a direct call to a builtin with appropriate handling.

There isn't really any reason it has to be a macro specifically, though; we could teach the compiler to give special treatment to calls to vcopyq_laneq_f32, instead of using a macro to expand it out to a call to __builtin_neon_vsetq_lane_f32 and giving that call special treatment.