llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.65k stars 11.84k forks source link

Integrated arm assembler doesn't understand vuzpq.u16 #20797

Closed nico closed 10 years ago

nico commented 10 years ago
Bugzilla Link 20423
Resolution WONTFIX
Resolved on Aug 08, 2014 17:05
Version trunk
OS All
Blocks llvm/llvm-project#20796
CC @compnerd,@rengolin

Extended Description

This works fine with gcc:

thakis@ubu:~$ arm-linux-gnueabihf-g++ -c test.cc -mfpu=neon thakis@ubu:~$ cat test.cc void foo() { asm volatile ("vuzpq.u16 q0, q1\n\t" : : :); }

But clang says:

thakis$ ~/src/llvm-build/bin/clang -target arm-linux-androideabi -c -mfpu=neon test.cc

:1:2: error: invalid instruction vuzpq.u16 q0, q1 ^ 1 error generated. (this is used in skia)
rengolin commented 10 years ago

Nope. They're both lowered by the same piece of code and the only resulting difference is the register they operate on, which vzip/vuzp already support.

Having them in arm_neon.h is distinct from having them as an assembly alias. If I add it to the assembler, the only thing I'd do is to change vzipq back to vzip and vuzpq to vuzp.

nico commented 10 years ago

And one implementation ending in 8hi and the other in 4hi doesn't make a difference?

rengolin commented 10 years ago

Well, intrinsics are not always directly tied to instructions, and that's the point of having them in the first place, so we can abstract things. Both intrinsics will map to the same instructions (with different registers).

Let's change this on the source and mark is as won't fix for now. If we find another bug in a less friendly source, we re-open this and add the aliases. :)

Thanks!

nico commented 10 years ago

As far as I can tell, this is only used in a single place in skia assembly, so we could change that.

However, gcc's arm_neon.h also has a vuzpq_u16() intrinsic that's used in two places. Its implementation looks different from the vuzp_u16() intrinsic implementation -- are you sure that they're the same instruction?

(vuzpq_u16() calls __builtin_neon_vuzpv8hi(), while vuzp_u16() calls __builtin_neon_vuzpv4hi(). The former takes two and returns an uint16x8x2_t, the later uses uint16x4x2_t instead.)

It seems like it's probably a good idea to support the instructions that have intrinsics in arm_neon.h?

rengolin commented 10 years ago

Hi Saleem,

This falls into the category of being ridiculously simple and clean that it may actually be worth it.

But I also don't know how widespread that is, so it's hard to measure its usefulness, given that we don't want to encourage usage outside of the ARM ARM.

I have a patch ready, but I'm not sure I should apply.

Nico, could this one be changed in the source?

cheers, --renato

compnerd commented 10 years ago

As a temporary workaround, you can use the real assembly instruction that this is supposed to be: vuzp.u16

This is a GAS extension AFAIK that basically is meant to help prevent accidental register swaps: vuzp takes either dx or qx while vuzpq only accepts qx.

Im torn on this. This is unlikely to be common and is really an extension, but can be somewhat useful for users.

nico commented 10 years ago

assigned to @rengolin