Closed nico closed 10 years ago
Nope. They're both lowered by the same piece of code and the only resulting difference is the register they operate on, which vzip/vuzp already support.
Having them in arm_neon.h is distinct from having them as an assembly alias. If I add it to the assembler, the only thing I'd do is to change vzipq back to vzip and vuzpq to vuzp.
And one implementation ending in 8hi and the other in 4hi doesn't make a difference?
Well, intrinsics are not always directly tied to instructions, and that's the point of having them in the first place, so we can abstract things. Both intrinsics will map to the same instructions (with different registers).
Let's change this on the source and mark is as won't fix for now. If we find another bug in a less friendly source, we re-open this and add the aliases. :)
Thanks!
As far as I can tell, this is only used in a single place in skia assembly, so we could change that.
However, gcc's arm_neon.h also has a vuzpq_u16() intrinsic that's used in two places. Its implementation looks different from the vuzp_u16() intrinsic implementation -- are you sure that they're the same instruction?
(vuzpq_u16() calls __builtin_neon_vuzpv8hi(), while vuzp_u16() calls __builtin_neon_vuzpv4hi(). The former takes two and returns an uint16x8x2_t, the later uses uint16x4x2_t instead.)
It seems like it's probably a good idea to support the instructions that have intrinsics in arm_neon.h?
Hi Saleem,
This falls into the category of being ridiculously simple and clean that it may actually be worth it.
But I also don't know how widespread that is, so it's hard to measure its usefulness, given that we don't want to encourage usage outside of the ARM ARM.
I have a patch ready, but I'm not sure I should apply.
Nico, could this one be changed in the source?
cheers, --renato
As a temporary workaround, you can use the real assembly instruction that this is supposed to be: vuzp.u16
This is a GAS extension AFAIK that basically is meant to help prevent accidental register swaps: vuzp takes either dx or qx while vuzpq only accepts qx.
Im torn on this. This is unlikely to be common and is really an extension, but can be somewhat useful for users.
assigned to @rengolin
Extended Description
This works fine with gcc:
thakis@ubu:~$ arm-linux-gnueabihf-g++ -c test.cc -mfpu=neon thakis@ubu:~$ cat test.cc void foo() { asm volatile ("vuzpq.u16 q0, q1\n\t" : : :); }
But clang says:
thakis$ ~/src/llvm-build/bin/clang -target arm-linux-androideabi -c -mfpu=neon test.cc