Add vqtbl?q_?8 intrinsics

rosbif commented 4 years ago

Add vqtbl?q_?8 intrinsics. This fixes issue #33 "The vqtbl* intrinsics are missing". This replaces pull request #36.

Edit: I subsequently committed an improved algorithm for the vqtbl2q, vqtbl3q and vqtbl4q intrinsics which is faster, particularly with SSE4.
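For reference, the lookup behaviour that the vqtbl1q to vqtbl4q intrinsics have to reproduce can be modelled in plain scalar C. The sketch below only describes the A64 table-lookup semantics (an index past the end of the concatenated tables gives 0). It is not the algorithm committed in this pull request, and the name `ref_vqtbl_u8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model, NOT the code from this pull request.
 * tables:   n_tables consecutive 16-byte tables (1 to 4), treated as one
 *           table of 16 * n_tables bytes.
 * idx:      16 byte indices into that combined table.
 * out:      16 result bytes; an out-of-range index yields 0. */
static void ref_vqtbl_u8(const uint8_t *tables, size_t n_tables,
                         const uint8_t idx[16], uint8_t out[16])
{
    assert(n_tables >= 1 && n_tables <= 4);
    const size_t table_len = 16 * n_tables;
    for (size_t i = 0; i < 16; ++i) {
        out[i] = (idx[i] < table_len) ? tables[idx[i]] : 0;
    }
}
```

A model like this can serve as an oracle when testing an SSE implementation: feed both the same tables and indices and compare the 16 output bytes.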

Zvictoria commented 4 years ago

rosbif, thanks a lot for your contribution. The vqtblx algorithm improvement looks legitimate and I will certainly accept it (just give me some time, please). The added vqtbl?q_ functions are not such an easy question. They belong to A64, not to the original ARM NEON set. So while they are useful, I don't have any tests for them, and even if I get some I need to indicate their A64 nature somehow... I need to think it over. Thanks again.

rosbif commented 4 years ago

Hi Victoria,

On 14/01/2020 at 16:52, Victoria wrote:

rosbif, thanks a lot for your contribution. The vqtblx algorithm improvement looks legitimate and I will certainly accept it (just give me some time, please). The added vqtbl?q_ functions are not such an easy question. They belong to A64, not to the original ARM NEON set. So while they are useful, I don't have any tests for them, and even if I get some I need to indicate their A64 nature somehow... I need to think it over. Thanks again.


I must admit that I am new to GitHub (which was probably apparent as I was a bit clumsy) and also new to NEON.

I originally wrote my code for SSE and AVX2. I was a bit bored over the holidays, so I thought it would be amusing to try AVX-512 and NEON versions. I found your excellent work, which enabled me to test equivalent NEON instructions on my x86_64 hardware. The only missing instruction I needed was vqtbl1q_u8 (to replace _mm_shuffle_epi8), so I added it. Subsequently I added the others to complete the set.
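The two instructions are close but not identical: vqtbl1q_u8 returns 0 for any index above 15, whereas _mm_shuffle_epi8 returns 0 only when bit 7 of the index is set and otherwise selects by the low four bits. The scalar sketch below models both and shows one common way to bridge the gap, a saturating add of 0x70 to the indices, which in vector form would correspond to `_mm_shuffle_epi8(t, _mm_adds_epu8(idx, _mm_set1_epi8(0x70)))`. Whether the pull request uses this particular approach is an assumption; all function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of NEON vqtbl1q_u8: any index above 15 yields 0. */
static void model_vqtbl1q_u8(const uint8_t t[16], const uint8_t idx[16],
                             uint8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (idx[i] < 16) ? t[idx[i]] : 0;
}

/* Scalar model of SSSE3 _mm_shuffle_epi8: bit 7 set yields 0,
 * otherwise only the low 4 bits of the index are used. */
static void model_pshufb(const uint8_t t[16], const uint8_t idx[16],
                         uint8_t out[16])
{
    for (int i = 0; i < 16; ++i)
        out[i] = (idx[i] & 0x80) ? 0 : t[idx[i] & 0x0F];
}

/* One byte lane of an unsigned saturating add (as in _mm_adds_epu8). */
static uint8_t adds_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Assumed emulation (not necessarily the PR's): biasing by 0x70 keeps
 * indices 0..15 in 0x70..0x7F (bit 7 clear, low nibble unchanged) and
 * pushes every larger index to 0x80 or above (bit 7 set, result 0). */
static void emulated_vqtbl1q_u8(const uint8_t t[16], const uint8_t idx[16],
                                uint8_t out[16])
{
    uint8_t biased[16];
    for (int i = 0; i < 16; ++i)
        biased[i] = adds_u8(idx[i], 0x70);
    model_pshufb(t, biased, out);
}

/* Exhaustive self-check over all 256 possible index values. */
static void check_emulation(void)
{
    uint8_t t[16], idx[16], want[16], got[16];
    for (int i = 0; i < 16; ++i)
        t[i] = (uint8_t)(0xA0 + i); /* non-zero, so a zeroed lane is visible */
    for (int v = 0; v < 256; ++v) {
        for (int i = 0; i < 16; ++i)
            idx[i] = (uint8_t)v;
        model_vqtbl1q_u8(t, idx, want);
        emulated_vqtbl1q_u8(t, idx, got);
        for (int i = 0; i < 16; ++i)
            assert(want[i] == got[i]);
    }
}
```

Calling `check_emulation()` confirms that, in this model, the biased shuffle agrees with the NEON semantics for every index value, while the unbiased `model_pshufb` differs for indices such as 16 or 0x13.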

I was amazed that with your superb NEON2SSE work I obtained nearly the same performance with simulated NEON as with native SSE.

Thank you for your great work.

Cheers, Chris

rosbif commented 4 years ago

I am closing this because, looking at it again, I think it is buggy. Sorry :-(

intel / ARM_NEON_2_x86_SSE

Add vqtbl?q_?8 intrinsics #37