jratcliff63367 / sse2neon

Automatically exported from code.google.com/p/sse2neon

How to convert _mm_madd_epi16 to neon instructions #5

Closed zhangfrank closed 4 years ago

zhangfrank commented 8 years ago

Hi, thank you for your work on converting SSE instructions to NEON! Do you know how to convert _mm_madd_epi16 to NEON instructions? Many thanks!

Best Regards,

Frank

jratcliff63367 commented 7 years ago

No, that one hasn't been implemented yet. I'll add it to my todo list.

zhangfrank commented 7 years ago

Good, thank you! Frank


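nemequ commented 7 years ago

  // a and b are int16x8_t. Widening multiply of the low and high halves:
  int32x4_t pl = vmull_s16(vget_low_s16(a),  vget_low_s16(b));
  int32x4_t ph = vmull_s16(vget_high_s16(a), vget_high_s16(b));
  // Pairwise add of adjacent 32-bit products within each half:
  int32x2_t rl = vpadd_s32(vget_low_s32(pl), vget_high_s32(pl));
  int32x2_t rh = vpadd_s32(vget_low_s32(ph), vget_high_s32(ph));
  // Recombine the two 64-bit halves into one 128-bit result:
  int32x4_t result = vcombine_s32(rl, rh);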

AArch64 would make things a bit easier, I think.
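
For anyone comparing results, here is a rough sketch of what _mm_madd_epi16 computes, written as a plain scalar reference plus the NEON sequence above wrapped in a function. The names `madd_epi16_ref`, `madd_s16_neon`, and `madd_self_check` are made up for illustration and are not part of sse2neon. The AArch64 branch assumes the simplification meant above is `vpaddq_s32`, which does the pairwise add on full 128-bit vectors; that is a guess at the intent, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical scalar reference: multiply the eight signed 16-bit lanes
   element-wise into 32-bit products, then add each adjacent pair. */
static inline void madd_epi16_ref(const int16_t a[8], const int16_t b[8],
                                  int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t p0 = (int32_t)a[2 * i]     * (int32_t)b[2 * i];
        int32_t p1 = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        /* Add as unsigned so the single overflowing case (all four inputs
           equal to -32768) wraps around instead of being undefined. */
        out[i] = (int32_t)((uint32_t)p0 + (uint32_t)p1);
    }
}

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* Hypothetical wrapper around the NEON sequence from the comment above. */
static inline int32x4_t madd_s16_neon(int16x8_t a, int16x8_t b)
{
    int32x4_t pl = vmull_s16(vget_low_s16(a),  vget_low_s16(b));
    int32x4_t ph = vmull_s16(vget_high_s16(a), vget_high_s16(b));
#if defined(__aarch64__)
    /* AArch64 only: one pairwise add over both 128-bit vectors. */
    return vpaddq_s32(pl, ph);
#else
    /* ARMv7: pairwise add each half, then recombine. */
    int32x2_t rl = vpadd_s32(vget_low_s32(pl), vget_high_s32(pl));
    int32x2_t rh = vpadd_s32(vget_low_s32(ph), vget_high_s32(ph));
    return vcombine_s32(rl, rh);
#endif
}
#endif

/* Checks the scalar reference on a small input and, when NEON is
   available, checks that the NEON version agrees with it. */
static inline void madd_self_check(void)
{
    const int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const int16_t b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    int32_t ref[4];

    madd_epi16_ref(a, b, ref);
    assert(ref[0] == 1 * 8 + 2 * 7);
    assert(ref[3] == 7 * 2 + 8 * 1);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int32_t got[4];
    vst1q_s32(got, madd_s16_neon(vld1q_s16(a), vld1q_s16(b)));
    for (int i = 0; i < 4; i++)
        assert(got[i] == ref[i]);
#endif
}
```

Calling `madd_self_check()` from a test program exercises the scalar path on any machine and the NEON path only when compiled for an ARM target with NEON enabled.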

jserv commented 4 years ago

Implemented in DLTcollab/sse2neon.