guzba / nimsimd

Pleasant Nim bindings for SIMD instruction sets.
MIT License
76 stars 7 forks source link

Two more glitches in avx.nim #22

Open Asc2011 opened 9 months ago

Asc2011 commented 9 months ago

Hello guzba,

in avx.nim lines 226-230 ::

226 func mm_permutevar_pd(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}` ... 230 func mm_permutevar_pd(a: M128d, b: M128i): M256d {.importc: "_mm_permutevar_pd".}

AFAIK should return the same vector-types that they received. These should be double/single-precision vectors of 128b..

func mm_permutevar_pd*(a: M128d, b: M128i): M128d {.importc: "_mm_permutevar_pd".} ... func mm_permutevar_ps*(a: M128, b: M128i): M128 {.importc: "_mm_permutevar_ps".}

and if you don't mind - sure you will :) - one could fix the terrible Intel-naming just a bit by adding the missing ::

func mm_permutevar_epi32*(a, mask :M128i ): M128i = 
  mm_castsi128_ps(
    mm_permutevar_ps( mm_castps_si128( a ), mask )
  )
func mm_permutevar_epi64*(a, mask :M128i ): M128i =
  mm_castsi128_pd(
    mm_permutevar_pd( mm_castpd_si128( a ), mask )
  )

Since everybd. has to add them anyways - after one got trapped... Same could/should be done for the 256bit-sized vectors. And don't get me wrong here - i just suggest to add what Intel has left out, but evbd. expects to find. But staying with the Intel-wording. Actually a permutevar_<type> is a permute-operation - well, many operation permute a vector. In this case it is a shuffle-operation.. Maybe one could add a common_avx.nim that adds those missing functions to make the intrinsics a bit more consistent ?

just my 20ct, greets Andreas

we are gettin' closer to nimsimd v2 :)

Asc2011 commented 4 months ago

My pull-request-25 fixes the return vectors in avx.nim 226-230. If you want to keep it minimal, then leave it as it is now. Otherwise you might want to include as proposed above :

func mm_permutevar_epi32(a, mask :M128i ): M128i func mm_permutevar_epi64(a, mask :M128i ): M128i

-- Intrinsics-Guide references

guzba commented 4 months ago

Hey, thanks for updating the PR. I was looking again and noticed that the storebe_ signatures have a return value instead of a second parameter. Such as https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=storebe_i16

As for fixing the other issues, I'm fine with getting all the corrections in this PR. It is easy to make little mistakes with this so I am happy to fix them as we notice them.

Asc2011 commented 4 months ago

yeah, another obvious bug - squashed it :) but still untested...

As for fixing the other issues, I'm fine with getting all the corrections in this PR. It is easy to make little mistakes with this so I am happy to fix them as we notice them.

i'm fine with that - lets do this properly. nimsimd is IMHO the most important module of all and it should go into std/simd or whatever name floats-your-boat. I've added a warning to my MultiStepLRU - it's not on nimble yet - so i don't think anybody will notice, anyways ...

greets, andreas