guzba / nimsimd

Pleasant Nim bindings for SIMD instruction sets.
MIT License
73 stars 7 forks

Aligned alloc _mm_malloc/free() would be nice to have #23

Open Asc2011 opened 7 months ago

Asc2011 commented 7 months ago

Hi again,

_mm_malloc() and _mm_free() belong in sse.nim, which is missing. I'd say this would be a desirable feature, as it would make aligned allocations explicit. And easier and cleaner, too.

What do you think?

Regards, Andreas

-- I found the include for prefetch in sse2.nim and added the desired bindings there for testing:

func mm_malloc*(size: int, align: int): pointer {.importc: "_mm_malloc".}
func mm_free*(pt: pointer) {.importc: "_mm_free".}
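
A minimal sketch of how bindings like these might be used, assuming an x86 target and a C compiler whose xmmintrin.h provides _mm_malloc (as GCC and Clang do). The explicit header pragma, the unexported proc declarations, and the names `alignment`, `count`, `buf` and `data` are assumptions made so the example compiles on its own; they are not taken from nimsimd.

```nim
# Hypothetical standalone bindings for illustration only. The header pragma is
# an assumption; inside sse2.nim a pushed header pragma would normally supply it.
proc mm_malloc(size, align: csize_t): pointer {.importc: "_mm_malloc", header: "<xmmintrin.h>".}
proc mm_free(p: pointer) {.importc: "_mm_free", header: "<xmmintrin.h>".}

const alignment = 32   # bytes; the width of a 256-bit AVX register
const count = 8        # number of float32 values to store

# Request an aligned block; _mm_malloc returns NULL on failure.
let buf = mm_malloc(csize_t(count * sizeof(float32)), csize_t(alignment))
doAssert buf != nil, "aligned allocation failed"

# The returned address is a multiple of the requested alignment.
doAssert (cast[uint](buf) mod uint(alignment)) == 0

# Use the block as a plain float32 array.
let data = cast[ptr UncheckedArray[float32]](buf)
for i in 0 ..< count:
  data[i] = float32(i)
doAssert data[count - 1] == 7.0'f32

# Memory obtained from _mm_malloc must be released with _mm_free, not free().
mm_free(buf)
```

The point of the explicit pair is that the alignment is visible at the call site, so aligned loads and stores on the buffer do not depend on what the default allocator happens to return.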

guzba commented 6 months ago

My default position is to support adding any intrinsic found in the Intel Intrinsics Guide (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html), so adding those seems reasonable to me.

Asc2011 commented 6 months ago

That's fine with me. I had tried to add some tags like BMI1/BMI2/F16C just to get a feeling for how the Nim package works. I don't know if these are still important. I've forked nimsimd and added a branch 'ASC' where I added the changes I mentioned:

One more thing: as already mentioned, I started something, an LRU cache with vector operations. I made myself a common_avx2.nim for development and to learn the intrinsics. I'll put it into the 'ASC' branch. It's more of a workbench idea. I doubt the Nim generics will make the code faster ;) but readability is much better now. Ah, and the mm_prefetch thing is something for people who really know what they are doing: interesting, but a last resort. The SIMD champ from Algorithmica has an example where he uses a prefetch instruction during tree walking ...

greet & beats, Andreas

Asc2011 commented 2 months ago

For the moment, I have dropped the common_avx2.nim idea. It requires some sophisticated macro solution, so it does not matter for now.

For my part, this issue can be closed, since all the important parts (aligned mm_malloc, mm_free, mm_prefetch, mm_fence) are covered by pull request 25.

Greets, Andreas