Open Asc2011 opened 9 months ago
My default position is to be supportive of adding any intrinsic found on the Intel Intrinsics Guide https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html so adding those seems reasonable to me.
That's fine with me. I had tried adding some tags like BMI1/BMI2/F16C just to get a feeling for how the Nim package works. I don't know if these are still important. I've forked nimsimd and added a branch 'ASC' with the changes I mentioned: `mm_malloc`/`mm_free` and maybe `mm_fence`, which I have added to `/nimsimd/sse2.nim`. The alloc/free work for me; the fence does not yet. I use the aligned malloc, which according to the Intel Intrinsics Guide has been around since SSE1 - stone age. The Guide is massive and there are 'hidden' sections like 'Other', which is where the three tags BMI1/BMI2/F16C were sitting :))
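For readers who have not used the aligned allocator, here is a minimal sketch in C of the `_mm_malloc`/`_mm_free` pair that the Nim bindings wrap. The helper names `alloc_floats_aligned` and `is_aligned` are hypothetical and are not part of nimsimd; the 32-byte alignment is an assumption chosen to match AVX2 register width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_malloc, _mm_free (GCC/Clang pull in mm_malloc.h) */

/* Hypothetical helper: allocate `count` floats on an `alignment`-byte boundary.
   `alignment` must be a power of two. */
static float *alloc_floats_aligned(size_t count, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (float *)_mm_malloc(count * sizeof(float), alignment);
}

/* Hypothetical helper: check that a pointer sits on an `alignment` boundary. */
static int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

Memory obtained from `_mm_malloc` has to be released with `_mm_free`, not with plain `free`, which is one reason for exposing both together.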
I've tried to use CPUID to read the cache-line size, and after some peeking and poking I see a value that is correct for my machine - but that may just be a coincidence. Besides Felix Cloutier's reference, I found another source, a Rust version of CPUID, with different leaves and numbers - so this is all a bit shaky. I've added the CPUID checks for BMI1/BMI2/F16C and CMPXCHG16B according to Felix Cloutier's docs.
Being able to (reliably) read the cache-line size is a plus. The 16-byte compare-exchange never gained support from the compiler side. It has technically been available since Haswell (2015).
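As a cross-check for the leaf and bit numbers, here is a minimal sketch in C, assuming GCC or Clang and their `<cpuid.h>` helpers. The struct `cpu_info` and the function `query_cpu` are hypothetical names; the bit positions follow the Intel SDM description of CPUID leaves 1 and 7, and this is not the code from the branch.

```c
#include <cpuid.h> /* __get_cpuid, __get_cpuid_count (GCC/Clang only) */

/* Hypothetical result type for this sketch. */
typedef struct {
    unsigned cache_line_bytes; /* 0 if the CPU does not report it */
    int has_cmpxchg16b;
    int has_f16c;
    int has_bmi1;
    int has_bmi2;
} cpu_info;

static cpu_info query_cpu(void) {
    cpu_info info = {0};
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* Leaf 1: basic feature flags. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        /* EBX[15:8] is the CLFLUSH line size in 8-byte units;
           it is only meaningful when the CLFSH flag (EDX bit 19) is set. */
        if ((edx >> 19) & 1u)
            info.cache_line_bytes = ((ebx >> 8) & 0xFFu) * 8u;
        info.has_cmpxchg16b = (int)((ecx >> 13) & 1u); /* ECX bit 13 */
        info.has_f16c       = (int)((ecx >> 29) & 1u); /* ECX bit 29 */
    }

    /* Leaf 7, subleaf 0: structured extended feature flags. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        info.has_bmi1 = (int)((ebx >> 3) & 1u); /* EBX bit 3 */
        info.has_bmi2 = (int)((ebx >> 8) & 1u); /* EBX bit 8 */
    }
    return info;
}
```

The value read here is the CLFLUSH line size, which matches the cache-line size on common x86 parts; the per-level cache parameters live in other leaves and differ between vendors, which may explain why two references show different numbers.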
So I hope I haven't messed up too much :) I'm in a bit of a mess, which is why I pushed and pulled these changes via the GitHub web frontend, not exactly knowing what I'm doing.. :)) Anyway, pick what you think makes sense. The aligned alloc via `mm_malloc` makes sense to me.
One more thing: as already mentioned, I started something - an LRU cache with vector operations. I made myself a `common_avx2.nim` for development and to learn the intrinsics. I'll put it into the 'ASC' branch. It's more of a workbench idea; I doubt the Nim generics will make the code faster ;) but readability is much better now..
Ah, and the `mm_prefetch` thing is something for people who really know what they are doing: interesting, but a last resort. The SIMD champ from Algorithmica has an example where he uses a prefetch instruction during tree walking ...
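To illustrate the idea of prefetching during a tree walk, here is a minimal sketch in C on a pointer-based binary search tree. This is not the Algorithmica code; the `node` type and the `find` function are hypothetical, and whether the hint helps at all depends on the tree layout and has to be measured.

```c
#include <stddef.h>
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical node type for this sketch. */
typedef struct node {
    int key;
    struct node *left;
    struct node *right;
} node;

/* Walk the tree looking for `key`. Before comparing, ask the CPU to start
   loading both children, so that whichever one is visited next is more
   likely to be in cache already. A prefetch is only a hint and does not
   fault, so NULL children need no special handling. */
static const node *find(const node *n, int key) {
    while (n != NULL) {
        _mm_prefetch((const char *)n->left, _MM_HINT_T0);
        _mm_prefetch((const char *)n->right, _MM_HINT_T0);
        if (key == n->key)
            return n;
        n = (key < n->key) ? n->left : n->right;
    }
    return NULL;
}
```

The hint only pays off when the next node is likely to miss the cache, which is why it is best treated as a last resort after profiling.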
greet & beats, Andreas
For the moment, I've dropped the `common_avx2.nim` idea. It requires a sophisticated macro solution, so it does not matter for now.
For my part, this issue can be closed, since all the important parts (aligned `mm_malloc`, `mm_free`, `mm_prefetch`, `mm_fence`) are covered by pull request 25.
greets andreas
Hi again,
`_mm_malloc()` and `_mm_free()` should be in `sse.nim`, where they are missing. I'd say this would be a desirable feature, as it would make aligned allocations explicit, and easier and cleaner too. What do you think?
regards, Andreas
-- I found the include for `prefetch` in `sse2.nim` and added what I wanted for testing: