Closed Asc2011 closed 1 month ago
Hi and thanks for the PR. Taking a quick look, one thing I noticed is the use of static pointer
. I think that is not the same thing as void const * ptr
in the Intrinsics Guide https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=4243,5873,4037&othertechs=MOVBE
A static pointer would need to be known at compile time which I do not see as compatible with heap memory and the use of these operations.
I took a closer look this time and did notice a couple things to check on:
_bextr_u64
should have start
and len
as unsigned int
in the intrinsics docs, as opposed to 64-bit parameters_bzhi_u32
and _bzhi_u64
index
parameter looks like uint32mulx_32
intrinsic and name should be _u32
, same for 64 _u64
, these also have the 3rd parameter as a pointer in the intrinsicIt will be great to have a bunch more instruction sets available when we've got them looking good.
ok, BMI1 & BMI2 should be fine now. thx for the review.
greets and have a nice weekend :)
Hi guzba,
i've noticed that the something did not pass the GH-pipeline ? Is there anything i can do ?
Greets Andreas
Hello and thanks again for this PR. There were still some issues here so instead of continuing to tweak this over time, which may be time consuming, I chose to extend these commits in a new PR and do the remaining fixes here https://github.com/guzba/nimsimd/pull/26
Note that I did remove the queryCacheLineSize call due to the implementation not looking correct to me (should be an unsigned shift, I think incorrect register used, need to check a feature flag to see if the data is present and valid, etc). More info here https://www.felixcloutier.com/x86/clflush if you'd like to work on that in a separate PR.
Very nice and thx for including the important tags.
Regarding the detection of the CacheLineWidth in queryCacheLineSize()
. I somehow remember that the leg from the Felix Cloutiers' documentation gave me unreasonable results. I then looked at a CpuId-implementation in Rust and found that they used a different leg. So i tried that one, but only had one machine to test against. So that may well be not correct. If i understood it right, then on Intel-Cpus its supposed to be always 64-Byte - thats easy. On AppleSilicon (M1 up to M-2024) its supposed to be 128-Byte - thats seems easy, too. To do the detection on any other CPU (Arm, RISC-V, Power, etc.) might get hairy.
Maybe it makes more sense to make this a compile-time-option, which defaults to 64, but can be set via -d:cacheline=128
? Since i've never heard of a CacheLineWidth >128
or <64
.
Anyways, i'll retry in pullreq-26.
If you have a different source for how to get the cache flush size, that'd be totally fine with me. I don't mean to imply the website I linked is the only authority. If you have a solid one, please just point me to it so I can properly ensure I learn and agree the implementation is correct and ready to go.
Hello Guzba,
this pull-request adds some new Tags :
Moste of the functionality of BMI 1+2 is already in
std/bitops
. But for demo-purposes it might be ok, to have them explicit here. More important to me is the addition of aligned-allocations and prefetching tonimsimd/sse2.nim
:I've used the aligned-allocations in two projects of mine. So consider them tested and usable, except the
sfence()
-thing. Some functions were missing fromnimsimd/avx2.nim
. According to the Intel-Intrinsics-Guide, they should be avail.The CacheLineWidth-detection works for me. I have not found a similar way to inquire this info for Arm or AppleSilicon or RISC-V cpus. Its just a starting point. Since i believe the CacheLineWidth to be the crucial-information when dealing with SIMD, i'd love to see this information availiable for all commonly cpu-architecture used by nim ? What do you think ? Same could be said for the L1,L2,L3-sizes.
best regards, Andreas (a.k.a. bosinski)