Closed: Freaky closed this 1 year ago
I made the same change in my C port and the numbers are still right on my terrible excuse for a benchmark.
This would be made slightly more obvious with _MM_SHUFFLE, which would show it's retrieving the lower 32 bits of the permutation (2,2,2,2). Sadly it's nightly-only.
Thanks, I checked the benchmark and it looks 👍.
This replaces SSE4.1 _mm_extract_epi32 with SSE2 _mm_cvtsi128_si32 and _mm_shuffle_epi32. Should fix #85 and doesn't appear to break anything else.
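For readers unfamiliar with the trick, here is a minimal sketch of the SSE2-only pattern described above, assuming an x86_64 target (where SSE2 is part of the baseline). It is not the code from this PR: the helper names `extract_lane2_sse2` and `lane2_of` are hypothetical, and lane 2 is chosen only because the discussion mentions the permutation (2,2,2,2).

```rust
// Assumes an x86_64 target; SSE2 is always available there.
use std::arch::x86_64::*;

/// Hypothetical helper: returns 32-bit lane 2 of `v` using only SSE2.
/// Stands in for the SSE4.1 call `_mm_extract_epi32::<2>(v)`.
fn extract_lane2_sse2(v: __m128i) -> i32 {
    // SAFETY: both intrinsics require only SSE2, which x86_64 guarantees.
    unsafe {
        // 0b10_10_10_10 (0xAA) is what _MM_SHUFFLE(2, 2, 2, 2) evaluates to:
        // every output lane is filled from input lane 2.
        let broadcast = _mm_shuffle_epi32::<0b10_10_10_10>(v);
        // Read the lowest 32 bits, which now hold the original lane 2.
        _mm_cvtsi128_si32(broadcast)
    }
}

/// Hypothetical convenience wrapper so the helper can be tried on a plain array.
/// Array element 0 maps to lane 0 (the lowest 32 bits of the vector).
fn lane2_of(lanes: [i32; 4]) -> i32 {
    // SAFETY: `lanes` is 16 readable bytes; the unaligned load has no
    // alignment requirement.
    let v = unsafe { _mm_loadu_si128(lanes.as_ptr() as *const __m128i) };
    extract_lane2_sse2(v)
}
```

Calling `lane2_of([10, 20, 30, 40])` selects the third array element. The shuffle immediate is written out as a binary literal here so the sketch does not depend on `_MM_SHUFFLE` being available.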