llogiq / bytecount

Counting occurrences of a given byte or UTF-8 characters in a slice of memory – fast
Apache License 2.0

Avoid use of SSE4.1 intrinsic for SSE2 #86

Closed Freaky closed 1 year ago

Freaky commented 1 year ago

This replaces SSE4.1 _mm_extract_epi32 with SSE2 _mm_cvtsi128_si32 and _mm_shuffle_epi32.

Should fix #85 and doesn't appear to break anything else.
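
For readers following along, here is a minimal sketch of the kind of substitution described above, not the exact diff from this PR. It assumes the goal is to read 32-bit lane 2 of a `__m128i`. The names `extract_lane2_sse2` and `lane2_of` are hypothetical and exist only for illustration. The shuffle immediate `0xAA` copies lane 2 into every lane, and `_mm_cvtsi128_si32` then reads lane 0, so no SSE4.1 instruction is needed.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: reads 32-bit lane 2 using only SSE2 instructions.
/// With SSE4.1 this would be `_mm_extract_epi32(v, 2)`.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "sse2")]
unsafe fn extract_lane2_sse2(v: __m128i) -> i32 {
    // 0xAA == 0b10_10_10_10: every destination lane takes source lane 2.
    let broadcast = _mm_shuffle_epi32(v, 0xAA);
    // Read the low 32 bits (lane 0) of the shuffled vector.
    _mm_cvtsi128_si32(broadcast)
}

/// Hypothetical safe wrapper: returns lane 2 of `values`, or `None` when
/// SSE2 is unavailable (for example on a non-x86 target).
pub fn lane2_of(values: [i32; 4]) -> Option<i32> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("sse2") {
            // SAFETY: SSE2 support was checked just above, and the
            // unaligned load reads exactly the 16 bytes of `values`.
            return Some(unsafe {
                let v = _mm_loadu_si128(values.as_ptr() as *const __m128i);
                extract_lane2_sse2(v)
            });
        }
    }
    let _ = values;
    None
}
```

Lane 0 needs no shuffle at all, since `_mm_cvtsi128_si32` already reads the low 32 bits directly.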

Freaky commented 1 year ago

I made the same change in my C port and the numbers are still right on my terrible excuse for a benchmark.

Freaky commented 1 year ago

This would be slightly more obvious with _MM_SHUFFLE, which would show that it's retrieving the lower 32 bits of the permutation (2,2,2,2). Sadly, _MM_SHUFFLE is nightly-only.
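
As a rough illustration of what the macro would compute, the sketch below hand-rolls the same bit packing in a `const fn`. The name `mm_shuffle` and the constant `BROADCAST_LANE_2` are hypothetical, not part of `std::arch`. Each of the four arguments is a 2-bit source-lane index, packed from the highest destination lane down to the lowest.

```rust
/// Hypothetical stand-in for the C `_MM_SHUFFLE(z, y, x, w)` macro:
/// packs four 2-bit lane selectors into one 8-bit shuffle immediate.
/// `w` selects the source for destination lane 0, `z` for lane 3.
pub const fn mm_shuffle(z: u32, y: u32, x: u32, w: u32) -> i32 {
    ((z << 6) | (y << 4) | (x << 2) | w) as i32
}

/// The permutation (2,2,2,2): every destination lane reads source lane 2.
pub const BROADCAST_LANE_2: i32 = mm_shuffle(2, 2, 2, 2);
```

With this helper, `BROADCAST_LANE_2` evaluates to `0xAA`, so the literal and the named permutation are interchangeable.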

llogiq commented 1 year ago

Thanks, I checked the benchmark and it looks 👍.