shepmaster / jetscii

A tiny library to efficiently search strings for sets of ASCII characters and byte slices for sets of bytes.
Apache License 2.0

Use _mm_extract_epi16 to extract the mask #25

Closed: ghost closed 6 years ago

ghost commented 6 years ago

(V)PCMPESTRM places a mask word into XMM0, so we only need to extract the low-order 16 bits, not the low-order 32 bits.
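A minimal sketch in Rust (`std::arch::x86_64`) of the approach described above: run PCMPESTRM in equal-any, bit-mask mode, then read the mask with `_mm_extract_epi16` (PEXTRW). The helper `any_byte_mask` and its signature are hypothetical, not jetscii's API, and it assumes both inputs fit in a single 16-byte register.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: returns a 16-bit mask with bit `i` set when
/// `haystack[i]` equals any byte in `needles`.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn any_byte_mask(needles: &[u8], haystack: &[u8]) -> u16 {
    // With byte operands, PCMPESTRM compares at most 16 elements per operand.
    assert!(needles.len() <= 16 && haystack.len() <= 16);
    let mut n = [0u8; 16];
    let mut h = [0u8; 16];
    n[..needles.len()].copy_from_slice(needles);
    h[..haystack.len()].copy_from_slice(haystack);
    let n = _mm_loadu_si128(n.as_ptr() as *const __m128i);
    let h = _mm_loadu_si128(h.as_ptr() as *const __m128i);

    // Unsigned bytes, "equal any" comparison, result as a bit mask.
    const MODE: i32 = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_BIT_MASK;
    // The explicit lengths keep the zero padding from being compared.
    let mask = _mm_cmpestrm::<MODE>(n, needles.len() as i32, h, haystack.len() as i32);

    // For byte operands the bit mask occupies only the low 16 bits of the
    // result, so PEXTRW (SSE2) extracts exactly what we need.
    _mm_extract_epi16::<0>(mask) as u16
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.2") {
            // ',' is at index 1 and ';' is at index 3 of the haystack.
            let mask = unsafe { any_byte_mask(b",;", b"a,b;c") };
            println!("{:b}", mask);
        }
    }
}
```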

There doesn't appear to be any impact on the compiled assembly; however, PEXTRD (emitted for `_mm_extract_epi32`) is an SSE4.1 instruction, whereas PEXTRW (emitted for `_mm_extract_epi16`) is SSE2.
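To illustrate why the switch should not change results, here is a hedged sketch that runs one equal-any, bit-mask comparison and reads the mask both ways. Because the bit-mask result is zero-extended, the low 32 bits read by `_mm_extract_epi32` (PEXTRD) should equal the low 16 bits read by `_mm_extract_epi16` (PEXTRW). The function `both_extractions` is a hypothetical test helper, not part of jetscii.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: searches a 16-byte block for `set` with PCMPESTRM and
/// returns the mask as read by PEXTRD and by PEXTRW, in that order.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn both_extractions(set: u8, haystack: &[u8; 16]) -> (u32, u32) {
    const MODE: i32 = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_BIT_MASK;
    let needle = _mm_set1_epi8(set as i8);
    let hay = _mm_loadu_si128(haystack.as_ptr() as *const __m128i);
    // Only the first needle byte is valid (length 1); all 16 haystack bytes are.
    let mask = _mm_cmpestrm::<MODE>(needle, 1, hay, 16);

    let via_pextrd = _mm_extract_epi32::<0>(mask) as u32; // low 32 bits (SSE4.1)
    let via_pextrw = _mm_extract_epi16::<0>(mask) as u32; // low 16 bits, zero-extended (SSE2)
    (via_pextrd, via_pextrw)
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.2") {
            // Every byte matches, so all 16 mask bits are set and bits 16..32 are zero.
            let (d, w) = unsafe { both_extractions(b',', &[b','; 16]) };
            println!("{d:#x} {w:#x}");
        }
    }
}
```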


I hereby license this contribution under the dual MIT/Apache-2.0 license, allowing licensees to choose either at their option.

shepmaster commented 6 years ago

PEXTRD is an SSE4.1 feature whereas PEXTRW is SSE2.

Isn't PCMPESTRM SSE4.2 anyway, which makes this point rather moot? I still like the change, though, because it aligns more closely with the desired semantics.

ghost commented 6 years ago

Yes, PCMPESTRM is SSE4.2, so the SSE2 vs. SSE4.1 difference is likely moot. One open question for me is whether a processor that supports SSE4.2 is guaranteed to also support the earlier extensions SSE4.1, SSE3, SSE2, and SSE. Cursory research suggests the answer is "yes"; however, I have yet to see where this is guaranteed:

shepmaster commented 6 years ago

whether a processor supporting SSE4.2 is guaranteed to support the earlier versions

If we wanted to be paranoid, we could require every intrinsic family we use. I expect a failure here would surface as a compilation error, though, so I'm willing to wait until that happens.
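One way to take the "paranoid" route is sketched below, assuming runtime dispatch rather than compile-time target flags: detect each family explicitly with `is_x86_feature_detected!`, list them all in `#[target_feature]`, and fall back to a scalar loop otherwise. The functions `simd_path_available`, `count_simd`, `count_scalar`, and `count` are hypothetical and not part of jetscii.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical guard: checks each family explicitly instead of assuming
/// that SSE4.2 support implies SSE4.1 and SSE2.
#[cfg(target_arch = "x86_64")]
fn simd_path_available() -> bool {
    is_x86_feature_detected!("sse2")
        && is_x86_feature_detected!("sse4.1")
        && is_x86_feature_detected!("sse4.2")
}

#[cfg(not(target_arch = "x86_64"))]
fn simd_path_available() -> bool {
    false
}

/// Hypothetical SIMD kernel: counts bytes in a 16-byte block equal to `byte`.
/// The attribute lists every family the body relies on.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2,sse4.1,sse4.2")]
unsafe fn count_simd(byte: u8, block: &[u8; 16]) -> u32 {
    const MODE: i32 = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_BIT_MASK;
    let needle = _mm_set1_epi8(byte as i8);
    let hay = _mm_loadu_si128(block.as_ptr() as *const __m128i);
    let mask = _mm_cmpestrm::<MODE>(needle, 1, hay, 16);
    // Low 16 bits hold one bit per matching byte.
    (_mm_extract_epi16::<0>(mask) as u16).count_ones()
}

/// Portable fallback used when any required family is missing.
fn count_scalar(byte: u8, block: &[u8; 16]) -> u32 {
    block.iter().filter(|&&b| b == byte).count() as u32
}

/// Dispatches to the SIMD kernel only when every family was detected.
fn count(byte: u8, block: &[u8; 16]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if simd_path_available() {
            // SAFETY: every feature enabled on `count_simd` was detected at runtime.
            return unsafe { count_simd(byte, block) };
        }
    }
    count_scalar(byte, block)
}

fn main() {
    let block = *b"a,b,c,d,e,f,g,h,";
    println!("{}", count(b',', &block));
}
```

For the compiler, enabling `sse4.2` already implies the earlier SSE levels, so the extra names in `#[target_feature]` mainly document intent; the runtime checks are what guard against a CPU that reports otherwise.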