lemire / despacer

C library to remove white space from strings as fast as possible
BSD 3-Clause "New" or "Revised" License

despace_ssse3_lut_1kb: faster comparison #11

Closed kloetzl closed 5 years ago

kloetzl commented 5 years ago

The original comparison can be improved by shaving off a few instructions and cycles. The unit tests pass. The same method could probably be applied to most of the other functions, provided it is correct and I haven't missed an edge case.

P.S. Thanks for this repo. The code is really helpful!

lemire commented 5 years ago

Thanks. What happens if a character with byte value 64 is included... do you expect your approach to work?

kloetzl commented 5 years ago

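Unless I am missing something, that will work just fine. Consider the string "@b c", i.e. 0x40622063. The shuffle looks up each byte's low nibble in the table, so the string becomes 0x20002000. Comparing that to the original string gives 0x0000FF00: the '@' (0x40) is looked up as 0x20 but does not equal it, so only the actual space matches.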

It is basically a simplification of what you use in simdjson, and it works as long as all characters of interest have differing low nibbles.
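
For readers following along, here is a minimal scalar sketch of the comparison described above. It is not the code from this pull request: it models the shuffle and compare one byte at a time in plain C so the edge cases can be checked without SSSE3. The table contents (space, '\n', '\r') and all function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table indexed by the low nibble of an input byte.
 * Slot 0x0 holds ' ' (0x20), slot 0xA holds '\n', slot 0xD holds '\r'.
 * Assumed table for illustration; not copied from the pull request. */
static const uint8_t nibble_lut[16] = {
    ' ', 0, 0, 0, 0, 0, 0, 0, 0, 0, '\n', 0, 0, '\r', 0, 0
};

/* Scalar model of one byte lane of a byte shuffle (PSHUFB-style):
 * a set high bit yields 0, otherwise the low four bits select the entry. */
static uint8_t shuffle_lane(uint8_t index)
{
    return (index & 0x80) ? 0 : nibble_lut[index & 0x0F];
}

/* Scalar model of the byte-wise equality compare: 0xFF where the byte
 * equals its own table lookup, 0x00 otherwise. Byte 0x00 is not flagged
 * because slot 0 holds 0x20, and bytes >= 0x80 are looked up as 0. */
static uint8_t whitespace_lane(uint8_t byte)
{
    return (byte == shuffle_lane(byte)) ? 0xFF : 0x00;
}

/* Writes one mask byte per input byte. */
static void whitespace_mask(const uint8_t *bytes, size_t len, uint8_t *mask)
{
    for (size_t i = 0; i < len; i++)
        mask[i] = whitespace_lane(bytes[i]);
}

/* Returns 1 if no two characters of interest share a low nibble,
 * which is the precondition stated above for the trick, else 0. */
static int low_nibbles_distinct(const uint8_t *chars, size_t count)
{
    uint16_t seen = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t bit = (uint16_t)(1u << (chars[i] & 0x0F));
        if (seen & bit)
            return 0;
        seen |= bit;
    }
    return 1;
}
```

Calling `whitespace_mask` on the bytes 0x40 0x62 0x20 0x63 gives 0x00 0x00 0xFF 0x00, matching the worked example. `low_nibbles_distinct` accepts the set {' ', '\n', '\r'} and rejects a set such as {' ', '0'}, where both characters have low nibble 0.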

aqrit commented 5 years ago

That's a neat trick!

btw, here is the despacer I was fiddling with last week https://gist.github.com/aqrit/6e73ca6ff52f72a2b121d584745f89f3#file-despace-cpp-L141

lemire commented 5 years ago

I'm sold, merging.

kloetzl commented 5 years ago

> That's a neat trick!

Thanks! And thanks for merging! I haven't yet tried to extend the method to any of the other functions. Might be worth it.