haskell / bytestring

An efficient compact, immutable byte string type (both strict and lazy) suitable for binary or 8-bit character data.
http://hackage.haskell.org/package/bytestring
Other
291 stars 140 forks source link

elemIndexEnd non-optimally implemented in terms of findIndexEnd #278

Open archaephyrryx opened 4 years ago

archaephyrryx commented 4 years ago

The current implementations of the elemIndexEnd function in modules Data.ByteString and Data.ByteString.Lazy are uniformly defined as findIndexEnd . (==) (the Char8 version merely invokes the strict bytestring definition after converting from Char to Word8).

This seems counterintuitive when compared with elemIndex, which is more optimized than findIndex through the use of the memchr C FFI call to avoid costly byte-by-byte predicate testing.

There exists a GNU extension for string.h that defines an operation memrchr that performs a similar operation to memchr but returns the final occurrence of a byte rather than the first, which could be used when available. Even without such platform-specific optimizations, it should still be possible to either add a memrchr-like function to the cbits code.

Even without FFI calls, elemIndexEnd could use the same logic as findIndexEnd and perform a byte-by-byte direct equality test at least as efficiently as findIndexEnd . (==), but without the indirection.

Bodigrim commented 4 years ago

The implementation of elemIndexEnd was recently discussed here: https://github.com/haskell/bytestring/pull/155#issuecomment-628619601 It seems that thanks to inlining there is no performance penalty.

AFAIU Windows systems do not provide memrchr, so the only option to remain cross-platform is to implement memrchr in cbits. I do not mind against such approach, if supported by benchmarks.