haskell / filepath

Haskell FilePath core library
BSD 3-Clause "New" or "Revised" License
66 stars 33 forks source link

Fix isInfixOf and breakSubString in Word16 #199

Closed hasufell closed 12 months ago

hasufell commented 1 year ago

Fixes #195

@Bodigrim @sol

hasufell commented 1 year ago

We repeatedly use Data.ByteString.Short.breakSubstring and Data.ByteString.Short.splitAt which both copy. See: #199 (comment)

I think we will get more predictable performance characteristics if we convert to a ByteString first, calculate the offset, and then do a final splitAt on the original input.

@Bodigrim any more thoughts on this?

I don't think I agree.

The point of ShortByteString is that it doesn't use pinned memory and doesn't contribute to memory fragmentation. Even if the ByteString here will be temporary, I find it morally wrong to use it inside the ShortByteString module.

Bodigrim commented 1 year ago

As discussed in https://github.com/haskell/bytestring/issues/307, I'm not convinced that the algorithm behind Data.ByteString.{,Short.}.breakSubstring is any better than memcmp.

Looking at the difficulties with appending/slicing of ShortByteString to avoid quadratic performance, I think running compareByteArrays# in a loop with even indices is the simplest and most robust solution here.

hasufell commented 1 year ago

@Bodigrim done