haskell / bytestring

An efficient compact, immutable byte string type (both strict and lazy) suitable for binary or 8-bit character data.
http://hackage.haskell.org/package/bytestring

Create strict bytestrings from FixedPrim with zero copy #666

Open lykahb opened 3 months ago

lykahb commented 3 months ago

To turn a FixedPrim into a strict ByteString, with the current public interface I can do something like this:

myTypePrim :: FixedPrim MyType

myTypeToLazyByteString :: MyType -> BL.ByteString
myTypeToLazyByteString =
    -- Some kind of allocation strategy to create a chunk of desired size
    BB.toLazyByteStringWith (BB.untrimmedStrategy 36 BB.defaultChunkSize) mempty
  . BBP.primFixed myTypePrim

-- Makes a copy with memcpy
myTypeToStrictByteString :: MyType -> BS.ByteString
myTypeToStrictByteString = toStrict . myTypeToLazyByteString

Once primFixed :: FixedPrim a -> a -> Builder has converted a value to a Builder, the only utilities available convert it into a lazy bytestring.

One alternative is to use runF :: FixedPrim a -> a -> Ptr Word8 -> IO () together with create :: Int -> (Ptr Word8 -> IO ()) -> IO ByteString, but that relies on two internal modules, Data.ByteString.Builder.Prim.Internal and Data.ByteString.Internal.
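A minimal sketch of that alternative, assuming the internal modules keep exporting size, runF, create and unsafeCreate as they do at the time of writing. The names fixedPrimToStrictIO and fixedPrimToStrict are made up for illustration and are not part of bytestring:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import qualified Data.ByteString.Builder.Prim as BBP
import qualified Data.ByteString.Builder.Prim.Internal as BBPI
import Data.Word (Word32)

-- Hypothetical helper: allocate exactly `size prim` bytes and let the
-- primitive write straight into that buffer, so no second copy is made.
fixedPrimToStrictIO :: BBP.FixedPrim a -> a -> IO BS.ByteString
fixedPrimToStrictIO prim x = BSI.create (BBPI.size prim) (BBPI.runF prim x)

-- Pure variant built on unsafeCreate. This assumes the write action only
-- touches the buffer it is handed, which is the contract of a FixedPrim.
fixedPrimToStrict :: BBP.FixedPrim a -> a -> BS.ByteString
fixedPrimToStrict prim x = BSI.unsafeCreate (BBPI.size prim) (BBPI.runF prim x)

-- Example value: a big-endian Word32 encoded into four bytes.
example :: BS.ByteString
example = fixedPrimToStrict BBP.word32BE (0x01020304 :: Word32)
```

Here BS.unpack example should give [1,2,3,4], since word32BE writes the most significant byte first.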

I propose three solutions:

  1. Create a function Builder -> Ptr Word8 -> IO ().
  2. Export runF :: FixedPrim a -> a -> Ptr Word8 -> IO () from the public module Data.ByteString.Builder.Prim.
  3. Create a function primToByteString :: FixedPrim a -> a -> ByteString that creates a strict bytestring right away.

For both 1 and 2 the code would still rely on create from the semi-public module Data.ByteString.Internal.

For context, the benchmarking in https://github.com/haskell-hvr/uuid/pull/80 shows that the overhead of the copy in toStrict slows down the conversion by 40% (28ns vs 20ns).

lykahb commented 3 months ago

If you decide on a solution, I can implement and submit it.

clyring commented 3 months ago
lykahb commented 3 months ago

Thanks for digging into the performance of this particular code. After looking at the Builder internals I agree that 1 wouldn't work out.

What do you think about doing both 2 and 3?

A value of type Ptr Word8 -> IO () is nice for composition. And 3 has a straightforward signature and resolves the tradeoff between an extra allocation plus toStrict conversion on one side, and importing two internal modules on the other.
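To illustrate the composition point, here is a rough sketch that pairs each write action with its byte count so that two writes can share one buffer. The Write type, fromFixed and runWrite are hypothetical names for this example only; size, runF and unsafeCreate are the existing exports from the internal modules:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Internal as BSI
import qualified Data.ByteString.Builder.Prim as BBP
import qualified Data.ByteString.Builder.Prim.Internal as BBPI
import Data.Word (Word8)
import Foreign.Ptr (Ptr, plusPtr)

-- Hypothetical type: a write action together with the number of bytes it fills.
data Write = Write Int (Ptr Word8 -> IO ())

-- Turn a fixed-size primitive and a value into a write action.
fromFixed :: BBP.FixedPrim a -> a -> Write
fromFixed prim x = Write (BBPI.size prim) (BBPI.runF prim x)

-- Sequencing two writes: the second one starts where the first one ends.
instance Semigroup Write where
  Write n f <> Write m g = Write (n + m) (\p -> f p >> g (p `plusPtr` n))

-- Allocate one buffer of the combined size and run all writes into it.
runWrite :: Write -> BS.ByteString
runWrite (Write n f) = BSI.unsafeCreate n f

-- Example value: a big-endian Word16 followed by a single byte.
example :: BS.ByteString
example = runWrite (fromFixed BBP.word16BE 0x0102 <> fromFixed BBP.word8 0x03)
```

For fixed-size primitives the existing >*< combinator (pairF) in Data.ByteString.Builder.Prim already does this kind of sequencing, so the sketch only shows what exposing the raw Ptr Word8 -> IO () action makes possible outside the FixedPrim type.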

clyring commented 3 months ago

My main complaint with solution 2 is that I don't think it solves a real problem. If you have need of it, feel free to import runF from Data.ByteString.Builder.Prim.Internal, which I think is a good home for this function. The bytestring maintainers take stability seriously for every exposed module, regardless of whether "Internal" happens to appear in its name.

lykahb commented 3 months ago

This makes sense. I'm going to use the internal modules. It would be nice to remove the disclaimer, as you suggested. It gives a different impression about stability than you described.

Would you be open to a PR that implements 3?

clyring commented 3 months ago

Fire away!