Create strict bytestrings from FixedPrim with zero copy

lykahb commented 3 months ago

To turn a FixedPrim into a strict ByteString, with the current public interface I can do something like this:

myTypePrim :: FixedPrim MyType

myTypeToLazyByteString :: MyType -> BL.ByteString
myTypeToLazyByteString =
    -- Some kind of allocation strategy to create a chunk of desired size
    BB.toLazyByteStringWith (BB.untrimmedStrategy 36 BB.defaultChunkSize) mempty
  . BBP.primFixed myTypePrim

-- Makes a copy with memcmp
myTypeToStrictByteString :: MyType -> BS.ByteString
myTypeToStrictByteString = toStrict . myTypeToLazyByteString

Once primFixed :: FixedPrim a -> a -> Builder converts a value to Builder, there are only utilities to convert it into a lazy bytestring.

One alternative is to use runF :: FixedPrim a -> a -> Ptr Word8 -> IO () together with create :: create :: Int -> (Ptr Word8 -> IO ()) -> IO ByteString but that relies on two internal modules Data.ByteString.Builder.Prim.Internal and Data.ByteString.Internal.

I propose three solutions:

Create function Builder -> Ptr Word8 -> IO ()
Export runF :: FixedPrim a -> a -> Ptr Word8 -> IO () from the public module Data.ByteString.Builder.Prim.
Create function primToByteString :: FixedPrim a -> a -> ByteString that creates a strict bytestring right away.

For both 1 and 2 the code would rely on the create from the semi-public module Data.ByteString.Internal.

For the context, the benchmarking in https://github.com/haskell-hvr/uuid/pull/80 shows that the overhead of copy from toStrict slows down the conversion by 40% (28ns vs 20ns).

lykahb commented 3 months ago

If you decide on a solution, I can implement and and submit it.

clyring commented 3 months ago

toStrict does not incur any copying unless the LazyByteString has more than one chunk, which should never happen in your use case. The overhead you observe is probably the cost of allocating the intermediate LazyByteString's spine, i.e. the Chunk object itself.
Solution 1 doesn't make much sense to me. I'd expect in most cases where an upper bound on the size of a Builder's output is available, that upper bound is small enough that it makes sense to provide a BoundedPrim () for the same.
Solution 2: From my perspective as a maintainer, Data.ByteString.Builder.Prim.Internal does something very sensible and is very stable. The little "may not be exposed at all in future releases" warning at the top should probably just be deleted, and the module treated as first-class public API like Data.ByteString.Internal.
- (The warning in Data.ByteString.Builder.Internal is a bit less misplaced; there are several internals-breaking changes to the design of non-Prim Builders that could be explored in the future... with a major version bump, of course.)
Solution 3's idea is fine by me, though I'd expect both fixedPrimToStrictByteString and boundedPrimToStrictByteString.

lykahb commented 3 months ago

Thanks for digging into the the performance of the the particular code. After looking at the Builder internals I agree that 1 wouldn't work out.

What do you think about doing both 2 and 3?

A value of Ptr Word8 -> IO () is nice for composition. And 3 has a straightforward signature and solves the tradeoff between extra allocation and toStrict conversion on one side, and importing two internal modules on the other side.

clyring commented 3 months ago

My main complaint with solution 2 is that I don't think it solves a real problem. If you have need of it, feel free to import runF from Data.ByteString.Builder.Prim.Internal, which I think is a good home for this function. The bytestring maintainers take stability seriously for every exposed module, regardless of whether "Internal" happens to appear in its name.

lykahb commented 3 months ago

This makes sense. I'm going to use the internal modules. It would be nice to remove the disclaimer, as you suggested. It gives a different impression about stability than you described.

Would you be open to a PR that implements 3?

clyring commented 3 months ago

Fire away!

haskell / bytestring

Create strict bytestrings from FixedPrim with zero copy #666