Open phadej opened 3 years ago
Something like
diff --git a/src/Data/Text/Encoding.hs b/src/Data/Text/Encoding.hs index 239e15e..9b6a01c 100644 --- a/src/Data/Text/Encoding.hs +++ b/src/Data/Text/Encoding.hs @@ -436,7 +436,10 @@ encodeUtf8 (Text arr off len) fp <- mallocByteString (len*3) -- see https://github.com/haskell/text/issues/194 for why len*3 is enough withForeignPtr fp $ \ptr -> with ptr $ \destPtr -> do - c_encode_utf8 destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len) + -- When ByteArray# is large enough, it is pinned. + -- Then we can use safe FFI calls, which we indeed want for big inputs! + (if len > 4096 then c_encode_utf8_safe else c_encode_utf8_unsafe) + destPtr (A.aBA arr) (fromIntegral off) (fromIntegral len) newDest <- peek destPtr let utf8len = newDest `minusPtr` ptr if utf8len >= len `shiftR` 1 @@ -535,5 +538,8 @@ foreign import ccall unsafe "_hs_text_decode_utf8_state" c_decode_utf8_with_stat foreign import ccall unsafe "_hs_text_decode_latin1" c_decode_latin1 :: MutableByteArray# s -> Ptr Word8 -> Ptr Word8 -> IO () -foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8 +foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8_unsafe + :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO () + +foreign import ccall safe "_hs_text_encode_utf8" c_encode_utf8_safe :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()
The benchmark suite doesn't have exercises for multithreaded programs.
As far as I understand the code now, encoding or decoding big Text will prevent GC from running. Is that an issue, hard to tell.
Text
E.g. cryptohash-sha256 branches on size: https://hackage.haskell.org/package/cryptohash-sha256-0.11.102.0/docs/src/Crypto.Hash.SHA256.html#c_sha256_update, (my diff above is probably written better in that way too)
cryptohash-sha256
Cf. https://github.com/haskell/bytestring/issues/282
Something like
The benchmark suite doesn't have exercises for multithreaded programs.
As far as I understand the code now, encoding or decoding big
Text
will prevent GC from running. Is that an issue, hard to tell.E.g.
cryptohash-sha256
branches on size: https://hackage.haskell.org/package/cryptohash-sha256-0.11.102.0/docs/src/Crypto.Hash.SHA256.html#c_sha256_update, (my diff above is probably written better in that way too)