Bodigrim opened 7 months ago
Our primitives rely on the `TextEncoding` modules in `base` and use `peekCStringLen` etc.
See:
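```haskell
{-# LANGUAGE TypeApplications #-}
import Control.DeepSeq (force)
import Control.Exception (SomeException, displayException, evaluate, try)
import Data.Bifunctor (first)
import qualified Data.ByteString.Short as BS8 -- stand-in for the package's own ShortByteString module
import qualified GHC.Foreign as GHC
import GHC.IO.Encoding (TextEncoding)
import System.IO.Unsafe (unsafePerformIO)
-- EncodingException(EncodingError) is the package's own error type (import elided)
-- | Decode with the given 'TextEncoding'.
decodeWithTE :: TextEncoding -> BS8.ShortByteString -> Either EncodingException String
decodeWithTE enc ba = unsafePerformIO $ do
  r <- try @SomeException $ BS8.useAsCStringLen ba $ \fp -> GHC.peekCStringLen enc fp
  evaluate $ force $ first (flip EncodingError Nothing . displayException) r
-- | Encode with the given 'TextEncoding'.
encodeWithTE :: TextEncoding -> String -> Either EncodingException BS8.ShortByteString
encodeWithTE enc str = unsafePerformIO $ do
  r <- try @SomeException $ GHC.withCStringLen enc str $ \cstr -> BS8.packCStringLen cstr
  evaluate $ force $ first (flip EncodingError Nothing . displayException) r
```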
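AFAIR, the encoder/decoder API doesn't work well with anything other than `String`: https://hackage.haskell.org/package/base-4.19.0.0/docs/GHC-IO-Encoding.html
For example, decoders are fixed to `Char` buffers: `type TextDecoder state = BufferCodec Word8 CharBufElem state` (and encoders are the mirror image, `type TextEncoder state = BufferCodec CharBufElem Word8 state`).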
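To illustrate the point, here is a minimal sketch (the name `utf8ToUtf16leViaString` is hypothetical) of transcoding UTF-8 bytes to UTF-16LE bytes using only `base`'s `TextEncoding` codecs through `GHC.Foreign`. Because the codecs are typed over `Char` buffers, the whole input has to be decoded into an intermediate `String` before it can be re-encoded. Like the primitives above it hides `IO` behind `unsafePerformIO`; unlike them, it lets decoding errors escape as exceptions instead of returning `Left`.

```haskell
import qualified Data.ByteString.Short as SBS
import qualified GHC.Foreign as GHC
import System.IO (utf16le, utf8)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical helper: UTF-8 bytes -> UTF-16LE bytes using base's TextEncoding
-- codecs. The codecs work on Char buffers, so the input is first decoded into
-- a complete intermediate String and only then re-encoded. Invalid UTF-8
-- raises an exception rather than returning an error value.
utf8ToUtf16leViaString :: SBS.ShortByteString -> SBS.ShortByteString
utf8ToUtf16leViaString bs = unsafePerformIO $ do
  str <- SBS.useAsCStringLen bs (GHC.peekCStringLen utf8) -- bytes -> String
  GHC.withCStringLen utf16le str SBS.packCStringLen -- String -> bytes
```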
How do you propose we keep getting the `TextEncoding` API for free while also avoiding intermediate representations? Can we not just rely on list fusion or so?
> Can we not just rely on list fusion or so?
Given that `PFP.decodeUtf` is monadic, all its effects must happen before we can proceed to the next line. This almost certainly prevents any list fusion.
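A rough way to see why effects get in the way, assuming `Either` as a stand-in for whatever monad `decodeUtf` runs in: a pure `map` hands out its result one element at a time (the property foldr/build fusion relies on), while an effectful `traverse` cannot return anything until it has processed the entire input. The sketch below demonstrates this with an input whose tail is `undefined`; all names in it are hypothetical.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import Data.Char (toUpper)

-- Pure and element-wise: a consumer can demand output one element at a time.
pureFirstTwo :: String -> String
pureFirstTwo = take 2 . map toUpper

-- The same transformation through an effectful traversal. Either stands in
-- for any monad: traverse cannot commit to Right until it has seen every
-- element, so no prefix of the result is available early.
monadicFirstTwo :: String -> Either String String
monadicFirstTwo = fmap (take 2) . traverse step
  where
    step '\0' = Left "unexpected NUL" -- illustrative failure case
    step c = Right (toUpper c)

-- True if forcing the result to WHNF ran into an undefined tail of the input.
forcesWholeInput :: Either String String -> IO Bool
forcesWholeInput r = do
  res <- try (evaluate r)
  pure (either (\e -> const True (e :: ErrorCall)) (const False) res)
```

Here `pureFirstTwo ('a' : 'b' : undefined)` can still produce its two characters, whereas `forcesWholeInput (monadicFirstTwo ('a' : 'b' : undefined))` reports that the undefined tail was reached.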
I suggest writing `toWindowsPath` and `toPosixPath` manually, without relying on `GHC.IO.Encoding`. Converting UTF-8 to UTF-16 and back is reasonably simple.
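As a rough sketch of the UTF-8 to UTF-16 direction (hypothetical name `utf8ToUtf16`, working on plain `[Word8]`/`[Word16]` lists for readability rather than on the package's actual string types), each UTF-8 sequence is decoded to a code point and immediately re-emitted as one or two UTF-16 code units, in a single pass and without building a `String`. It rejects malformed input rather than replacing it; a real `toWindowsPath` would need to pick its own error policy.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word16, Word8)

-- Hypothetical helper: decode UTF-8 bytes directly into UTF-16 code units in
-- one pass over the input, without building an intermediate String.
-- Malformed input (bad lead or continuation bytes, truncated sequences,
-- overlong forms, surrogates, code points above U+10FFFF) yields Left.
utf8ToUtf16 :: [Word8] -> Either String [Word16]
utf8ToUtf16 = go
  where
    go [] = Right []
    go (b0 : rest)
      | b0 < 0x80 = (fromIntegral b0 :) <$> go rest
      | b0 .&. 0xE0 == 0xC0 = multi 1 0x80 (b0 .&. 0x1F) rest
      | b0 .&. 0xF0 == 0xE0 = multi 2 0x800 (b0 .&. 0x0F) rest
      | b0 .&. 0xF8 == 0xF0 = multi 3 0x10000 (b0 .&. 0x07) rest
      | otherwise = Left "invalid leading byte"

    -- Consume n continuation bytes, assemble and validate the code point,
    -- then emit it as one or two UTF-16 code units.
    multi :: Int -> Int -> Word8 -> [Word8] -> Either String [Word16]
    multi n minCp lead bs
      | length conts /= n = Left "truncated sequence"
      | any (\c -> c .&. 0xC0 /= 0x80) conts = Left "invalid continuation byte"
      | cp < minCp = Left "overlong encoding"
      | cp >= 0xD800 && cp <= 0xDFFF = Left "surrogate code point"
      | cp > 0x10FFFF = Left "code point above U+10FFFF"
      | otherwise = (encodeCp cp ++) <$> go rest
      where
        (conts, rest) = splitAt n bs
        cp = foldl (\acc c -> (acc `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) (fromIntegral lead) conts

    -- Code points above U+FFFF become a high/low surrogate pair.
    encodeCp :: Int -> [Word16]
    encodeCp cp
      | cp < 0x10000 = [fromIntegral cp]
      | otherwise =
          let v = cp - 0x10000
           in [fromIntegral (0xD800 + (v `shiftR` 10)), fromIntegral (0xDC00 + (v .&. 0x3FF))]
```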
That sounds hard. Can we cry for help?
https://github.com/haskell/tar/pull/88 introduces
IMHO such utilities would be better provided by `filepath` itself, ideally optimized to a single pass without any intermediate structures.
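Along the lines of the single-pass suggestion, here is a sketch of the UTF-16LE to UTF-8 direction on `ShortByteString` (hypothetical name `utf16leToUtf8`): it reads code units by index and appends UTF-8 bytes to a `Data.ByteString.Builder`, so no `String` or list of code units is built. It is not copy-free (Builder, then lazy, strict and finally short bytestring); a tuned version would write into a pre-sized byte array instead. It also rejects unpaired surrogates, which Windows file names may legally contain, so a real conversion would need its own policy for them.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Short as SBS
import Data.Word (Word16, Word8)

-- Hypothetical helper: transcode UTF-16LE bytes to UTF-8 bytes in one pass,
-- reading code units by index and appending to a Builder, so no String or
-- list of code units is built. Unpaired surrogates and odd lengths yield Left.
utf16leToUtf8 :: SBS.ShortByteString -> Either String SBS.ShortByteString
utf16leToUtf8 input
  | odd len = Left "odd number of bytes"
  | otherwise = go 0 mempty
  where
    len = SBS.length input

    -- The little-endian code unit starting at byte offset i.
    unitAt :: Int -> Word16
    unitAt i = fromIntegral (SBS.index input i) .|. (fromIntegral (SBS.index input (i + 1)) `shiftL` 8)

    go :: Int -> B.Builder -> Either String SBS.ShortByteString
    go i acc
      | i >= len = Right (SBS.toShort (BL.toStrict (B.toLazyByteString acc)))
      | u >= 0xD800 && u <= 0xDBFF =
          if i + 3 < len && lo >= 0xDC00 && lo <= 0xDFFF
            then go (i + 4) (acc <> utf8Of (pairToCp u lo))
            else Left "unpaired high surrogate"
      | u >= 0xDC00 && u <= 0xDFFF = Left "unpaired low surrogate"
      | otherwise = go (i + 2) (acc <> utf8Of (fromIntegral u))
      where
        u = unitAt i
        lo = unitAt (i + 2) -- only forced after the bounds check above

    -- Combine a high/low surrogate pair into a single code point.
    pairToCp :: Word16 -> Word16 -> Int
    pairToCp hi lo = 0x10000 + ((fromIntegral hi - 0xD800) `shiftL` 10) + (fromIntegral lo - 0xDC00)

    -- UTF-8 encoding of one code point (at most U+10FFFF).
    utf8Of :: Int -> B.Builder
    utf8Of cp
      | cp < 0x80 = B.word8 (fromIntegral cp)
      | cp < 0x800 = B.word8 (0xC0 .|. top 6) <> cont 0
      | cp < 0x10000 = B.word8 (0xE0 .|. top 12) <> cont 6 <> cont 0
      | otherwise = B.word8 (0xF0 .|. top 18) <> cont 12 <> cont 6 <> cont 0
      where
        top :: Int -> Word8
        top s = fromIntegral (cp `shiftR` s)
        cont s = B.word8 (0x80 .|. (fromIntegral (cp `shiftR` s) .&. 0x3F))
```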