soenkehahn / string-conversions

Simplifies dealing with different types for strings
BSD 3-Clause "New" or "Revised" License

Provide strict conversion from ByteString to Text #7

Closed mitchellwrosen closed 8 years ago

mitchellwrosen commented 8 years ago

Currently all ByteString -> Text conversions use Data.Text.Encoding.Error.lenientDecode. However, some users might prefer the UnicodeException be thrown instead.

So I propose an alternative module, Data.String.Conversions.Strict, as a drop-in replacement for Data.String.Conversions that uses Data.Text.Encoding.Error.strictDecode in the ByteString -> Text instances.
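To make the difference concrete, here is a minimal sketch that uses only the `text` and `bytestring` packages. The names `invalidBytes`, `lenient`, `strict`, and `expectedLenient` are made up for illustration and are not part of string-conversions.

```haskell
import qualified Data.ByteString as BS
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', decodeUtf8With)
import Data.Text.Encoding.Error (UnicodeException, lenientDecode)

-- "hi" followed by 0xFF, a byte that never occurs in well-formed UTF-8.
-- (Hypothetical sample input.)
invalidBytes :: BS.ByteString
invalidBytes = BS.pack [0x68, 0x69, 0xFF]

-- Lenient decoding: invalid input is replaced with U+FFFD and no error is reported.
lenient :: BS.ByteString -> Text
lenient = decodeUtf8With lenientDecode

-- Strict decoding without exceptions: invalid input yields a Left.
strict :: BS.ByteString -> Either UnicodeException Text
strict = decodeUtf8'

-- What the lenient decoder produces for invalidBytes.
expectedLenient :: Text
expectedLenient = T.pack "hi\xFFFD"
```

With the lenient decoder the bad byte silently becomes a replacement character, whereas `strict invalidBytes` is a `Left` that the caller has to handle.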

soenkehahn commented 8 years ago

I'm not against adding something that allows strict decoding. Some thoughts:

validate :: ByteString -> Either UnicodeException ByteString

unsafeValidate :: ByteString -> ByteString

This would also compose with Data.String.Conversions.Monomorphic.

Maybe you could elaborate on your exact use-case(s). I have never had the desire to decode anything strictly. That could help to narrow down the design space. Would validate / unsafeValidate even work for you?
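The two signatures above could be filled in roughly as follows. This is only a sketch of one possible reading, assuming "validate" means "check that the bytes are well-formed UTF-8 and hand them back unchanged"; the thread itself does not specify the semantics.

```haskell
import Control.Exception (throw)
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import Data.Text.Encoding (decodeUtf8')
import Data.Text.Encoding.Error (UnicodeException)

-- Assumed semantics: succeed with the untouched input when it is valid UTF-8,
-- otherwise report the decoding error.
validate :: ByteString -> Either UnicodeException ByteString
validate bs = bs <$ decodeUtf8' bs

-- Same check, but the error is thrown as an exception when the result is forced.
unsafeValidate :: ByteString -> ByteString
unsafeValidate = either throw id . validate

-- Hypothetical sample inputs.
goodBytes, badBytes :: ByteString
goodBytes = BS.pack [0x68, 0x69]
badBytes  = BS.pack [0x68, 0x69, 0xFF]
```

Under this reading, a caller would validate first and then convert as usual, for example `cs <$> validate bytes`, so the existing lenient instances never see invalid input.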

soenkehahn commented 8 years ago

@mitchellwrosen: Btw, I just realized in shock that this project doesn't have a test-suite. I added one in #8.

mitchellwrosen commented 8 years ago

My use case is simply wanting to hear about invalid UTF-8 loudly; whether that happens via an exception or an Either Bad Good is a matter of taste.

But, in talking with a coworker, I think in this case we're simply going to write our own string conversion typeclass and leave out all ByteString -> Text instances. The programmer will just have to reach for decodeUtf8 manually in these cases.

mitchellwrosen commented 8 years ago

I think I misspoke. Another coworker just informed me that ByteString -> String conversions are also unprincipled, as they assume ASCII encoding and strip wider chars to 8 bits.
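For illustration, this is how the `Data.ByteString.Char8` functions behave. Whether the library's instances were built on exactly these functions is an assumption here, not something the thread confirms; the names `truncated` and `misdecoded` are made up for the example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B8
import Data.Word (Word8)

-- U+03BB (Greek small letter lambda) does not fit in one byte.
-- Char8.pack keeps only the low 8 bits, leaving the single byte 0xBB.
truncated :: [Word8]
truncated = BS.unpack (B8.pack "\x3BB")

-- 0xC3 0xA9 is the UTF-8 encoding of U+00E9 (e with acute accent).
-- Char8.unpack maps each byte to its own Char, giving two characters instead of one.
misdecoded :: String
misdecoded = B8.unpack (BS.pack [0xC3, 0xA9])
```

Here `truncated` is `[0xBB]` and `misdecoded` is the two-character string `"\xC3\xA9"`, so neither direction round-trips non-ASCII text.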

soenkehahn commented 8 years ago

Hmm, ideally ByteString -> String would behave similarly to ByteString -> Text when it comes to decoding failures. Thanks for the pointer.
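One way to get that behaviour is to route the conversion through Text. The helper name `bytesToString` is hypothetical and this is only a sketch, not the library's actual instance.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- Decode as UTF-8 with the same lenient handler used for ByteString -> Text,
-- then unpack the Text into a String.
bytesToString :: ByteString -> String
bytesToString = T.unpack . decodeUtf8With lenientDecode

-- Hypothetical sample inputs: valid UTF-8 for U+00E9, and "a" followed by an invalid byte.
validSample, invalidSample :: ByteString
validSample   = BS.pack [0xC3, 0xA9]
invalidSample = BS.pack [0x61, 0xFF]
```

With this definition, multi-byte sequences decode to a single character and invalid bytes become U+FFFD, matching the ByteString -> Text behaviour described at the top of the thread.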

I take it that you're not interested in having this feature anymore, so I'll close. Feel free to reopen if you still want this.