lens / lens-aeson

Traversals and Prisms for Data.Aeson
MIT License
51 stars 18 forks source link

IsKey ByteString instance is partial #48

Closed phadej closed 2 years ago

phadej commented 2 years ago

In

strictTextUtf8 :: Iso' Text.Text Strict.ByteString
strictTextUtf8 = iso StrictText.encodeUtf8 StrictText.decodeUtf8

decodeUtf8 is partial, it errors if the encoding is not correct.

*Data.Aeson.Lens> :set -XOverloadedStrings 
*Data.Aeson.Lens> :t view _Key ("\255" :: ByteString)
view _Key ("\255" :: ByteString) :: Key
*Data.Aeson.Lens> view _Key ("\255" :: ByteString)
"*** Exception: Cannot decode byte '\xff': Data.Text.Internal.Encoding.streamDecodeUtf8With: Invalid UTF-8 stream

I'm not sure what would be the best way to proceed:

  1. Removing instances is a breaking change
  2. making them prisms is difficult
  3. Using decodeUtf8Lenient would at least remove error (though not make it a true Iso)

I'm leaning to go 1. for optics-aeson (2. is also possible), but lens-aeson could do 3.?


EDIT: strictTextUtf8 is used in _Value too, but there it's used on aeson produced bytestrings, which should be valid UTF8, so it's "safe".

RyanGlScott commented 2 years ago

My inclination would be to adopt option (2), since lens defines a similar utf8 function as a Prism. Fortunately, we don't export strictTextUtf8, so we can change its implementation as we see fit.

RyanGlScott commented 2 years ago

Sorry, I didn't read this closely enough when I wrote my original comment. Option (2) isn't feasible for _Key, since its type requires a full-blown Iso, not just a Prism.

I mainly added the ByteString instances as a convenience for interoperability with aeson, so I suppose I did have UTF-8 in mind. In that case, I propose that we add Haddocks to the instances stating that they assume the invariant that the ByteStrings are UTF-8–encoded. We could also switch from using decodeUtf8 to decodeUtf8Lenient, I suppose, but either way, you'd get unexpected results if you use something that isn't UTF-8–encoded.

phadej commented 2 years ago

I'd favor changing to decodeUtf8Lenient for _Key (for _Value it doesn't really matter, as that shouldn't ever fail). Documenting that it's not full Iso is :+1:.

RyanGlScott commented 2 years ago

See #50.