Fuuzetsu opened this issue 10 years ago
I'm in favor of adding and documenting such an instance. Adding this newtype would be a fine alternative as well.
Also, credit where credit is due: I believe it was snoyberg who wrote ByteString64, for our uses at FP Complete.
I don't think we should reintroduce the ByteString instances with base64 implementations. It would really confuse people to first have a text-based ByteString instance, then no instance, then a base64-based instance.
I agree that it would be nice to have a Base64 newtype. However, I don't think aeson is the right library to provide this newtype because I can imagine you also want to use this type for other instances like postgresql-simple's To/FromField.
It's better to export this newtype from its own package together with instances for aeson and other libraries.
Hm, fair enough, but I imagine the aeson maintainer should make the package, as they'll be able to keep it updated with any aeson changes accordingly.
I would take a patch to base64-bytestring that added a Base64 newtype, but obviously that (along with a little supporting machinery) is all the patch could introduce; it could not add a dependency on aeson.
How about putting the newtype in base64-bytestring and adding a dependency on base64-bytestring in aeson, which can then provide the instances?
I think it's either that or creating a package solely for the instances (while still adding the newtype to base64-bytestring). What do you think?
I now think introducing a type whose only purpose is to change the encoding in some medium is the wrong way to go. Ideally, types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium. If we represented binary data using a Base64 newtype, we would need to wrap and unwrap it at every place we construct and deconstruct binary data. This is annoying and obscures the fact that we are just working with binary data. I think it's better if the encoding of a Haskell data type in a particular medium is described in that medium's encoding function, which in this case is toJSON/parseJSON. This also has the benefit that we can use different encodings for different mediums (for example, you can store binary data directly in a PostgreSQL database without first encoding it to base64).
We could introduce some simple helper functions (maybe in base64-bytestring) which would simplify the encoding/decoding in text-based mediums like:
import qualified Data.ByteString.Base64 as Base64
import qualified Data.Text.Encoding as TE

encodeToText :: ByteString -> Text
encodeToText = TE.decodeUtf8 . Base64.encode

decodeFromText :: (Monad m) => Text -> m ByteString
decodeFromText = either fail return . Base64.decode . TE.encodeUtf8
These can be used as in:
data SomeRecord = SomeRecord
{ field1 :: Int
, field2 :: ByteString
, field3 :: Bool
}
instance ToJSON SomeRecord where
toJSON r = object [ "field1" .= field1 r
, "field2" .= Base64.encodeToText (field2 r)
, "field3" .= field3 r
]
instance FromJSON SomeRecord where
parseJSON = withObject "SomeRecord" $ \obj ->
SomeRecord <$> obj .: "field1"
<*> ((obj .: "field2") >>= Base64.decodeFromText)
<*> obj .: "field3"
One disadvantage of this approach is that we can't currently use these functions when automatically deriving ToJSON/FromJSON instances using GHC Generics or Template Haskell. I think the best way to solve that is to introduce codec overrides, something like:
data Options = Options
{ ...
, fieldCodecs :: [(String, Codec)]
...
}
data Codec = forall a. (Typeable a) => Codec
{ codecEncode :: a -> Value
, codecDecode :: Value -> Parser a
}
base64Codec :: Codec
base64Codec = Codec
{ codecEncode = toJSON . Base64.encodeToText
, codecDecode = withText "base64" Base64.decodeFromText
}
deriveJSON
defaultOptions{fieldCodecs = [("field2", base64Codec)]}
''SomeRecord
This could be implemented rather efficiently for the Template Haskell encoder, but probably not so efficiently for the GHC Generics encoders, since they need to do a lookup in the fieldCodecs association list for each field dynamically.
The implementation of this does require the use of cast but hopefully that's not too inefficient.
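To make the cast remark concrete, here is a minimal sketch (not the actual implementation) of how an existential Codec could be applied to a concrete field with a runtime type check. `Value` is replaced by a String stand-in and `encodeField` is a hypothetical helper, so the example only needs base:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

-- A stand-in for aeson's Value, so this sketch only needs base.
type Value = String

-- An existential codec, as in the Options sketch above.
data Codec = forall a. Typeable a => Codec
  { codecEncode :: a -> Value
  , codecDecode :: Value -> Maybe a
  }

-- Applying a codec to a field of some concrete type b needs a runtime
-- type check: cast succeeds only when b matches the codec's hidden a.
encodeField :: Typeable b => Codec -> b -> Maybe Value
encodeField (Codec enc _) x = enc <$> cast x

main :: IO ()
main = do
  let intCodec = Codec { codecEncode = show :: Int -> Value
                       , codecDecode = \s -> case reads s of
                           [(n, "")] -> Just (n :: Int)
                           _         -> Nothing }
  print (encodeField intCodec (42 :: Int))  -- matches the hidden type
  print (encodeField intCodec "oops")       -- type mismatch, Nothing
```

The cast itself is a cheap tag comparison; the real cost concern is the per-field association-list lookup mentioned above.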
FWIW, I have a ByteString64 data type in our codebase with exactly this purpose. I'd be +1 on having that functionality exported from either base64-bytestring or aeson.
FWIW we also have ByteString64.
OTOH @basvandijk's idea could be interesting, but I'd rather do it only for template-haskell derivations first.
+1 on having this instance exported from aeson, a dependency on base64-bytestring is acceptable.
I don't have an opinion on this. It looks like there are 4 votes for adding this newtype and one against. Another option is to add this instance to a blessed orphans package (#432). OTOH the dependency is very light. Sounds like the first step should be to add this newtype to base64-bytestring no matter how we choose to proceed.
I ran into this issue yesterday. Comments and observations.
@basvandijk writes "Ideally types tell us how values should be used on the Haskell side, not how they are encoded in some particular medium." I would like to see a larger justification for this statement, because it runs against much that I hold to be true.
In any case, I do not think it applies in this instance. I am wrapping a newtype around my ByteString precisely because I want the encoding/decoding of my data to be fully encapsulated in the aeson process. Otherwise, I have to add an extra step on either the client or the server side everywhere I use that type. This would be particularly onerous because the uses are asymmetrical. On the server side, compiling with GHC, this bytestring gets read from the file system. On the client side, compiling with GHCJS, it gets converted to a blob. I have used additional encoding/decoding functions as you suggest; in fact, I have those exact ones, and they are error prone, because they do not encode the semantics of the encoding. They are tricky to maintain and difficult for others to understand. This exists in our production system and it causes me dread every time I get near it.
I will give the newtype solution a try today to see if I run into anything ugly.
AFAIK the "canonical" solution to this for the last few years is the ByteString64 newtype, though I would still like it to live inside aeson as a ByteString instance, as it's often a hassle when you realize late into an existing codebase that you'd like JSON instances and you end up rewrapping throughout the whole codebase: nothing major, but inconvenient.
@Fuuzetsu IMHO preferring a single encoding is the wrong choice. Someone might just as well have data in base16 (hex) too. It's a bit like Sum & Product for Monoid.
The wrapping business can be solved independently of aeson. (I can imagine using generics-sop Codes + Coercible to do a one-liner conversion.)
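To illustrate the Coercible half of that idea, here is a minimal sketch (the generics-sop part is omitted). `ByteString64` is re-declared locally with String standing in for ByteString, and `rewrap` is a hypothetical helper, so the example only needs base:

```haskell
import Data.Coerce (coerce)

-- Local stand-in for the ByteString64 newtype; String replaces
-- ByteString so this sketch only needs base.
newtype ByteString64 = ByteString64 String
  deriving Show

-- Because ByteString64 is representationally equal to its payload,
-- rewrapping a whole structure of raw fields is a zero-cost one-liner:
rewrap :: [String] -> [ByteString64]
rewrap = coerce

main :: IO ()
main = print (rewrap ["de", "ad", "be", "ef"])
```

`coerce` erases to the identity function at runtime, which is what makes the "rewrapping throughout the whole codebase" complaint mostly a syntactic cost rather than a performance one.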
If you want a different encoding, then you can use a newtype; this issue is about having a default.
While thinking about this again I would like to slightly adapt my previous opinion on the matter:
Usually when you have a string of binary data somewhere in your record, it represents a particular thing, i.e. an image, blob, hash, ASCII text, etc. In those cases, wouldn't it be better to wrap that string of bytes with a specific newtype telling your users what the bytes mean?
So I guess I'm advocating using a dedicated newtype for each kind of bytestring instead of using a ByteString or ByteString64 directly. For example, instead of having:
data MyRecord = MyRecord
{ ...
, blob :: ByteString
, hash :: ByteString
, ...
}
You should have:
data MyRecord = MyRecord
{ ...
, blob :: Blob
, hash :: Hash
, ...
}
where
newtype Blob = Blob ByteString
instance ToJSON Blob where ...
instance FromJSON Blob where ...
newtype Hash = Hash ByteString
instance ToJSON Hash where ...
instance FromJSON Hash where ...
In the latter case it wouldn't hurt to use:
newtype Blob = Blob ByteString64 deriving (ToJSON, FromJSON)
newtype Hash = Hash ByteString64 deriving (ToJSON, FromJSON)
to get the JSON instances for free.
What do you folks think about this? Would this be too much hassle in practice?
It's no different than blob :: ByteString64, hash :: ByteString64, except now you need an instance for every one of your wrappers (Blob, Hash, …) that you have to write, rather than having one instance. So it seems strictly worse from the "amount of code" perspective. Of course, knowing we can't swap Blob and Hash around by accident has its own merits, but I'm unsure if we can use that as an argument here…
Yes, you do get an extra newtype wrapper W (newtype W = W ByteString64 deriving (ToJSON, FromJSON)) for every different type of bytestring, which indeed means more code.
BTW I'm not opposing providing a ByteString64 newtype. Exporting it from base64-bytestring would be nice.
For the record, @phadej has uploaded his base64-bytestring-type package to Hackage.
@bgamari well spotted!
...or good stalking ;-)
@basvandijk Maybe I'm missing something, but the type says:
encode :: ToJSON a => a -> ByteString -- a lazy ByteString
and then:
No instance for (ToJSON ByteString)
How should I deal with this without workarounds: via generics, or by implementing my own instance?
Regards
@djleonskennedy, can you rephrase the question? I'm not sure I follow.
ByteString instances were removed because they allowed people to try and insert arbitrary binary data that JSON can't handle.
Could we not re-add ByteString instances that leverage something like base64 encoding (which JSON can handle, I believe) to work around this limitation?
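For intuition on why base64 sidesteps the binary-data problem: every 3 bytes are repacked into 4 characters from a 64-character ASCII alphabet, so arbitrary bytes always become a plain string that a JSON document can carry. A minimal sketch of the standard (RFC 4648) encoder, using only base, with hypothetical helper names `encodeB64` and `chunk`:

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- The standard base64 alphabet (RFC 4648).
alphabet :: String
alphabet = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Encode raw bytes as base64: every 3 input bytes become 4 ASCII
-- characters; a short final group is padded with '='.
encodeB64 :: [Word8] -> String
encodeB64 []        = ""
encodeB64 [a]       = take 2 (chunk a 0 0) ++ "=="
encodeB64 [a, b]    = take 3 (chunk a b 0) ++ "="
encodeB64 (a:b:c:r) = chunk a b c ++ encodeB64 r

-- Pack three bytes into 24 bits and slice them into four 6-bit indices.
chunk :: Word8 -> Word8 -> Word8 -> String
chunk a b c =
  let n =     (fromIntegral a `shiftL` 16)
          .|. (fromIntegral b `shiftL` 8)
          .|.  fromIntegral c :: Int
  in [ alphabet !! ((n `shiftR` s) .&. 63) | s <- [18, 12, 6, 0] ]

main :: IO ()
main = putStrLn (encodeB64 [77, 97, 110])  -- the bytes of "Man"
```

In practice the thread's base64-bytestring package does this efficiently over ByteString; the sketch is only meant to show why the resulting text is JSON-safe.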