well-typed / cborg

Binary serialisation in the CBOR format
https://hackage.haskell.org/package/cborg

export a schema for serialising to/from other languages #89

Open ghorn opened 8 years ago

ghorn commented 8 years ago

I tried serialising a Haskell product type with records and deserialising it in Python, and I got an array of objects instead of a dictionary. This makes sense: there is no reason to waste space encoding record keys for static data.

It would be useful to generate a bit of Python code describing the structure and record names of the Haskell data, which would let Python efficiently deserialise into a native dictionary. I think I could probably hack something together pretty easily by reverse-engineering the generic-deriving parts of this package, but any tips on doing it the right way would be appreciated.
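
A rough sketch of the idea, not anything this package provides: record selector names can be read off a type's GHC.Generics representation and printed as a Python list. The class `FieldNames` and the helpers `recordFields` and `pythonFieldTable` are hypothetical names made up for this example, and it only covers single-constructor records.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances #-}
{-# LANGUAGE KindSignatures, ScopedTypeVariables, TypeOperators #-}
import Data.Kind (Type)
import Data.List (intercalate)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Hypothetical class: collects record selector names from a generic
-- representation. There is no (:+:) instance, so sum types are rejected.
class FieldNames (f :: Type -> Type) where
  fieldNames :: Proxy f -> [String]

instance FieldNames f => FieldNames (D1 c f) where
  fieldNames _ = fieldNames (Proxy :: Proxy f)

instance FieldNames f => FieldNames (C1 c f) where
  fieldNames _ = fieldNames (Proxy :: Proxy f)

instance (FieldNames f, FieldNames g) => FieldNames (f :*: g) where
  fieldNames _ = fieldNames (Proxy :: Proxy f) ++ fieldNames (Proxy :: Proxy g)

instance Selector s => FieldNames (S1 s f) where
  -- selName only inspects the type, so undefined is never forced.
  fieldNames _ = [selName (undefined :: S1 s f ())]

instance FieldNames U1 where
  fieldNames _ = []

-- Field names of a record type, in declaration order.
recordFields :: forall a. (Generic a, FieldNames (Rep a)) => Proxy a -> [String]
recordFields _ = fieldNames (Proxy :: Proxy (Rep a))

-- Render the names as a Python list assignment (hypothetical naming scheme).
pythonFieldTable :: String -> [String] -> String
pythonFieldTable name fields =
  name ++ "_FIELDS = [" ++ intercalate ", " (map show fields) ++ "]"

-- Example record used only for this sketch.
data Sample = Sample { foo :: Int, bar :: Char } deriving (Generic)

main :: IO ()
main = putStrLn (pythonFieldTable "Sample" (recordFields (Proxy :: Proxy Sample)))
```

On the Python side the decoded array could then be zipped with `Sample_FIELDS` to build a dictionary. I have not checked whether the generic-derived encoding puts anything (such as a constructor index) in front of the fields; if it does, the consumer would have to skip it first.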

I think I can find some time to do this for work. I'm also interested in doing something similar for C/C++.

dcoutts commented 8 years ago

@ghorn I think we'd probably want a separate type class. And then have that one use that style.

ghorn commented 8 years ago

OK. I have code for generating C++ structs from Haskell data, and I generate C++ code which converts these structs to and from std::vector<uint8_t> in a way that is compatible with Serialise. I also generate C++ tests, and some utility functions that mimic Show and Eq.

The structs are templated to match Haskell type parameters, and I support type families.

I'm gonna burn this in for a while in my company codebase before I release it, unless there is immediate interest in it.

thoughtpolice commented 7 years ago

Greg, something like this would be really neat to see upstream or in public, at least. Do you think you have any chance of at least sketching out what you'd want to see from this package for help? Can you give an idea of what your current solution looks like?

I'm not sure if adding a bunch of code generators or whatever is worth it (unless they're light and easy to maintain). But I'd be very willing to add the scaffolding so that others can add their own backends...

bgamari commented 7 years ago

I agree! This would be great.

ghorn commented 7 years ago

Can you give an idea of what your current solution looks like?

I have something like

toCxxType :: ToCxxType a => Proxy a -> CxxType

and

generateCxxCode :: [CxxType] -> [(FilePath, String)]

ToCxxType is currently derived through TH because GHC.Generics only goes up to * -> * and I need higher-kinded types.

I take a list of CxxTypes and do a topological sort, and then turn them into templated C++ classes whose template parameters match the Haskell type parameters. The classes supported are pretty much the same ones binary-serialise-cbor supports, and I have extra magic for C++ enum classes, sum types with mapbox::variant, and Haskell type families using C++ template hackery.
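
To illustrate just the ordering step (this is not the actual code): the sketch below uses `Data.Graph` from the containers package to put type descriptions in dependency order, so each C++ class would be emitted after the classes it mentions. `TypeDesc`, its fields and `dependencyOrder` are hypothetical stand-ins for this example, not the real `CxxType`.

```haskell
import Data.Graph (flattenSCCs, stronglyConnComp)

-- Hypothetical stand-in for CxxType: a type name plus the names of the
-- types its fields refer to.
data TypeDesc = TypeDesc
  { tdName :: String
  , tdDeps :: [String]
  } deriving (Show)

-- Order the descriptions so that every type comes after the types it
-- depends on. stronglyConnComp returns components in reverse topological
-- order, which is dependencies-first when edges point at dependencies.
-- Mutually recursive types end up adjacent to each other.
dependencyOrder :: [TypeDesc] -> [TypeDesc]
dependencyOrder descs =
  flattenSCCs (stronglyConnComp [(d, tdName d, tdDeps d) | d <- descs])

-- Example input used only for this sketch.
sample :: [TypeDesc]
sample =
  [ TypeDesc "Config" ["Point", "Colour"]
  , TypeDesc "Point" []
  , TypeDesc "Colour" []
  ]

main :: IO ()
main = mapM_ (putStrLn . tdName) (dependencyOrder sample)
```

With this input `Config` is printed last, because it depends on the other two types. A real generator would probably also need forward declarations for the mutually recursive case.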

I generate a bunch of class methods like show/eq, and also serialise/deserialise. The autogenerated serialisation code calls the tinycbor library.

Since I rely on tinycbor and my library is in active development, I was thinking of publishing it as a different repo than binary-serialise-cbor.

Recent developments are C++ unit tests generated with QuickCheck, and interoperability with HDF5.

dcoutts commented 6 years ago

We should look more closely at the CDDL stuff

https://tools.ietf.org/html/draft-ietf-cbor-cddl-02#page-22

Shimuuar commented 5 years ago

Yes, CDDL could be used to define a schema for the encoding of Haskell values. For example, the data type

data Foo = Foo
  { foo :: Int
  , bar :: Char
  }
  | Bar String

could be described by the following schema:

Foo = [ 0, foo:int, bar:tstr ]
    / [ 1, tstr ]

I think it wouldn't be very hard to generate such schemas using generics (assuming the Serialise instance is generic-derived too).
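
A minimal sketch of how that could look, assuming the layout from the schema above (one array per constructor, with the constructor index first). The classes `CddlType`, `GConstructors`, `GFields` and the function `cddlSchema` are hypothetical names for this example, and I have not checked the output against what the generic-derived Serialise instance really produces in every case.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances #-}
{-# LANGUAGE KindSignatures, ScopedTypeVariables, TypeOperators #-}
import Data.Kind (Type)
import Data.List (intercalate)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Hypothetical class: the CDDL type name used for a field type.
-- The Char and String mappings follow the example schema above.
class CddlType a where
  cddlType :: Proxy a -> String

instance CddlType Int where cddlType _ = "int"
instance CddlType Char where cddlType _ = "tstr"
instance CddlType [Char] where cddlType _ = "tstr"

-- One list of rendered fields per constructor, in declaration order.
class GConstructors (f :: Type -> Type) where
  gConstructors :: Proxy f -> [[String]]

instance GConstructors f => GConstructors (D1 c f) where
  gConstructors _ = gConstructors (Proxy :: Proxy f)

instance (GConstructors f, GConstructors g) => GConstructors (f :+: g) where
  gConstructors _ =
    gConstructors (Proxy :: Proxy f) ++ gConstructors (Proxy :: Proxy g)

instance GFields f => GConstructors (C1 c f) where
  gConstructors _ = [gFields (Proxy :: Proxy f)]

-- Rendered fields of a single constructor.
class GFields (f :: Type -> Type) where
  gFields :: Proxy f -> [String]

instance GFields U1 where
  gFields _ = []

instance (GFields f, GFields g) => GFields (f :*: g) where
  gFields _ = gFields (Proxy :: Proxy f) ++ gFields (Proxy :: Proxy g)

instance (Selector s, CddlType a) => GFields (S1 s (K1 i a)) where
  gFields _ =
    -- Unnamed fields have an empty selector name and get no "name:" prefix.
    case selName (undefined :: S1 s (K1 i a) ()) of
      ""   -> [ty]
      name -> [name ++ ":" ++ ty]
    where
      ty = cddlType (Proxy :: Proxy a)

-- Render a CDDL rule with one array alternative per constructor.
cddlSchema :: forall a. (Generic a, GConstructors (Rep a)) => String -> Proxy a -> String
cddlSchema name _ = name ++ " = " ++ intercalate separator alternatives
  where
    separator = "\n" ++ replicate (length name + 1) ' ' ++ "/ "
    alternatives = zipWith alt [0 :: Int ..] (gConstructors (Proxy :: Proxy (Rep a)))
    alt tag fields = "[ " ++ intercalate ", " (show tag : fields) ++ " ]"

data Foo = Foo { foo :: Int, bar :: Char } | Bar String deriving (Generic)

main :: IO ()
main = putStrLn (cddlSchema "Foo" (Proxy :: Proxy Foo))
```

Running this prints the same two-line schema as above. Any field type without a `CddlType` instance is a compile error, which seems like the right failure mode for a schema generator.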

dcoutts commented 5 years ago

CDDL is now a proposed standard: https://tools.ietf.org/html/rfc8610