google / proto-lens

API for protocol buffers using modern Haskell language and library patterns.
https://google.github.io/proto-lens
BSD 3-Clause "New" or "Revised" License

Thoughts about writing a .proto generator from haskell code? #368

Open martyall opened 4 years ago

martyall commented 4 years ago

Are there any thoughts about being able to generate .proto files from Haskell types, ideally using generics? This is something we're currently interested in, since it would cut some serious boilerplate, especially during prototyping phases.

proto3-suite is currently capable of doing this. The problem with that library (at least at the moment) is that it does not use protoc but instead has its own custom parser, which does not support the full language specification. So in our case, for example, we use proto-lens for the protobuf files that we don't control and proto3-suite for everything else, which is kind of annoying.

I wouldn't expect this generator to handle every feature of the protobuf3 spec, but basically there would be an implementation for types deriving Generic.
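
To make the idea concrete, here is a minimal sketch of what a generics-based generator could look like. All the names in it (`ProtoScalar`, `GFields`, `GMessage`, `protoMessage`) are hypothetical and are not part of proto-lens or proto3-suite. It only handles single-constructor records with a few scalar field types, and it assumes field numbers are simply assigned in declaration order.

```haskell
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DeriveGeneric #-}

module Main where

import Data.Int (Int32, Int64)
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import GHC.Generics

-- Hypothetical class: maps a Haskell field type to a proto3 scalar type name.
class ProtoScalar a where
  protoTypeName :: Proxy a -> String

instance ProtoScalar Int32 where protoTypeName _ = "int32"
instance ProtoScalar Int64 where protoTypeName _ = "int64"
instance ProtoScalar Bool where protoTypeName _ = "bool"
instance ProtoScalar Double where protoTypeName _ = "double"
instance ProtoScalar [Char] where protoTypeName _ = "string"

-- Hypothetical class: collects (proto type, field name) pairs from the
-- generic representation of a record constructor.
class GFields (f :: Type -> Type) where
  gFields :: Proxy f -> [(String, String)]

instance GFields U1 where
  gFields _ = []

instance (GFields f, GFields g) => GFields (f :*: g) where
  gFields _ = gFields (Proxy :: Proxy f) ++ gFields (Proxy :: Proxy g)

-- A single record field: read its name from the selector metadata.
instance (Selector s, ProtoScalar a) => GFields (M1 S s (K1 i a)) where
  gFields _ =
    [ ( protoTypeName (Proxy :: Proxy a)
      , selName (undefined :: M1 S s (K1 i a) ())
      )
    ]

-- Constructor metadata is skipped; sum types have no instance and are
-- rejected at compile time.
instance GFields f => GFields (M1 C c f) where
  gFields _ = gFields (Proxy :: Proxy f)

-- Hypothetical class: renders a whole datatype as a proto3 message.
class GMessage (f :: Type -> Type) where
  gMessage :: Proxy f -> String

instance (Datatype d, GFields f) => GMessage (M1 D d f) where
  gMessage _ =
    unlines $
      ["message " ++ datatypeName (undefined :: M1 D d f ()) ++ " {"]
        ++ zipWith field [1 ..] (gFields (Proxy :: Proxy f))
        ++ ["}"]
    where
      -- Assumption: field numbers follow declaration order, starting at 1.
      field n (ty, name) =
        "  " ++ ty ++ " " ++ name ++ " = " ++ show (n :: Int) ++ ";"

protoMessage :: forall a. (Generic a, GMessage (Rep a)) => Proxy a -> String
protoMessage _ = gMessage (Proxy :: Proxy (Rep a))

-- Example record used only for illustration.
data Person = Person
  { name :: String
  , age :: Int32
  , active :: Bool
  }
  deriving (Generic)

main :: IO ()
main = putStr (protoMessage (Proxy :: Proxy Person))
```

Running this prints a `message Person { ... }` block with `string name = 1;`, `int32 age = 2;` and `bool active = 3;`. Nested messages, repeated fields, enums and stable field numbering are left out, and those are exactly where a real implementation would get complicated.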

judah commented 4 years ago

Thanks for bringing this up; I wasn't aware that proto3-suite supported that mode. API-wise, this doesn't seem very objectionable because it only affects manually-written Message instances. I imagine it could also make writing some unit tests simpler.

Unfortunately, my guess is it would be quite involved to implement and maintain this feature, so I'm not convinced it's feasible in practice without significantly refactoring proto-lens. The Message class has several methods (used for reflection as well as encoding/decoding): http://hackage.haskell.org/package/proto-lens-0.6.0.0/docs/Data-ProtoLens-Message.html#t:Message

Much of our logic around encoding/decoding is currently implemented in the codegen as part of proto-lens-protoc. The proto-lens library itself only contains low-level knowledge like how to encode/decode integers and strings. I'm not sure how we would avoid duplicating the higher-level logic, which is pretty undesirable. And currently we rely on the deep codegen to get good performance out of GHC.
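
For a sense of what "low-level knowledge" means here, the following is a minimal, standalone sketch of base-128 varint encoding and decoding, the wire format protobuf uses for integers. It is not the proto-lens implementation: `encodeVarint` and `decodeVarint` are hypothetical names, and the sketch works on plain lists of bytes rather than on an efficient builder or parser.

```haskell
module Main where

import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64, Word8)

-- Base-128 varint: 7 payload bits per byte, least-significant group first.
-- The high bit of each byte is set when more bytes follow.
encodeVarint :: Word64 -> [Word8]
encodeVarint n
  | n < 0x80 = [fromIntegral n]
  | otherwise =
      (fromIntegral (n .&. 0x7F) .|. 0x80) : encodeVarint (n `shiftR` 7)

-- Decodes one varint and returns it with the remaining input.
-- Returns Nothing on truncated input or on more than ten bytes.
decodeVarint :: [Word8] -> Maybe (Word64, [Word8])
decodeVarint = go 0 0
  where
    go :: Int -> Word64 -> [Word8] -> Maybe (Word64, [Word8])
    go _ _ [] = Nothing
    go shift acc (b : bs)
      | shift > 63 = Nothing
      | b .&. 0x80 /= 0 = go (shift + 7) acc' bs
      | otherwise = Just (acc', bs)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7F) `shiftL` shift)

main :: IO ()
main = do
  print (encodeVarint 300)
  print (decodeVarint (encodeVarint 300 ++ [7]))
```

Here `encodeVarint 300` yields the bytes `[172,2]` (0xAC 0x02). The higher-level logic in the generated code, such as choosing field tags, dispatching on wire types and handling unknown fields, sits on top of primitives like this.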