phenopackets / phenopacket-format


Is generated protobuf schema useful? #73

Open heuermh opened 7 years ago

heuermh commented 7 years ago

Sorry, this is more of a question than an issue.

In commit 86978bebc54bcc201d2fff21eac651fdda268520 and https://github.com/phenopackets/phenopacket-reference-implementation/commit/fefda802cb5cf6b462ab987534966df6b3bca5a7 I see that a protobuf schema is generated reflectively from the implementation API classes.

As I understand it, the typical use case is to generate Java classes from the protobuf schema. Those generated classes contain the functionality for reading from and writing to protobuf messages. Since the generated classes occupy the same packages as the implementation API classes, there isn't any way to go from one to the other. That is, there is no way to write protobuf messages from the implementation API classes, and no way to instantiate them by reading protobuf messages.

Or perhaps I'm missing something?

heuermh commented 7 years ago

For example, here is a project that generates Java code from the protobuf schema in this repository (with minor revisions), using both the Google protobuf compiler and Wire (https://github.com/square/wire).

https://github.com/heuermh/phenopacket-protobuf

cmungall commented 7 years ago

Good questions.

My understanding is that a Java user would have two options for phenopackets-protobuf:

  1. Use the reference POJOs plus Jackson - see https://github.com/FasterXML/jackson-dataformats-binary/tree/master/protobuf
  2. Ignore the reference POJOs and work purely with generated source

Note that we have not tried option 1 yet!

In theory it should also be possible to mix and match, and have a translation layer between the two.
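As a rough illustration of what such a translation layer could look like, here is a minimal sketch. Everything in it is hypothetical: `PhenotypePojo` stands in for a hand-written reference POJO, and `PhenotypeMessage` is a hand-rolled stand-in that only mimics the shape of a protoc-generated class (immutable, built through a builder, null-rejecting setters, empty-string defaults). Neither is taken from the phenopackets code. In a real project the generated classes would also need to live in a different package from the reference POJOs (for example via the `java_package` file option in the `.proto` file) so that both can be on the classpath at once.

```java
import java.util.Objects;

// Hypothetical stand-in for a hand-written reference POJO (mutable, bean style).
final class PhenotypePojo {
    private String id;
    private String label;

    PhenotypePojo() {}

    PhenotypePojo(String id, String label) {
        this.id = id;
        this.label = label;
    }

    String getId() { return id; }
    void setId(String id) { this.id = id; }
    String getLabel() { return label; }
    void setLabel(String label) { this.label = label; }
}

// Hypothetical stand-in for a protoc-generated message: immutable, created via a Builder.
final class PhenotypeMessage {
    private final String id;
    private final String label;

    private PhenotypeMessage(Builder b) {
        this.id = b.id;
        this.label = b.label;
    }

    String getId() { return id; }
    String getLabel() { return label; }

    static Builder newBuilder() { return new Builder(); }

    static final class Builder {
        // Mimics proto3 behaviour: unset string fields read back as the empty string.
        private String id = "";
        private String label = "";

        // Mimics generated setters, which throw on null.
        Builder setId(String id) { this.id = Objects.requireNonNull(id); return this; }
        Builder setLabel(String label) { this.label = Objects.requireNonNull(label); return this; }
        PhenotypeMessage build() { return new PhenotypeMessage(this); }
    }
}

// The translation layer: the only code that knows about both representations.
final class PhenotypeTranslator {
    private PhenotypeTranslator() {}

    static PhenotypeMessage toMessage(PhenotypePojo pojo) {
        PhenotypeMessage.Builder builder = PhenotypeMessage.newBuilder();
        // Null POJO fields are skipped, so the builder keeps its defaults.
        if (pojo.getId() != null) builder.setId(pojo.getId());
        if (pojo.getLabel() != null) builder.setLabel(pojo.getLabel());
        return builder.build();
    }

    static PhenotypePojo toPojo(PhenotypeMessage message) {
        return new PhenotypePojo(message.getId(), message.getLabel());
    }
}

class TranslationDemo {
    public static void main(String[] args) {
        // Example values only; any id/label pair would do.
        PhenotypePojo pojo = new PhenotypePojo("HP:0000118", "Phenotypic abnormality");
        PhenotypeMessage message = PhenotypeTranslator.toMessage(pojo);
        PhenotypePojo roundTripped = PhenotypeTranslator.toPojo(message);
        System.out.println(roundTripped.getId() + " " + roundTripped.getLabel());
    }
}
```

One design consequence worth noting: in this sketch a `null` field on the POJO comes back as an empty string after a round trip, because the message side has no way to represent "unset" for a plain string. A real translation layer would have to decide how to handle that mismatch.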

One use case for the protobuf schema right now is to give people living purely in an IDL world something they can use, whilst retaining semantic compatibility with phenopackets (even if extra plumbing is required for full interoperability).

One possibility for the future is that we bootstrap away the reference POJOs, making the proto IDL the canonical source. This would have some advantages (e.g. compatibility with GA4GH) and some disadvantages too; more investigation is required.