mfp / extprot

extprot: extensible binary protocols for cross-language communication and long-term serialization
Other
209 stars 9 forks source link

message to message alias #27

Open ygrek opened 7 years ago

ygrek commented 7 years ago

It appears extprot accepts message alias to message, but generates bad code. This possibility is not documented afaik, not sure if it should be fixed or forbidden, I stumbled on this accidentaly. Consider :

$ cat a.proto 
message m1 = { a : int }
message m2 = M1.m1
$ ./compiler/extprotc a.proto -o /dev/stdout   |grep M1
module M1 =
        [ ("M1.a", (Extprot.Pretty_print.pp_field (fun t -> t.a) Extprot.Pretty_print.pp_int)) ] pp;;
    type _m2 = M1.M1.m1;;
    type m2 = M1.M1.m1;;
    let m2_default = ref (fun () -> !M1.M1.m1_default ());;
    let pp_m2 pp = M1.M1.pp_m1 pp;;
    let read_m2 s = M1.M1.read_m1 s;;
    let io_read_m2 s = M1.M1.io_read_m1 s;;
    let write_m2 b msg = M1.M1.write_m1 b msg;;
mfp commented 7 years ago

On Sat, Jul 08, 2017 at 01:59:21PM -0700, ygrek wrote:

It appears extprot accepts message alias to message, but generates bad code. This possibility is not documented afaik, not sure if it should be fixed or forbidden, I stumbled on this accidentaly.

Looks like an accidental name capture/clash (akin to lack of hygiene in a macro) that could be detected and forbidden. Would have to ponder whether this is affected by the "include" functionality you implemented.

-- Mauricio Fernández

ygrek commented 7 years ago

I am confused because there is Message_alias in Ptypes.message_expr which is only produced by <UIDENT>+.<LIDENT> rule, but then generates this slightly broken code. And there is nothing about aliases in doc, but there is some code to handle them. I would expect a syntax of message m1 = xxx message m2 = m1 to work, but this currently only works when m1 is a type. This is not a big problem by itself, because can be workarounded with type t1 = xxx message m1 = t1 message m2 = t1, it is just that types in code make me think it was intended to be another way.

mfp commented 7 years ago

On Sun, Jul 09, 2017 at 12:13:14AM +0000, ygrek wrote:

I am confused because there is Message_alias in Ptypes.message_expr which is only produced by <UIDENT>+.<LIDENT> rule, but then generates this slightly broken code. And there is nothing about aliases in doc, but there is some code to handle them. I would expect a syntax of message m1 = xxx message m2 = m1 to work, but this currently only works when m1 is a type. This is not a big problem by itself, because can be workarounded with type t1 = xxx message m1 = t1 message m2 = t1, it is just that types in code make me think it was intended to be another way.

I'd forgotten about the message alias feature (in my defense, I wrote it about 5 years ago :). It is intended to work with external messages (defined in other .proto files), allowing to (1) refer to the message using the alias instead of the full path and (2) define a different default value for the message (defaulting to the one given in the external proto). To be honest, I'm not sure whether I've ever used that functionality.

In other words, in the example you gave

message m1 = { a : int }
message m2 = M1.m1

M1.m1 does not refer to message m1, but to m1 defined in the M1 protocol (m1.proto), so the corresponding OCaml module path would be M1.M1. It is because of an unfortunate accident in the way the OCaml code is generated that we end up referring to the M1 module corresponding to the m1 message defined just above (this is the moral equivalent of the lack of hygiene I mentioned before). If there were some way to refer to the "toplevel namespace", we'd use it in the code generated for m2 (imagine a '::' scope resolution operator akin to C++, we'd do ::M1.M1.xxx to avoid referring to the M1 in the same module), but alas there isn't, so we end up with the accidental (module) name capture, and the best we can aim for is to detect that situation and error on it.

As for record types, they came to be as a consequence of the early design choice that messages would be the only valid "unit of serialization" (iow. you cannot serialize e.g. a primitive value or sum type and claim it's a valid extprot protocol, only messages are allowed, as a way to enforce extensibility), and that they would be monomorphic. It happened to me that I had the equivalent of the OCaml types

type 'a somerec = { ....; foo : 'a } type message1 = int somerec type message2 = foobar somerec ....

and creating repetitive, nearly identical message definitions manually quickly became unsightly, so I introduced (polymorphic) record types, subject to the same beta reduction as other types, allowing to factorize the commonalities in the .proto:

type somerec 'a = { ....; foo : 'a } message message1 = somerec message message2 = somerec

-- Mauricio Fernández