ocapn / syrup

Syrup is a simple binary way of preserving data on the wire, with perhaps a few extra calories.
Apache License 2.0

Preserves' Embed/Unquote (currently: "Pointer") type #4

Open cwebber opened 3 years ago

cwebber commented 3 years ago

Preserves has another core type that @tonyg added since we had last spoken, and which had confused me quite a bit: the Pointer type. I couldn't understand (assuming this meant some sort of unsafe memory pointer) why such a type would exist in Preserves, but @tonyg explained it was fairly misnamed and was considering a rename to Embed.

Really, what it is: a kind of quasiquote-style unquote. Except there's no explicit "quote" operator to pair with the "unquote".

Consider the following message send from Alice to Bob, going over CapTP:

;; from alice, to bob
(<- bob "foo" local-mouse remote-alien (record* 'beep "boop"))

Current Goblins CapTP conversion looks like so:

<op:deliver-only
   <desc:export 42> ; to-object
   ["foo",          ; arguments
    <desc:import 24>,
    <desc:export 33>,
    <user-record beep ["boop"]>],
   {}>                                 ; keyword arguments

The <beep "boop"> record is transformed into the much more verbose <user-record beep ["boop"]>; i.e., we escape all records that may have existed as data. But @tonyg suggests using the Embed type (prefixed with #! in Preserves textual syntax) like so instead:

<op:deliver-only
   <desc:export 42> ; to-object
   ["foo",          ; arguments
    #!<desc:import 24>,
    #!<desc:export 33>,
    <record beep "boop">],
   {}>                                 ; keyword arguments

This is the inverse: we no longer escape the user-supplied record; instead we escape the protocol-related records by wrapping them in the Embed type. Thus the argument list is interpreted as a kind of "quasiquote". This also explains why the <desc:export 42> (corresponding to the to-object this will be delivered to) does not need a corresponding Pointer (Embed) wrapper: it's not in a context where any conceptual "quasiquoting" is happening.
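To make the "inverse" relationship concrete, here is a rough Haskell sketch of the two escaping strategies side by side. It is not Goblins' actual marshalling code: the `Arg`, `Desc`, and `Wire` types and both function names are invented for illustration, and only the record shapes are taken from the examples above.

```haskell
-- Hypothetical capability descriptors, mirroring <desc:import N> / <desc:export N>.
data Desc = DescImport Integer | DescExport Integer

-- An argument as the sending application sees it (illustrative only).
data Arg
  = Text String              -- plain string data
  | Number Integer           -- plain integer data
  | UserRecord String [Arg]  -- a record the user supplied as data
  | Ref Desc                 -- a reference that CapTP must translate

-- A small slice of a wire-level data model, with an Embed wrapper.
data Wire
  = WStr String
  | WSym String
  | WInt Integer
  | WList [Wire]
  | WRecord String [Wire]
  | WEmbed Wire
  deriving (Eq, Show)

descRecord :: Desc -> Wire
descRecord (DescImport n) = WRecord "desc:import" [WInt n]
descRecord (DescExport n) = WRecord "desc:export" [WInt n]

-- Strategy 1 (the current conversion): protocol records go out bare,
-- and every user record is escaped as <user-record label [fields]>.
escapeUser :: Arg -> Wire
escapeUser (Text s)          = WStr s
escapeUser (Number n)        = WInt n
escapeUser (UserRecord l fs) =
  WRecord "user-record" [WSym l, WList (map escapeUser fs)]
escapeUser (Ref d)           = descRecord d

-- Strategy 2 (the Embed proposal): user records go out bare,
-- and protocol records are wrapped in Embed (#! in Preserves text syntax).
embedProtocol :: Arg -> Wire
embedProtocol (Text s)          = WStr s
embedProtocol (Number n)        = WInt n
embedProtocol (UserRecord l fs) = WRecord l (map embedProtocol fs)
embedProtocol (Ref d)           = WEmbed (descRecord d)
```

Under either strategy, a user record that happens to be labelled `desc:import` stays distinguishable from a real import descriptor; the two approaches differ only in which side pays for the wrapper.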

It's an interesting idea. Syntactically in Syrup I think it could be encoded quite simply: the ! character would suffice, demanding/expecting a following piece of preserves-related data that it would then "wrap".
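As a sketch of what that could look like on the wire, the toy encoder below handles a small slice of the data model plus the proposed `!` prefix. The `Value` type and `encode` function are hypothetical, the `!` rule is only the suggestion made above (not part of any published Syrup spec), strings are assumed to be ASCII so character count equals byte count, and the rest follows Syrup's existing syntax as I understand it.

```haskell
-- A tiny slice of the Preserves data model; constructor names are illustrative.
data Value
  = VInt Integer
  | VStr String            -- ASCII only here, so length == byte count
  | VSym String
  | VList [Value]
  | VRecord Value [Value]  -- label, then fields
  | VEmbed Value           -- the proposed wrapper

-- Length-prefixed text with a one-character type marker, e.g. 3"foo or 4'beep.
prefixed :: Char -> String -> String
prefixed marker s = show (length s) ++ [marker] ++ s

encode :: Value -> String
encode (VInt n)
  | n >= 0    = show n ++ "+"
  | otherwise = show (negate n) ++ "-"
encode (VStr s)       = prefixed '"' s
encode (VSym s)       = prefixed '\'' s
encode (VList vs)     = "[" ++ concatMap encode vs ++ "]"
encode (VRecord l fs) = "<" ++ encode l ++ concatMap encode fs ++ ">"
-- Assumption: '!' followed by exactly one encoded value, as suggested above.
encode (VEmbed v)     = "!" ++ encode v

-- The argument list from the Embed example above.
args :: Value
args = VList
  [ VStr "foo"
  , VEmbed (VRecord (VSym "desc:import") [VInt 24])
  , VEmbed (VRecord (VSym "desc:export") [VInt 33])
  , VRecord (VSym "beep") [VStr "boop"]
  ]
```

With these definitions, `encode args` produces `[3"foo!<11'desc:import24+>!<11'desc:export33+><4'beep4"boop>]`: the only addition over plain Syrup is the single `!` byte in front of each protocol record.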

I think this is a neat/clean idea. What do others think?

zenhack commented 3 years ago

I am a fan. In haskell-preserves, the value type is parametrized by the type of pointers, so you have:

data Value p
    = Atom !Atom
    | Compound (Compound p)
    | Pointer p

At runtime, p can be any type, including types the encoders don't understand, but the encoders expect specific types; right now the syrup encoder (the only one so far) has:

encodeValue :: Value Void -> BB.Builder

Where Void is an uninhabited type, and thus Value Void has no pointers. Even without supporting pointers on the wire, this is useful in a typed language as it gives me a place to stick capabilities independent of how they are encoded in a call/return message. A codec that supported pointers could take Fix Value (the fixpoint of Value) instead, to demand that pointers be themselves encodable values. The ocap layer could demand Value Capability, and then traverse the value replacing capabilities with the appropriate wire encoding before sending the message.
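The traversal described above could be sketched roughly as follows. This uses a simplified stand-in for the `Value` type, not the real haskell-preserves definitions; `Capability`, `describe`, `lower`, and `render` are hypothetical names, and atoms are collapsed to plain strings for brevity.

```haskell
import Data.Void (Void, absurd)

-- Simplified stand-in for the Value type quoted above.
data Value p
  = Atom String              -- atoms collapsed to strings for brevity
  | Record String [Value p]
  | List [Value p]
  | Pointer p
  deriving (Eq, Show)

-- Replace every pointer with a whole value, changing the pointer type.
substPointers :: (p -> Value q) -> Value p -> Value q
substPointers _ (Atom a)      = Atom a
substPointers f (Record l vs) = Record l (map (substPointers f) vs)
substPointers f (List vs)     = List (map (substPointers f) vs)
substPointers f (Pointer p)   = f p

-- Hypothetical capability type held by the ocap layer.
data Capability = Import Int | Export Int
  deriving (Eq, Show)

-- Hypothetical wire descriptors for capabilities.
describe :: Capability -> Value Void
describe (Import n) = Record "desc:import" [Atom (show n)]
describe (Export n) = Record "desc:export" [Atom (show n)]

-- The ocap layer lowers capabilities before handing off to the codec.
lower :: Value Capability -> Value Void
lower = substPointers describe

-- A pointer-free "encoder": the Pointer case is unreachable, so absurd discharges it.
render :: Value Void -> String
render (Atom a)      = a
render (Record l vs) = "<" ++ unwords (l : map render vs) ++ ">"
render (List vs)     = "[" ++ unwords (map render vs) ++ "]"
render (Pointer v)   = absurd v
```

The type of `render` is what keeps the serialization layer ignorant of capabilities: it cannot be called until something like `lower` has eliminated every `Pointer`.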

I've separately been thinking about refactoring haskell-capnp to do something similar with the capability pointers embedded in messages -- right now it has Client as a fixed type, but it would sometimes be nice to treat them as an arbitrary out-of-band reference. It would also remove a cyclic dependency and allow the serialization layer to not know anything about the ocap layer.

It seems like this is perhaps less essential in a dynamic language, since there's no type system telling you you can't just put some arbitrary junk inside any container, but it strikes me as really useful.

This was what I understood the intended use of Pointer to be when I read the spec, but maybe @tonyg can say whether I got it right.

cwebber commented 3 years ago

> This was what I understood the intended use of Pointer to be when I read the spec, but maybe @tonyg can say whether I got it right.

I suspect so too, given that I think Pointer was inspired by Cap'n Proto somehow, which is how it got the name, even though it's not an exact match to what one might expect by that name.

erights commented 3 years ago

I am confused by this thread. Are we trying to define a serialization for CapTP that is binary compatible with some already deployed serialization system? Or are we trying to learn from, and be not-gratuitously different from, some serialization systems we admire, whether deployed or not? If the first, then we've already made too many changes and will need to reverse some. If the second, then we are too resistant to changes that would add value. I am genuinely confused.

zenhack commented 3 years ago

Maybe you're aware of some of this but: Syrup's abstract data model is that of Preserves: https://preserves.gitlab.io/preserves/

...though it is a different (and simpler) concrete binary syntax.

This particular issue is about bringing into syrup an extension to that model, which I guess @tonyg added some time after @cwebber put the initial syrup stuff together. So this issue is discussing bringing syrup back in line with the abstract model. It also happens to be a feature that is particularly useful in CapTP.

My impression is that preserves is not really "widely deployed," but perhaps I'm mistaken.

It seems like this is in a weird grey area re: adopting an existing thing as-is vs. designing your own, since @cwebber appears to be involved in what is/was a pre-existing project, that we are now exploring using?


We've had some loose conversations re: the abstract data model of CapTP, but my inclination is to pick a relatively off-the-shelf serialization format (perhaps, preserves, perhaps CBOR or msgpack, maybe capnp), make sure we know how to model capabilities in that format, and then "call it a day" as far as the protocol is concerned -- using available mechanisms in the format to encode language-specific constructs as needed.

Part of this inclination is philosophical - besides embedded capabilities, I don't think there's anything remotely special about our serialization needs, so if we're designing some custom thing, that smells like gratuitous NIH to me. I like the idea of just adopting the preserves data model wholesale as-is. I'm open to using syrup as the concrete encoding since preserves isn't super widely implemented anyway, and syrup is trivial to implement -- though just using the existing binary encoding of preserves also seems like a sensible idea; it would be a bit more compact, not that much more complex to implement, and would avoid extra format fragmentation.

cwebber commented 3 years ago

Note that we're in ocapn/syrup rather than ocapn/ocapn right now. The syrup repository here is about syrup more so than about captp, but people seemed on board with coordinating syrup here, so that's why I moved the repo here. My example about this extension was partly because this is the example @tonyg and I discussed.

The decision of "how to marshal things, and is this useful for marshalling" applies to captp, but the particular abstract types, at the captp abstraction layer and above its particular representation, probably aren't affected much at all by this particular issue. However, this might be useful in terms of the way we marshal to syrup. But it isn't required, as evidenced by Goblins' CapTP already not using this...

(At any rate, I suspect explaining what this feature means is probably more easily done over a call than with this issue anyway. I personally found it fairly strange until @tonyg explained it to me on a call.)