twitchtv / twirp

A simple RPC framework with protobuf service definitions
https://twitchtv.github.io/twirp/docs/intro.html
Apache License 2.0

Proposal: Support streaming in the Twirp spec #3

Open jacksontj opened 6 years ago

jacksontj commented 6 years ago

I think this is a great addition to RPC for Go (as well as other languages!). The main thing that I expect will limit adoption is the lack of streaming support. There are mechanisms to do streaming over http/1.x (a great example is the grpc-gateway stuff) where they effectively just use chunked encoding with blobs to do the streaming.

spenczar commented 6 years ago

Yeah, let's get this discussion going. I agree that this is an important feature.

One important constraint: we need to continue to work on HTTP 1.1. I think that means we can't ever support full-duplex, bidirectional streaming, but I'd be very happy to be proven wrong.

Could you (and others) write a bit about the use cases for streaming RPCs? At Twitch, we haven't really needed any, so I'd like to have concrete users to build for to guide the design of the user API. The underlying implementation seems pretty clear, but I want to make sure the generated code feels nice to use.

marwan-at-work commented 6 years ago

My use case is usually file uploads from one backend service to another. I've written a CI/CD system where multiple containers build different docker images and stream data from all these containers to a central backend that tars all of them and uploads them as a github release asset. Those uploads could be many megabytes and obviously had to be chunked and sent out as streams. RPC streams were perfect for my use case : )

tommie commented 6 years ago

I don't have any needs for Twirp, but I do use gRPC streaming.

In all cases so far, I've never needed bi-directional streaming on the same RPC. The one place I could think of is if you have a single RPC that sends state deltas and receives state deltas. But in that case, you're really looking for classic long-polling or WebSockets, not the RPC style of doing things. You can always invent an identifier on top of the RPC mechanism that would allow concurrency-safe pub-sub on two uni-directional streams.

So if the only concern is support for bi-directionality, I'd say start with mutually exclusive directions, WLOG. That also cuts the question of full duplex short. I mean, on the wire, HTTP/1.1 could definitely support full duplex streaming, but I assume the problem is that the XHR API requires a request to be finished before reading the response and is hence practically unusable. (Is it more common for HTTP server APIs to support full-duplex bidirectional streaming? Thinking about the Go HTTP library got me to "yes".)

thelazyfox commented 6 years ago

We should also consider websockets. Client support is pretty widely available these days and we'd get bidirectional streaming support out of the box.

The biggest argument I can think of against it is that LB/proxy support would be poorer, which is certainly unfortunate. AWS ALBs, Nginx, and HAProxy all support websockets, and I imagine those would be the most commonly used proxy tools with Twirp.

wora commented 6 years ago

The streaming support can be added using a wrapper message, such as:

// A wrapper for (stream Foo)
message StreamFoo {
  repeated Foo messages = 1;
  Status status = 2;
}

Pros:

Cons:

jacksontj commented 6 years ago

@spenczar thanks for the quick response (and sorry for mine being so slow :) ). I'll try to find some time here in the next couple of weeks to get a proposal (with PoC) together to review here.

spenczar commented 6 years ago

If we could get away with this without using websockets, I'd be happier. Support exists in proxies and stuff, but it can be shaky. You often have to do some configuration. Nobody has to configure their proxy or load balancer to be able to serve HTTP 1 requests, though.

Websockets libraries usually aren't as battle-tested as HTTP libraries, too, so we'd likely be running people into more bugs.

I think we don't need to use websockets to get what we want. I think we can do unidirectional streaming in pure HTTP 1, and say that bidirectional streaming is only available in http/2. I think that's going to be fine for almost everyone's use cases, and will make things simpler for everyone.

I like @wora's proposal. To flesh it out a bit:

Streams are represented as a sequence of messages, terminated by a single "trailer" message. The trailer holds a status code. This trailer has two functions. First, it signals the end of the messages, so the recipient knows the stream is done. Second, it lets the sender send an error, even if things started out well.

For an rpc like this:

rpc UploadFile(stream FileChunk) returns (UploadFileResponse);

We'd create a helper type:

message FileChunkStream {
  repeated FileChunk messages = 1;
  Trailer trailer = 2;
}

The "Trailer" type can be figured out later - I won't get into it here. It doesn't really matter for the core design.

A sender writes out the serialization of the Stream helper type to the wire, which would either be a HTTP request body or response body. The recipient would need a streaming deserializer for this to work.

A streaming deserializer would work in JSON, because each message is balanced by matching { and } characters, so you know when you've received a complete message.

It would work in Protobuf because repeated messages use length prefixing in front of each message - it looks like <tag><len><encoded_msg><tag><len><encoded_msg>..., where <len> is the size of the encoded_msg in bytes, so a reader can always know how many bytes to chomp.

Most JSON and Protobuf deserializers are not so low-level that they let you emit each message on the fly like this, though. Most would want the whole thing to be loaded into memory. We don't want that, so we would probably need to implement our own deserializer in generated code. It's hard to tell how difficult that would be - we should take a crack at it to get a feel for its difficulty before accepting or declining this particular proposal.
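
To make the cost concrete, here is a rough Go sketch of reading that length-prefixed framing and emitting messages as they arrive. The pb import path, the FileChunk wiring, and the emit callback are hypothetical illustrations of the idea, not proposed generated code:

package stream

import (
	"bufio"
	"encoding/binary"
	"io"

	"google.golang.org/protobuf/proto"

	pb "example.com/gen/upload" // hypothetical generated package
)

func readFileChunkStream(r io.Reader, emit func(*pb.FileChunk) error) error {
	br := bufio.NewReader(r)
	for {
		// Each record starts with a varint tag (field_number<<3 | wire_type).
		tag, err := binary.ReadUvarint(br)
		if err == io.EOF {
			return nil // end of the HTTP body, stream complete
		}
		if err != nil {
			return err
		}
		// Then a varint length, then that many bytes of encoded message.
		n, err := binary.ReadUvarint(br)
		if err != nil {
			return err
		}
		buf := make([]byte, n)
		if _, err := io.ReadFull(br, buf); err != nil {
			return err
		}
		switch tag >> 3 {
		case 1: // repeated FileChunk messages = 1
			msg := &pb.FileChunk{}
			if err := proto.Unmarshal(buf, msg); err != nil {
				return err
			}
			if err := emit(msg); err != nil {
				return err
			}
		case 2: // Trailer trailer = 2 -- trailer handling elided in this sketch
			return nil
		}
	}
}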

rhysh commented 6 years ago

Using HTTP/1.1 for unidirectional streaming, with bidi being available on HTTP/2 sounds great. The message design / wire protocol sounds good.

Doing streaming processing shouldn't be too hard. For JSON, we might specify that messages are separated by \n. That would let us easily find the boundaries, be very readable on the wire, and for the Go implementation we could use encoding/json.Decoder out of the box (probably decoding to a json.RawMessage before handing off to jsonpb.Unmarshal). The protobuf framing is pretty straightforward too, though I don't know of any currently-open-source code for Go that does it.
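
A rough Go sketch of that value-at-a-time reading, using encoding/json.Decoder to find message boundaries; protojson stands in for the jsonpb package mentioned above, and the pb package is the same hypothetical one as before:

package stream

import (
	"encoding/json"
	"io"

	"google.golang.org/protobuf/encoding/protojson"

	pb "example.com/gen/upload" // hypothetical generated package
)

func readJSONStream(r io.Reader, emit func(*pb.FileChunk) error) error {
	dec := json.NewDecoder(r)
	for {
		// Decode reads exactly one complete JSON value, so it finds the
		// message boundary for us; separating newlines are skipped as
		// ordinary whitespace.
		var raw json.RawMessage
		if err := dec.Decode(&raw); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		msg := &pb.FileChunk{}
		if err := protojson.Unmarshal(raw, msg); err != nil {
			return err
		}
		if err := emit(msg); err != nil {
			return err
		}
	}
}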

I'm interested to hear about the Go API design. Currently the client and server use a single interface, which is a really useful property. Will we be able to keep it when adding streaming APIs?

spenczar commented 6 years ago

Glad it sounds good, @rhysh!

I was wary of using \n when I first thought about your suggestion because I didn't want to have to escape them, but then I learned that a raw newline is forbidden inside JSON strings (it has to be escaped), so it actually shouldn't be too hard at all. I like that idea.

Could we use ,\n as the separator instead, though? Then the full recorded stream would be valid JSON, which seems like a nice property.

I'll make a separate issue for discussing the Go API design, I have some thoughts there but yes - it's tricky.

jmank88 commented 6 years ago

Regarding streaming JSON, there is an RFC describing JSON text sequences ("application/json-seq"): https://tools.ietf.org/html/rfc7464. Basically, always lead with an RS and end with an LF (sometimes required to ensure no truncation). I've implemented a Go library here https://github.com/jmank88/jsonseq, which adapts std encoding/json by default.
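
For reference, a json-seq record is just an RS byte (0x1E), the JSON text, and a terminating LF, so producing one in Go is tiny (a sketch of the framing only, unrelated to the jsonseq library above):

package stream

import (
	"encoding/json"
	"io"
)

// writeSeqRecord frames one value per RFC 7464: RS + JSON text + LF.
func writeSeqRecord(w io.Writer, v interface{}) error {
	if _, err := w.Write([]byte{0x1e}); err != nil { // ASCII RS
		return err
	}
	// json.Encoder.Encode writes the value followed by a newline, which
	// doubles as the record-terminating LF.
	return json.NewEncoder(w).Encode(v)
}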

achille-roussel commented 6 years ago

What would you think about using an array representation for streaming JSON? Writing [ and ] at the beginning and end, and separating with ,? That way there's no need to support a custom "line-split" format: any JSON decoder can read the stream; those that support streaming JSON arrays can produce items as they arrive, and the others would load the entire array into memory but still work.

spenczar commented 6 years ago

Yes, @achille-roussel, that's what I am thinking of too. I think we should send a JSON array which is a sequence of messages, then the trailer.

~~(I'm not sure what I was thinking when I suggested ,\n. The spec is quite clear that , is the only legal separator in arrays.)~~ (Wrong again! The spec permits this.)

That's an interesting RFC, @jmank88, and it does seem to be designed for this case. But I have a few concerns:

jmank88 commented 6 years ago

The primary advantage of json-seq is recovery without dropping the stream (https://tools.ietf.org/html/rfc7464#section-2.3), but that may not be necessary here.

rhysh commented 6 years ago

Some options for the JSON representation of a message stream of {"k":4}, followed by {"k":1}, and concluding with {"k":5}:

Space-separated

{"k":4} {"k":1} {"k":5}

Newline-separated (\n)

{"k":4}
{"k":1}
{"k":5}

Dense JSON Array

[{"k":4},{"k":1},{"k":5}]

JSON Array with Helpful Newlines

[
{"k":4}
,{"k":1}
,{"k":5}
]

Of these, I most prefer the ones with newlines. They're easy to work with on the command line (and manual debugging is important for the JSON protocol). Client and server implementations can safely split on the newlines, process a few bytes on either end of the line, and pass to a non-stream-aware JSON decoder.

But the difference between "Newline-separated" and "JSON Array with Helpful Newlines" is whether the wire protocol for a unary RPC is different from a streaming RPC with a single value. We'll need to consider this for the protobuf wire format as well.

I think that gRPC's wire format doesn't differentiate between streaming and unary RPCs. That design implies more complexity in some places, but allows simplification in others.

achille-roussel commented 6 years ago

+1 on having one item per line when JSON is used, it doesn't add much to the payload size and makes it much easier to debug.

wora commented 6 years ago

The entire response must be a valid JSON object, including the messages and the trailer. Otherwise, it won't be compatible with OpenAPI and will create all kinds of compatibility issues. An incremental parser for JSON should be trivial work, at least outside the browser. I don't think we need to worry too much about the complexity, since writing such a parser is a one-time cost.

spenczar commented 6 years ago

I like your "JSON Array with Helpful Newlines" ("JAHN") proposal a lot, @rhysh. We would need one more constraint on serialization, which is no newlines within a serialized object. I want to forbid this:

[
{
  "k": 4
}
,{"k": 1}
,{"k": 5}
]

@wora, I don't totally agree that we should discount the difficulty of making a deserializer. Simplicity in making clients is important for Twirp's success.

That said, I think it's okay for streaming to be a slightly "advanced" feature. Client generators that can only handle deserializing streaming APIs by holding everything in memory (like @achille-roussel described) will still work, they'll just be less good than well-made deserializers.

wora commented 6 years ago

If you only send the array, where do you put the trailer?

Adding the object wrapper does not add any complexity to the parser. The output looks like this:

{"messages":[
]}

It still uses a fixed first line with a few extra bytes.

spenczar commented 6 years ago

Oh yes, I was thinking of a mixed-type array:

[
{"k":1}
,{"k":2}
,{"k":3}
,{"status": "ok"}
]

Mixed-type arrays can be obnoxious to deserialize, though; I agree that wrapping is probably better, so it becomes

{"messages":[
{"k":1}
,{"k":2}
,{"k":3}
],"trailer": {"status": "ok"}
}

rhysh commented 6 years ago

@spenczar your last example LGTM

Thanks for clarifying that we're not relying on HTTP Trailers; those have spotty support in HTTP/1.1 software, which would roughly translate into Twirp depending on HTTP/2.

Having the top level be a JSON object rather than an array is probably good for JavaScript too... I think there's an attack involving overriding the constructor for Array, which allows intercepting XHR responses that use an array at the top level.

What are your thoughts on how unary RPCs look compared with streams that contain a single element? As a parallel, in protobuf messages we're allowed to change between "repeated" and non-repeated ("optional" in proto2) fields without breaking wire compatibility.

wora commented 6 years ago

Changing between repeated and non-repeated is a breaking change for JSON. That is how JSON works: an array of ints is different from an int.

spenczar commented 6 years ago

@rhysh I don't mind them looking different. Unary RPCs are different than streams. We can handle that difference, and so can our users - in fact, I think they'd be happier with this. It's much simpler and clearer.

@wora, I think Rhys is suggesting using the same wrapper for unary RPCs, which would be compatible. I think that would be a mistake, though.

rhysh commented 6 years ago

Right, single-field -> repeated-field is a breaking change for JSON. Is it worthwhile to make unary RPC -> streaming RPC be a non-breaking change for JSON?

Is it worthwhile to make unary RPC -> streaming RPC a non-breaking change for protobuf?

rhysh commented 6 years ago

OK, thanks Spencer.

wora commented 6 years ago

Unary vs streaming is a breaking change in practice, just like repeated vs non-repeated. Let's not pretend otherwise. Even if it doesn't break the wire format, it would certainly break clients. You can return a million items over a stream and blow out the memory of naive clients all over the place.

achille-roussel commented 6 years ago

About the {"messages":[...],"trailer":{...}} format, it seems like the "messages" field has to come before the "trailer", which is kind of unintuitive with JSON, where the order of keys usually doesn't matter in an object. Would/Should it be valid for a client to send the "trailer" field first? It would be valid JSON but invalid in the context of Twirp?

Progressively decoding a JSON object is also a bit more complex than just decoding a JSON array (read [, decode a value, read ,, repeat until ] + skip white spaces).

I wonder if there's really a need for trailing values:

For systems that do need trailer support, it can easily be built at the application level with a definition like this:

service Service {
  rpc MyRPC(stream Value) returns (Result);
}

message Value {
  ...
  Trailer trailer; // the client only sets this value on the last item
}

This has zero overhead since the trailer is omitted on all values but the last, and on the last value all other fields are absent as well.

surajbarkale commented 6 years ago

How about supporting EventSource for streaming?

mocax commented 6 years ago

I think we should have identical semantics between JSON and protobuf. This is necessary to maintain a stable and healthy wire protocol in the long term, e.g. next 10+ years.

Adding a trailer message to every request message is a poor experience. It forces developers to check for a trailer on every message, and they won't enjoy it. We should see how complicated it is to do the JSON incremental parsing. It is unlikely to be a major issue in the long term.

The complexity of the Twirp protocol is much, much less than that of EventSource. It is better to keep things as simple as possible, which is the selling point of Twirp.

achille-roussel commented 6 years ago

Just to be clear, I'm not suggesting to force the use of a trailer field in order to use streaming. I'm saying that most applications out there don't need trailers, and for those that do, they can still implement it within their API.

I'm having a hard time seeing how trailers are an important feature of an RPC system, gRPC's popularity being a good example of it.

wora commented 6 years ago

The proposed trailer support requires a few lines of code. The question is whether there is enough value to justify providing it at such a small cost. We can omit it for now and add it later.

Without the trailer, how do you report an error from server to client? We would have to introduce another design pattern to handle it, which has a non-zero cost too. If we have to do something, it is better to use a known design pattern than an unknown one. Giving the problem to developers will create significant inconsistency across APIs.

spenczar commented 6 years ago

@achille-roussel:

it seems like the "messages" field has to come before the "trailer", which is kind of unintuitive with JSON, where the order of keys usually doesn't matter in an object. Would/Should it be valid for a client to send the "trailer" field first? It would be valid JSON but invalid in the context of Twirp?

Right, this is unfortunate, but I don't see a way around it: we require "messages" to appear before "trailer." The good news is we only place the burden on serializers, not on deserializers.

Is there an alternative design available to us here? You suggested putting it into the actual message type, but then it's not accessible to the generated code, so it pushes a bunch of complexity onto the user. I think I'd rather use our standard error codes in the trailer.

I wonder if there's really a need for trailing values:

Trailers are needed for stream senders to communicate an error to recipients after a stream has started. There are many, many reasons you might hit an error mid-stream: an error reading a file you are uploading, or a timeout between operations, or the server needs to restart, etc etc. I think support for mid-stream errors is a must-have for our design.

gRPC's streaming doesn't support it for example

gRPC uses trailers for almost everything, actually, especially for streaming - it uses http/2 Trailers for this purpose. It leans quite heavily on them.


@surajbarkale, I think EventSource's biggest problem is that it is only server-to-client, it doesn't allow client-to-server streaming.

It also imposes lots of weird restrictions on Twirp without enough benefit. For example, it mandates that clients must asynchronously reconnect on any 500-504 error. That's a lot of complexity that I don't think is worthwhile. It also mandates that the Content-Type must be "text/event-stream", which means we would need to negotiate the encoding separately on our own, which would be kind of nasty.

achille-roussel commented 6 years ago

I'm mostly worried about the implementation in languages that don't have access to a good incremental serializer; the implementation will be a hacky concatenation of the {"messages": prefix, then the array of values, then ,"trailer": and eventually the trailer value plus the closing }. It's not the end of the world, but it seems like a design that encourages questionable implementations.
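
For illustration, that hand-rolled framing looks roughly like this in Go (field names follow the {"messages":[...],"trailer":{...}} examples above; a sketch of the concatenation approach being described, not a proposed implementation):

package stream

import (
	"fmt"
	"io"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/proto"
)

func writeWrappedStream(w io.Writer, msgs <-chan proto.Message, trailer proto.Message) error {
	// Emit the fixed prefix, then each message as it arrives, then the
	// fixed trailer suffix.
	if _, err := io.WriteString(w, `{"messages":[`); err != nil {
		return err
	}
	first := true
	for m := range msgs {
		if !first {
			if _, err := io.WriteString(w, ",\n"); err != nil {
				return err
			}
		}
		first = false
		b, err := protojson.Marshal(m)
		if err != nil {
			return err
		}
		if _, err := w.Write(b); err != nil {
			return err
		}
	}
	tb, err := protojson.Marshal(trailer)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintf(w, `],"trailer":%s}`, tb)
	return err
}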

Seems like there won't be a perfect solution here, it'll just have to work ;)

Thanks for the reference of how gRPC handles it.

spenczar commented 6 years ago

Right, I agree that it imposes a lot on serializers. I think streaming-from-a-client will end up being considered an advanced feature that only advanced clients support.

Like I said, if there's an alternative, I'd love it. This design has these nice features:

It has these downsides:

It would be great if we could get all the features without the downsides, but I don't think that's likely.

achille-roussel commented 6 years ago

Do the separators really have to be \n,? I feel like it could be a "best practice" guideline but the decoder could very well support the full JSON spec here.

Basically the separator is (optional blanks) + (read ',') + (optional blanks).

That would widen support for platforms where a streaming serializer is available but doesn't give control over the layout. I feel like it would be better to allow implementations to use a well tested serializer than having to implement their own due to a format restriction like this one.

spenczar commented 6 years ago

The separators don't need to be \n,, but newlines make things easier on deserializers because you can split input on line breaks, like this:

for line in response:
    if line starts with "]":
        break and read trailer
    if line starts with ",":
        line = trim leading "," from line
    msg = parse line as JSON
    emit msg

which is a very simple streaming deserializer!

If the separator is ,, then you need to actually lex incrementally to find the right matching } that signals end of a message. Lexing JSON isn't horrible, but it's a lot harder than just splitting on newlines.

I guess the tradeoffs here are:

It's not obvious to me. I'd like to try making a JSON lexer by hand to get a feel for how much \n, is gaining us in the deserializer.

achille-roussel commented 6 years ago

I see where you're coming from.

I was picturing a slightly different approach for deserialization, assuming there's some kind of buffered IO stream implementation in all popular programming languages, something along the lines of:

decode_token(stream) # this must be '['
while True:
    yield decode_value(stream) # decodes the next JSON value in the stream
    if decode_token(stream) == ']': # otherwise it must be ','
       break

In Go your decode_value call is something like this:

d := json.NewDecoder(r)
d.Decode(&v)                        // consumes exactly one JSON value from the stream
r = io.MultiReader(d.Buffered(), r) // put back whatever the decoder read ahead

As long as you can deserialize a single value from a JSON stream you can construct something similar (plus some error handling for malformed streams).
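
In Go specifically, encoding/json.Decoder already exposes that token/value loop via Token and More, so a reader that accepts the full JSON array grammar doesn't need any custom lexing (a sketch using json.RawMessage for the elements):

package stream

import (
	"encoding/json"
	"io"
)

func readArrayStream(r io.Reader, emit func(json.RawMessage) error) error {
	dec := json.NewDecoder(r)
	if _, err := dec.Token(); err != nil { // consumes the opening '['
		return err
	}
	// More reports whether another array element remains, regardless of the
	// whitespace (or lack of it) between elements.
	for dec.More() {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return err
		}
		if err := emit(raw); err != nil {
			return err
		}
	}
	_, err := dec.Token() // consumes the closing ']'
	return err
}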

Going the route of supporting a subset of JSON by enforcing the separator could cause compatibility issues if the constraint were ever relaxed in the future: newer clients would produce content that older servers cannot consume. We can do some damage control with versioning, but why not support the full JSON spec in the first place and fully discard this class of problems?

spenczar commented 6 years ago

As long as you can deserialize a single value from a JSON stream you can construct something similar.

This is really the key. How widely available is that feature?

Go supports it with json.Decoder.Decode, and Python supports it with json.JSONDecoder.raw_decode. I think Java's org.json supports it too.

Ruby's standard library doesn't support it, and JavaScript doesn't either, but there are popular third party libraries that do, and those languages tend to lean heavily on third-party libraries.

I think you've got me convinced, @achille-roussel, that we shouldn't strictly require \n, as a separator.

We still are requiring a particular ordering for fields, though. One way around that is to use a mixed-type array at the top level. Its first element would be the list of messages, and its second element would be the trailer:

[
    [{"message": 1}, {"message": 2}],
    {"status": "ok"}
]

This is no longer representable in protobuf, since protobuf doesn't support mixed-type lists nor top-level lists, but it is totally compliant with the JSON spec, which might be worth it, overall. Hm.

spenczar commented 6 years ago

I should point out that protobuf also says fields can appear in any order; it just encourages serializers to write fields sequentially by field number. So the same problem exists (theoretically) for protobuf streams. In practice, all serializers emit fields in increasing order by tag number, though.

It's a way bigger deal for JSON.

mocax commented 6 years ago

I think we should model the stream as a proto message, and serialize it into JSON without modification.

There is no need to worry about the field ordering, since we have full control over the writer side for both JSON and proto. Anyone who attempts to parse the stream as a normal JSON value will be killed by an out-of-memory error anyway. The reader side of the stream must use an incremental parser, so the ordering doesn't matter at all. I don't think we need to make the schema more complicated.

spenczar commented 6 years ago

@mocax, you write:

There is no need to worry about the field ordering, since we have full control over the writer side for both JSON and proto.

I think we still do need to worry about it, as it indicates a break from the normal JSON spec, so we need to support it in our spec. All "MUST"-type statements in the spec have high cost - this has been a core design principle for Twirp, and it's been really really effective.

You're right of course that we have full control, in the sense that the Twirp repository includes generators that we maintain. But it's equally important to think about how easy it is to write clients in other languages. Clients can be stream-senders, so they'd need to get this detail right.

I think we should model the stream as a proto message, and serialize it into JSON without modification.

Could you explain the benefits of this? I think it would be good to be explicit.

mocax commented 6 years ago

I don't think this violates the JSON spec. The JSON spec says the serialization process does not promise the order of the object entries. But the spec does not prohibit Twirp from promising the serialization order. It is like how a JSON number is a double, but Twirp can promise the error code is always 0-255, and the client can reject larger values based on the promise.

Both JSON and proto use a "last one wins" rule, so the wire format must preserve the order. A proto message can have duplicate values for the same field, and it always works. It is the server and the client promising the ordering, not the JSON or proto spec.

A wire protocol will be implemented many times by many people. By treating the stream as a special message, it is much easier to explain to developers. The wrapper types have the same design principle. In other words, special semantics in proto should be a special proto message, instead of magic documentation sitting somewhere else.

spenczar commented 6 years ago

@mocax, thanks for discussing this with me in detail! I think we're getting somewhere and talking it out is helping the design.

But the spec does not prohibit Twirp from promise the serialization order.

Right - we would need to say "Twirp's stream serialization under Content-Type: application/json is JSON, with an additional restriction:", instead of "Twirp's stream serialization under Content-Type: application/json is JSON."

That's an important difference, I think, because it means you can't use an off-the-shelf JSON serializer to produce a valid stream - you need to manually write {"messages":[, then emit messages, and so on.

That's not too terrible, but we shouldn't pretend it's nothing.

It is like how a JSON number is a double, but Twirp can promise the error code is always 0-255, and the client can reject larger values based on the promise.

Indeed, it's a lot like that! We would not want to make such a promise for that reason - it puts more logic in the client.

Both JSON and proto use a "last one wins" rule, so the wire format must preserve the order.

JSON does not have this rule, no. RFC 7159 says:

   ... When the names within an object are not
   unique, the behavior of software that receives such an object is
   unpredictable.  Many implementations report the last name/value pair
   only.  Other implementations report an error or fail to parse the
   object, and some implementations report all of the name/value pairs,
   including duplicates.

In other words, special semantics in proto should be a special proto message, instead of magic documentation sitting somewhere else.

Right, I agree. That's why it would be nice to find a design where order doesn't matter, so that we don't need to document it. I don't think that's possible, but I do think we should try.

mocax commented 6 years ago

We need to make sure the alternative design also has great usability and ease of implementation. At the end of the day, the code complexity is one incremental parser in each language. That is a very small engineering cost. For comparison, few people care about proto3 encoding complexity, since it is a one-time effort.

Only an array can ensure ordering, but most REST tools cannot handle arrays as top-level values. To ensure the trailer comes after the messages, we would have to use a mixed-type array, but that is awkward to explain and use, plus proto3 cannot have mixed-type arrays. So we would have to use different hacks.

The message wrapper is a battle-tested solution, see "StreamBody" at https://cloud.google.com/apis/docs/http. That should be good enough.

Merovius commented 5 years ago

To chime in here as a stranger:

First, my use-case for streaming is bi-directional synchronization. Essentially the delta-encoding mentioned above. Currently, the protocol I'm using is:


In regards to the protocol: I know that it may seem complicated, but IMO HTTP multipart bodies are far simpler than what's discussed here. Instead of arguing about how to bend the spec of whichever encoding scheme and requiring people to write their own encoders and decoders to manage field orders and merging and… it seems simpler to say a streaming RPC sends/responds with a multipart body, where each part is one request/response message. Yes, implementing multipart HTTP itself could be complicated - but I'd expect most HTTP libraries, proxies, and the like to already be aware of it (i.e. it reuses more existing effort). Go has multipart support in the stdlib, which together with http.Flusher could be used to implement streaming pretty simply AFAICT.
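
A rough sketch of what that could look like on the server side with the stdlib (the JSON payloads are placeholders; this is not a concrete Twirp proposal):

package stream

import (
	"mime/multipart"
	"net/http"
	"net/textproto"
)

func streamHandler(w http.ResponseWriter, r *http.Request) {
	mw := multipart.NewWriter(w)
	w.Header().Set("Content-Type", "multipart/mixed; boundary="+mw.Boundary())

	flusher, _ := w.(http.Flusher)
	for _, msg := range []string{`{"k":4}`, `{"k":1}`, `{"k":5}`} {
		// One MIME part per response message.
		part, err := mw.CreatePart(textproto.MIMEHeader{"Content-Type": {"application/json"}})
		if err != nil {
			return
		}
		if _, err := part.Write([]byte(msg)); err != nil {
			return
		}
		if flusher != nil {
			flusher.Flush() // push this part to the client before producing the next one
		}
	}
	mw.Close() // writes the closing boundary
}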


Just my 2¢ :) Feel free to ignore me :)

danmux commented 5 years ago

I think it would end up being simpler to support only http/2 - it is likely that both the client and server are Twirp, so in general it would not be a huge issue, I would have thought.

marioizquierdo commented 4 years ago

Note: the streaming API is no longer tied to Twirp protocol v6, which has been archived. This doesn't mean that streaming will never happen in Twirp, but it means that there are no short-term plans to release it as v6. This issue may remain open while we look for alternatives.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label / comment or this will be closed in 5 days

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label / comment or this will be closed in 5 days

tv42 commented 3 years ago

Bad stalebot.

casimcdaniels commented 3 years ago

Bad stalebot.

Worst github bot ever created