Open bterlson opened 1 year ago
Question raised by @witemple-msft is whether this should go into the HTTP library or another library like @typespec/sse
. An argument for would be that SSE is defined by the HTML spec rather than HTTP. An argument against is that it's a lot of extra complexity, but maybe that's ok because SSE is a rarely used thing anyway?
In the OpenAI SSE API, the terminal SSE payload is the literal "[DONE]"
which is not application/json
. So I think we need another decorator to capture that terminal event, e.g. sseTerminateWith(T: string)
?
In the OpenAI SSE API, the terminal SSE payload is the literal
"[DONE]"
which is notapplication/json
. So I think we need another decorator to capture that terminal event, e.g.sseTerminateWith(T: string)
?
This seems to complicate things significantly if we cannot even know what the serialization format of a given logical event is. Is with no discriminant like a content-type header. So this anonymous event in OpenAI sometimes being JSON and sometimes not is really a problem for emitter implementation.
In spec space, I don't really think we need a special "terminates" decorator and think we could probably just wrap it up into the union for event types:
union OpenAICompletionEvent {
@sseContentType("text/plain")
string;
@sseContentType("application/json")
ChatDelta
}
It's just going to be really annoying to actually implement this in an abstract way.
I agree the OpenAI API feels complicated and tbh I am not sure what was their motivation behind this design decision. Perhaps sending [DONE]
makes it clear that the stream actually ended and the client is safe to assume so? We actually have a bug instance in one of those APIs where [DONE]
is not sent eventually even though the stream closed. And the stream feels it has abruptly ended.
Generally speaking, the content-type value of text/event-stream
lends to the fact that the data payload is an arbitrary textual value and having that terminal decorator designate a particular value as terminal should be enough in the OpenAI case.
Your proposal trades precision for simpler spec but I am not sure how an implementation should use it? If a payload is JSON, parse it and return it but if it is a string, return it as is? I don't think this is what we would want, at least in the OpenAI case.
Question raised by @witemple-msft is whether this should go into the HTTP library or another library like
@typespec/sse
. An argument for would be that SSE is defined by the HTML spec rather than HTTP. An argument against is that it's a lot of extra complexity, but maybe that's ok because SSE is a rarely used thing anyway?
I see both as valid arguments not to add it to the HTTP library. This is a pretty niche feature, isn't part of the HTTP spec, and we may want to break/version it without impacts to the more fundamental HTTP library.
@deyaaeldeen yes, I think the only logical interpretation of my proposal is that the emitters would have to write logic to test if the string value returned by the SSE endpoint satisfies the given sseContentType by checking them in order, so maybe this is an even more accurate representation of the OpenAI API:
union OpenAICompletionEvent {
@sseContentType("text/plain")
"[DONE]";
@sseContentType("application/json")
ChatDelta;
}
I think the implication here in TypeScript is that you have two event types: "[DONE]"
and ChatDelta
and the stream logic is that you check if the payload is the literal string "[DONE]"
and then if not tries to parse it as JSON. If it fails then you have an error of some kind, and you end up with a natural representation of this stream that is returned to the customer, which is an AsyncIterable<"[DONE]" | ChatDelta>
.
This isn't so much about simplifying the specification, but avoiding having must-understand metadata items that emitters have to check for in order to implement SSE correctly in general, especially one that I suspect only exists because of a quirk in OpenAI's implementation. Here we just say that [DONE]
with a MIME type of text/plain
is a possible event. We lose the information that OpenAI always produces it at the end (redundantly, since the fact that the stream is done is implied by the connection being closed), but I don't think we really want to open the door to encoding information about how events are sequenced in APIs. That is some kind of information about the protocol embedded within the SSE stream and not about the SSE stream itself.
Don't the LRO stuff in typespec encode information about when an operation terminates? It also feels annoying to have to deal with AsyncIterable<"[DONE]" | ChatDelta>
when we know that [DONE]
is nothing but a termination marker. Passing that up to the client doesn't seem to provide any real value. It is like putting the standard HTTP status code in the output model when we know for sure at this point the request succeeded.
I am not necessarily opposed to be less precise, I just wonder if it is worth it. We need to differentiate between what can be expressed and what is considered a best practice. If something is not a best practice, we can just lint against it.
I'm not sure I like the proposed solution above because I think we want sseEventContentType
to only be applied to a model in the case where event types are not used, and otherwise to the event type union variants, where they are discriminated by the event type. Things feel too complex otherwise. Multiple content types for the same event payload just feels bad, and in the OpenAI case is due only to a weird quirk. I am not motivated to support it when there are reasonable workarounds.
Another issue is that allowing such specs invites a lot of ambiguity - it's super easy to specify events that cannot be discriminated by examining the payload - is data: [false]
intended to be a string "[false]"
or an array containing a boolean? The best we can do is check validity against the content type in the order they are declared, but again that feels not so good to me.
As to the reasonable workarounds, @deyaaeldeen suggested maybe we can silently ignore events whose payload cannot be understood with the given content type, allowing the spec to simply say that the event content type is application/json
and the terminal marker is dropped on the floor. This feels borderline reasonable to me. Thoughts?
I worry a little bit that approach will make it impossible to distinguish intended spec violations from unintended spec violations. If the client sends something that doesn't parse because of some server memory error or whatever (however unlikely that may be), we'll just drop it rather than make the error observable.
Another issue is that allowing such specs invites a lot of ambiguity - it's super easy to specify events that cannot be discriminated by examining the payload - is data: [false] intended to be a string "[false]" or an array containing a boolean? The best we can do is check validity against the content type in the order they are declared, but again that feels not so good to me.
This is a much more general problem. If any API returns heterogenous data without a discriminant, it's a problem for emitters to figure out which one was actually returned without constructing complex runtime tests for type identity. And even with such tests, there are ambiguities. For example say an API returns some JSON model that looks like this:
model Foo {
data: Bar | Baz;
}
model Bar {
a?: string;
b: safeint;
}
model Baz {
a: string;
b?: safeint;
}
Well, now I don't know how to parse data that has both a
and b
. It could be either, so we have to just pick one when deserializing at runtime or reject this case outright. For TypeScript specifically, we don't care, because the weakly-typed JSON object form is our native data format, and we just say it satisfies Bar | Baz
, but in languages where you have to choose a representation of data where there is real type identity, that isn't possible. You have to actually construct the data by picking a representation, and we have to do that in some order or just reject APIs with undiscriminated ambiguities.
I've read the SSE spec now and while I wish the openai done message was JSON, its presence is more or less required for finite streams. According to the SSE spec:
Clients will reconnect if the connection is closed; a client can be told to stop reconnecting using the HTTP 204 No Content response code.
In other words, to prevent a client from attempting a reconnect only to be told to go away, the server must send some kind of terminal message. As such, we need to support that. Now ideally we would support this by requiring a dedicated event type, which would allow something like:
@post op createCompletionStream(): SSEStream<Events>;
model Completion {
text: string;
}
@sseEventTypes union Events {
completion: Completion
@sseTerminalEvent
done: void;
}
But of course this doesn't help with the OpenAI case. If we take Will's suggestion that allows multiple content types for a given event type (by allowing @sseEventContentType on union variants for a particular event type), then we can apply @sseTerminalEvent there as well:
@post op createCompletionStream(): SSEStream<Event>;
union Event {
@sseContentType("text/plain")
@sseTerminalEvent
done: "[done]";
completion: Completion; // default application/json I suppose
}
Perhaps this is the best option. The issue I have with @deyaaeldeen's sseTerminateWith
is that it feels too special-casey for this one particular kind of terminal message. You could imagine doing something like this which doesn't feel super awful:
@post op createCompletionStream(): SSEStream<Event>;
union Event {
@sseTerminalEvent
done: { done: true }
completion: { done: false, value: Completion };
}
Incidentally, another aspect of the specification we may want to support is lastEventId
- SSE messages can have an id:
attached which allows clients to resume where they left off when the connection drops. Not sure how to model this as yet.
For discriminant in union, the best an emitter can do is to check all the required properties at every level to speculate whether this object belongs to one of the union variants. However, we cannot prevent the service backend from returning extra properties, including those required properties that belong to other union variants, even if they were defined as discriminable by required properties.
It occurs to me whether it matters to which union variant those properties belong. As long as those properties share the same name and type, the deserialization process will remain the same. Furthermore, since these union variants are part of the union, any instance where the union is used can be replaced by any one of the union variants.
There may be cases where we have to narrow down the type? not sure how the typespec would look like in that cases ?
what I am worried about is properties have the same name and type but differ in format, one example would be a union of bytes | utcDateTime. As suggested here https://github.com/microsoft/typespec/issues/2345
Also, FYI, other languages emitters currently don't have plans to support unions, they will simply generate protocol methods for it, and will have to add manual customization if they decide to support it in DPG. Their premise is that they will not take any action to verify if an object satisfies the types of other union variants as long as they find one match.
I've updated the proposal with two changes:
Using the former, we can describe the semantics of OpenAI's endpoint as follows:
@post op createCompletionStream(): SSEStream<Event>;
@sseEventTypes
union Event {
@sseContentType("text/plain")
@sseTerminalEvent
"[done]";
Completion;
}
Notes:
sse
prefix.TBH I like the terseness of anonymous and named union to express the optionality of event type. But there is another voice for named union, does the union name take too much role?
It seems that different places have different interpretation. For example the name could be an discriminator for discriminated union, sometimes the name could be just an information in regular union. Here in SSE the named union is more like record, they are key-value pairs, key would be event type, value would be event data type. So I may ask questions like:
union
?Also I was wondering maybe we could have a decorator sseEventTypeName
to express the event type not union name. So we could support one event data model with customized event type.
@post op createCompletionStream(): SSEStream<Completion>;
@sseEventTypeName("completion")
model Completion {
text: string;
}
Also we could support discriminated union with different event type names.
@post op createCompletionStream(): SSEStream<Pet>;
@sseEventTypes
@discriminator("kind")
union Pet {
@sseEventTypeName("catEvent")
cat: Cat;
@sseEventTypeName("dogEvent")
dog: Dog;
}
Lastly I think the SSE's breaking change evolution story would be quite interesting. Taking the above case as an example
Completion
Completion
, would this be a breaking?Foo
, would this be a breaking?Bar
, would this be a breaking?hey guys, chiming in support for this. Is there a recommended workaround in the meantime?
Our HTTP library should add support for defining SSE endpoints.
SSE is a fairly simple protocol where a service responds with a content type of
text/event-stream
and events are encoded in plaintext. Events are separated by two line breaks. Event data is provided by lines prefixed withdata:
. Events can optionally have an event type, in which case thedata:
lines for the event are preceded by anevent:
line.Clients can consume SSE Events in web browsers using the
EventSource
API, but naturally we want to generate clients that make this more ergonomic.Proposal
The gist of this proposal is to enable specs like the following:
Or, using event types:
SSEStream<T, TEventEncoding = "application/json">
Defined as the following:
This type enables emitters that don't understand SSE to still consume a vaguely appropriate shape including the proper content type and string body. If you understand SSE, the following decorators are used to determine the API shape.
SSE event payloads are just strings, but in practice they are often JSON encoded. This template allows customizing how an emitter should encode/decode the SSE event payloads. "application/json" is probably the most common so it is default, however "text/plain" would also be another common option.
@sseStreamOf(T: unknown)
(internal-ish)Indicates the payload of SSE events. If
T
has the@sseEventTypes
decorator, it defines the event types and payloads. Otherwise, event types are not used and events all share the same provided shape.@sseEventContentType(T: valueof string)
Customizes the content type of the event payload, enabling emitters to encode/decode different serializations.
May be applied to union variants when the union has the
@sseEventTypes
decorator. In such cases, it overrides the value set on the response payload (i.e. the value passed to theSSEStream
template).@sseEventTypes
Applied to a union, this decorator instructs the emitter to treat the union as defining the event types in the API.
When the a union variant is named, the name is the event type and the value is the corresponding payload type. When the union variant is anonymous, no event type is sent. This means that multiple anonymous union variants all describe possible events that may or may not be discriminated, and emitters will likely need to do reflection on the event value is to see what logical event they're dealing with.
@sseEventBody
Applied to a model property, this decorator defines what the logical body of the event is. This allows emitters limited ability to unpack event envelopes, passing only the logical event into user code. We may want to only support this at the top level of models for now.
Example usage: