openid / sharedsignals

OpenID Shared Signals Working Group Repository
45 stars 11 forks source link

Adding a Stream ID #4

Closed FragLegs closed 1 year ago

FragLegs commented 2 years ago

In previous discussion, it was decided that we needed a way for there to be potentially more than one stream between a transmitter and receiver. Some use cases for this were the needs of GDPR to keep data localized, having different polling frequencies for different types of events, etc. What follows is a high-level proposal for how we could add stream IDs without significantly changing the structure of the SSE framework. I would like to have some discussion here so that the high-level details can be settled before I create a pull request for the change.

Support for default and/or named streams

First of all, because stream IDs dictate something about how the data model works, transmitters should be able to decide whether they support streams with IDs or not. Currently, there is a single default stream per receiver. I propose adding a new optional property to the transmitter configuration response from /.well-known/sse-configuration:

stream_types: [
  "https://schemas.openid.net/secevent/stream_type/default",
  "https://schemas.openid.net/secevent/stream_type/named"
]

This allows the transmitter to support both default and named streams. Many, perhaps most, receivers will only care about having a single stream, so continuing to support a default stream is user friendly.

The default stream does not need to be created. It exists as soon as there is a transmitter/receiver relations if the transmitter supports default streams. Any named streams must be explicitly created.

Changes to the configuration_endpoint

Changes to the stream configuration object

The most significant changes to the spec are centered around the configuration_endpoint, in the GET, POST and DELETE verbs, as well as in a new PATCH verb (detailed below).

With the exception of DELETE, all of these verbs either take a Stream Configuration object as an argument or return a Stream Configuration object. In all cases, we would update the Stream Configuration object to include a new optional parameter, stream_id:

In some ways, you could think of a default stream as a stream whose stream_id is null.

Create a Stream - configuration_endpoint [POST]

Creates a new stream

Arguments

Results

Get Stream Configuration - configuration_endpoint [GET]

Gets the configuration of a stream.

Arguments

Results

Update a Stream - configuration_endpoint [PATCH]

Updates a stream's configuration.

Arguments

Results

Open question

How should we treat missing read/write values in the stream configuration object?

  1. PATCH semantics imply that missing values should be interpreted as "leave the value as it is".
  2. Current SSE framework interprets missing read/write values as "delete the value". But interprets missing read-only values as "leave the value as it is".
  3. Google best practice suggests including a field mask to indicate which fields are being updated. Do we want that?

Delete a Stream - configuration_endpoint [DELETE]

Deletes a named stream or resets a default stream to its default configuration. When a stream is deleted (or reset for default streams) all subjects that have been added to the stream, events that have been enqueued in the stream, or status that has been set on the stream should also be deleted.

Arguments

Results

New endpoint: list_streams_endpoint

A new endpoint will be included in the transmitter configuration response from /.well-known/sse-configuration:

list_streams_endpoint: "https://tr.example.com/sse/mgmt/list_streams"

List stream IDs - list_streams_endpoint [GET]

Returns a list of the stream IDs in use

Arguments

Results

Other changes

The following endpoints all get an optional stream_id argument, either in the body of the POST endpoints or the URL params of the GET endpoint. If the stream_id argument is missing, assume the default stream is being indicated.

All five methods get a new possible return value:

randiepathirage commented 2 years ago

This proposal is a good improvement and it covers most of the missing parts in the SSE 1.0 - draft 01. I wonder what happens if we send a GET request to list_streams_endpoint when there are no "named streams" created, will it still respond with 200 OK and empty list?

FragLegs commented 2 years ago

This proposal is a good improvement and it covers most of the missing parts in the SSE 1.0 - draft 01. I wonder what happens if we send a GET request to list_streams_endpoint when there are no "named streams" created, will it still respond with 200 OK and empty list?

That's a good question. If there are no named streams, it would be an empty list. Perhaps the endpoint should be list_named_streams_endpoint and it will throw a 501 if you don't have the "named streams" feature turned on. I think that might be a little less confusing. Something like this:

New endpoint: list_named_streams_endpoint

A new endpoint will be included in the transmitter configuration response from /.well-known/sse-configuration:

list_named_streams_endpoint: "https://tr.example.com/sse/mgmt/list_named_streams"

List named stream IDs - list_named_streams_endpoint [GET]

Returns a list of the stream IDs for any named streams in use

Arguments None

Results

FragLegs commented 2 years ago

After further discussion, we decided to lean less on the need for backwards compatibility. Instead of having default and named streams, we will only have named streams. The receiver MUST create the stream before using it. The proposal is rewritten below with those updates in mind.

Changes to the configuration_endpoint

Changes to the stream configuration object

The most significant changes to the spec are centered around the configuration_endpoint, in the GET, POST and DELETE verbs, as well as in the new PATCH and PUT verbs (detailed below).

With the exception of DELETE, all of these verbs either take a Stream Configuration object as an argument or return one or more Stream Configuration objects. In all cases, we would update the Stream Configuration object to include a new parameter, stream_id:

Create a Stream - configuration_endpoint [POST]

The transmitter creates a new stream with a unique stream ID

Arguments

Only the optional read/write elements of the stream configuration object are allowed:

Results

Get Stream Configuration - configuration_endpoint [GET]

Gets the configuration(s) of a stream or streams.

Arguments

Results

Update a Stream - configuration_endpoint [PUT]

Replaces a stream's configuration. Missing read/write values are deleted. Missing read-only values cause an error.

Arguments

Results

Update a Stream - configuration_endpoint [PATCH]

Partial update for a stream's configuration. Missing read/write values are ignored. Missing read-only values are ignored.

Arguments

Results

Delete a Stream - configuration_endpoint [DELETE]

Deletes a stream. When a stream is deleted all subjects that have been added to the stream, events that have been enqueued in the stream, or status that has been set on the stream should also be deleted.

Arguments

Results

Other changes

The following endpoints all get a required stream_id argument, either in the body of the POST endpoints or the URL params of the GET endpoint.

All five methods get a new possible return value:

tulshi commented 2 years ago

I'm a bit concerned about the "PUT" method of updating a stream:

  1. If a Transmitter doesn't support a certain read-only value (e.g. min_verification_interval), then is it an error for the Receiver to call the PUT method without specifying the value?
  2. If a Transmitter modifies their implementation and starts supporting a new read-only value (either one that is defined now, or may be defined later as the spec evolves), then does the Receiver implementation break because it is not providing the read-only value?
tulshi commented 2 years ago

It'll be good to clarify what happens when read-only values are specified in the PATCH (update) method. Right now the proposal states that the missing read-only values are ignored. The notes from the Jan 22 call do not make this clear, so we should probably discuss and close it in the next call. In those notes, we had talked about reviewing industry best practices. The Google aip.dev update description uses "update masks" to determine which values to update. My recommendation is that if the Receiver sends a read-only value in the request, it should be verified to match. If a read-only value is missing, it should be ignored. This is so that if the Receiver has a disconnect about the Transmitter's configuration, it should not be allowed to update that configuration.

FragLegs commented 2 years ago

If a Transmitter doesn't support a certain read-only value (e.g. min_verification_interval), then is it an error for the Receiver to call the PUT method without specifying the value?

This is a good point. For the PUT behavior, the expected use case is the Receiver calls GET, modifies the object that was returned, and then sends the modified object. I think we should make all of the read-only values REQUIRED in the data that is sent from the transmitter. If a transmitter doesn't want to support min_verification_interval they can set it to 0 seconds (which would be functionally the same as not having it).

If a Transmitter modifies their implementation and starts supporting a new read-only value (either one that is defined now, or may be defined later as the spec evolves), then does the Receiver implementation break because it is not providing the read-only value?

You would get an error in that case. The PUT behavior really expects that the Receiver gets the configuration from GET or POST first. So, if you try to PUT and get a 400 error, you may want to call GET and try again with the updated object.

The reason we need both PUT and PATCH is so that we can delete read-write values. Since PATCH is supposed to ignore missing values, there is no way to tell it to delete something. PUT gives us that ability.

My recommendation is that if the Receiver sends a read-only value in the request, it should be verified to match. If a read-only value is missing, it should be ignored. This is so that if the Receiver has a disconnect about the Transmitter's configuration, it should not be allowed to update that configuration.

I'm pretty sure that is what we had discussed on the call. And it is what the PATCH behavior does in this implementation (notice the 400 error that gets thrown if a read-only value is incorrect).

FragLegs commented 2 years ago

Also, what would y'all think about putting aud in the POST arguments? Let the Receiver specify at the time of Stream creation what the aud value should be.

Right now, two pieces of info need to be shared out of band: the bearer token and the aud value. It would be really nice to make it so that only the bearer token needed to be shared outside this framework.

ghost commented 1 year ago

Just catching up on this...

Overall this looks like a good addition to enable multiple streams to be included within the spec. Up till now, handling multiple streams between two entities required an implementation-specific mechanism.

A few more detailed comments:

adeinega commented 1 year ago

@frankt-vmware, I do wish to see more REST-like APIs, one thing that I would suggest using

POST /sse/stream/1234/subject

instead of

POST /sse/stream/1234/subject:add

The suffix "add" is unnecessary here, and if the specification still wants to use it, it should be using something like /sse/stream/1234/subject%3Aadd.

ghost commented 1 year ago

@adeinega

POST /sse/stream/1234/subject

Even better.

FragLegs commented 1 year ago

I like the idea of adding it to the path, except that we have to figure out how to encode that information in the /.well-known response. Right now, /.well-known returns something like this:

   {
     "issuer":
       "https://tr.example.com",
     "jwks_uri":
       "https://tr.example.com/jwks.json",
     "delivery_methods_supported": [
       "https://schemas.openid.net/secevent/risc/delivery-method/push",
       "https://schemas.openid.net/secevent/risc/delivery-method/poll"],
     "configuration_endpoint":
       "https://tr.example.com/sse/mgmt/stream",
     "status_endpoint":
       "https://tr.example.com/sse/mgmt/status",
     "add_subject_endpoint":
       "https://tr.example.com/sse/mgmt/subject:add",
     "remove_subject_endpoint":
       "https://tr.example.com/sse/mgmt/subject:remove",
     "verification_endpoint":
       "https://tr.example.com/sse/mgmt/verification",
     "critical_subject_members": [ "tenant", "user" ]
   }

where those specific endpoints are whatever the transmitter wants them to be. We would need to a) become prescriptive about what the endpoints must be, and b) figure out how to specific "stream id goes here" in the URL.

I'm not sure if those changes are non-starters for this discussion.

adeinega commented 1 year ago

@FragLegs, add_subject_endpoint and remove_subject_endpoint aren't really REST-like endpoints, they both can be replaced by one, say subject_endpoint. A receiver/client would use either

POST /sse/stream/1234/subject

or

DELETE /sse/stream/1234/subject

depending on what it wants to achieve.

FragLegs commented 1 year ago

Agreed. That doesn't fix the problem though.

FragLegs commented 1 year ago

@adeinega I added an issue addressing your point about not needing distinct add/remove endpoints in #39

ghost commented 1 year ago

@FragLegs Who owns the format of the contents of /.well-known? Is this something invented in the SSE/F spec, or have we imported it from elsewhere?

RFC8615 defines /.well-known/ (note the trailing /) as a container for well known URIs, but does not talk about /.well-known itself.

If we are free to make up the content, then we could follow typical REST API specs and use the :var notation for variables.

Giving:

   {
...
     "configuration_endpoint":
       "https://tr.example.com/sse/mgmt/stream/:stream_id",
     "status_endpoint":
       "https://tr.example.com/sse/mgmt/stream/:stream_id/status",
     "subject_endpoint":
       "https://tr.example.com/sse/mgmt/stream/:stream_id/subject",
     "verification_endpoint": [
       "https://tr.example.com/sse/mgmt/verification",
       "https://tr.example.com/sse/mgmt/stream/:stream_id/verification"
     ],
...
   }

Here the verification endpoint has two modes /verification to send a message to all streams for this ER, and /stream/:stream_id/verification to send for just a specific stream.

The caller would have to know that subject requires POST and DELETE.

Thoughts?

adeinega commented 1 year ago

@frankt-vmware, I believe /.well-known/sse-configuration and its content is inspired by https://openid.net/specs/openid-connect-discovery-1_0.html and https://www.rfc-editor.org/rfc/rfc8414.

independentid commented 1 year ago

Responding with 409 on create for a server that supports only one stream is incorrect. 409 is meant for situations like two clients submitting a put causing one to be out of date. 409 is inappropriate because it invites the client to try again.

403 is more appropriate because it is a policy decision (even though it may be technical) to support only one stream. 403 tells the client not to retry which is appropriate here.