An Alternative (WebSockets) Design

As discussed after the meeting last week (7 July 2022), I would like to propose an alternate design for Solid Notifications.

Before I even begin, a few disclaimers are in order:

I am coming at this solely from a client's perspective.
Apologies, in advance, for getting any nomenclature wrong (the language is in flux anyway).
What little experience I have is with WebSockets and that is the perspective I take.

Background

The current Solid Notification Protocol requires a number of discovery and initialization steps before we can actually start receiving any notifications:

Obtain the link header pointing to the Storage Metadata Resource from a Solid resource.
Discover Notifications Channel and Capabilities for the Subscription/Channel Type from the Storage Metadata Resource.
Place a request for a Notifications (say, WebSockets) Endpoint with desired capabilities/features (and resource? resources?).
Initiate a Notifications (again WebSockets) connection on the received endpoint. (Are these Endpoints reusable? Can they be cached?)

These are clearly a lot of round-trips before we can actually start receiving notifications.
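As a rough illustration of the first step in the list above, the sketch below extracts link targets from a `Link` header. The relation URI in `STORAGE_DESCRIPTION_REL` and the example URLs are assumptions made for illustration and should be checked against the current spec text; the parser is deliberately minimal and does not cover every corner of the Link header grammar.

```python
import re

# Assumed relation name for the Storage Metadata Resource; verify against the spec.
STORAGE_DESCRIPTION_REL = "http://www.w3.org/ns/solid/terms#storageDescription"

# One link-value: <target> followed by zero or more ";param" parts.
_LINK_PATTERN = re.compile(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)')
# One parameter: name=value, where the value is quoted or a bare token.
_PARAM_PATTERN = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Map each rel value found in a Link header to its target URI."""
    links = {}
    for target, params in _LINK_PATTERN.findall(value):
        for name, quoted, bare in _PARAM_PATTERN.findall(params):
            if name.lower() == "rel":
                # A rel parameter may carry several space-separated relations.
                for rel in (quoted or bare).split():
                    links[rel] = target
    return links


# Hypothetical header as it might appear on a resource response.
header = (
    '<https://pod.example/.well-known/solid>; '
    'rel="http://www.w3.org/ns/solid/terms#storageDescription", '
    '<.acl>; rel="acl"'
)
links = parse_link_header(header)
metadata_url = links.get(STORAGE_DESCRIPTION_REL)
```

Every later step (capability discovery, requesting an endpoint, opening the connection) depends on the result of the previous one, which is why the steps cannot be issued in parallel.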

My other concern is with Internet outages (especially micro-outages), which are extremely common where I live (even on broadband, let alone mobile). Going through even an extra step to recover a disrupted WebSocket connection is going to lead to a poor user experience. See @acoburn's comment below.

Further, it is not specified yet, if we shall have one channel per resource or one channel per client. Even in the latter case (one channel per client), if the client wants to add or remove resources and/or change notification features like extend subscription duration, it must re-negotiate a new endpoint first using HTTP. While this might be necessary for uni-directional protocols, it makes little sense to carry this over to a bi-directional protocol like WebSockets.

Proposal

Provide a generic endpoint for the most common way to connect, which I believe would be WebSockets, as a header on every resource (like we do now).
A client opens a WebSocket connection using the generic endpoint.
As soon as the connection is established, the client sends authentication data, features and the list of resources they wish to subscribe to, all in a single message. (Think of it like a GET to multiple resources on a Solid pod and with an endless response.)
a. The server may close the connection due to a failure to authenticate.
b. The server may re-negotiate features.
c. The server starts sending notifications (perhaps the first message contains an 'accept').

Changes of features, addition or removal of resources can all be done in band through a client message, without the need to initiate new connections.
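To make the single opening message and the in-band changes more concrete, here is a minimal sketch. Nothing in it comes from the spec: the message shape, the field names (`type`, `authorization`, `features`, `resources`, `add`, `remove`) and the example URLs are all hypothetical.

```python
import json


def build_subscribe_message(token, resources, features):
    """Build the single opening message: auth, features and resources together."""
    return json.dumps({
        "type": "subscribe",          # hypothetical message type
        "authorization": token,       # hypothetical field carrying auth data
        "features": features,
        "resources": sorted(resources),
    })


def build_update_message(add=(), remove=(), features=None):
    """Build an in-band change request sent over the already open connection."""
    message = {"type": "update", "add": sorted(add), "remove": sorted(remove)}
    if features is not None:
        message["features"] = features
    return json.dumps(message)


def apply_update(subscribed, raw_message):
    """Server-side view: return the new set of subscribed resources."""
    message = json.loads(raw_message)
    return (set(subscribed) | set(message["add"])) - set(message["remove"])


a, b, c = (f"https://pod.example/{name}" for name in "abc")
opening = json.loads(build_subscribe_message("example-token", [b, a], {"state": True}))
update = build_update_message(add=[c], remove=[a])
after_update = apply_update({a, b}, update)
```

The point of the sketch is only that adding or removing a resource becomes one more message on the same connection, rather than a new HTTP request followed by a new connection.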

Even if we choose to go with one channel per resource, the operation is no more complex than a new connection and a GET-like request.

Benefits

Faster discoverability + caching of endpoint.
Fewer round trips.
The WebSocket endpoint is not tied to connection properties and/or resources. Thus, we can renegotiate features, add or remove resources without stepping out of band.
Different properties can be specified for different resources within the same connection and perhaps even the same message (such as on start-up).
One connection per server/pod only; client has fewer connections to manage/recover in case of outage.
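The recovery argument in the last benefit can be sketched as follows: with one connection per pod, the client only has to remember what it is subscribed to and replay that after reconnecting. The backoff parameters, the class name and the message fields are assumptions chosen for illustration; a real client would also add jitter and actual socket handling.

```python
import json


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6):
    """Return the wait in seconds before each reconnection attempt."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


class SubscriptionState:
    """Client-side record of what should be subscribed on the single connection."""

    def __init__(self):
        self.resources = set()

    def subscribe(self, *uris):
        self.resources.update(uris)

    def unsubscribe(self, *uris):
        self.resources.difference_update(uris)

    def replay_message(self, token):
        """Message to send right after a reconnect to restore every subscription."""
        return json.dumps({
            "type": "subscribe",            # hypothetical, as in the proposal sketch
            "authorization": token,
            "resources": sorted(self.resources),
        })


state = SubscriptionState()
state.subscribe("https://pod.example/a", "https://pod.example/b")
state.unsubscribe("https://pod.example/a")
replay = json.loads(state.replay_message("example-token"))
delays = backoff_delays(attempts=8)
```

After a micro-outage, recovery is then one new connection plus one replayed message, regardless of how many resources are being watched.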

@uvdsl already has a symmetry objection to this proposal. While I have hinted at it already, I shall defer to him for a full explanation.

I also invite other Subscription Type authors to consider how their proposals may be affected.

My other concern is with Internet outages (especially micro-outages), which are extremely common where I live (even on broadband, let alone mobile). Going through even an extra step to recover a disrupted WebSocket connection is going to lead to a poor user experience.

AFAIK once the client gets the source of notifications (in this case WebSocket) the client can re-establish the connection without re-subscribing. We might consider in a separate issue if notification sources can expire and the subscription response should include that information.

I have not seen this specified in the standard, and always assumed that secured endpoints are for one-time or limited use. There has to be some expiry; one cannot expect a server to remember endpoints it has doled out for eternity. I agree, this needs a separate issue.

Discovery resources are cacheable. That is the nature of resources retrieved via GET.

Thank you for following up on the discussion. I would like to offer my perspective:

These are clearly a lot of round-trips before we can actually start receiving notifications.

Well, those are 3 (cacheable) requests prior to establishing the notification connection (which is the one request you will always have).
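The caching point can be illustrated with a small client-side cache that honors `Cache-Control: max-age`. This is a sketch under the assumption that the server actually sends such a header on discovery responses; the class name, the URL and the time values are hypothetical, and the sketch ignores validators such as `ETag`.

```python
import re


def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control header value, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


class DiscoveryCache:
    """Keep discovery documents until their freshness lifetime runs out."""

    def __init__(self):
        self._entries = {}

    def store(self, url, body, cache_control, now):
        max_age = max_age_seconds(cache_control)
        if max_age is None or "no-store" in cache_control.lower():
            return  # not cacheable under this simplified policy
        self._entries[url] = (body, now + max_age)

    def lookup(self, url, now):
        """Return the cached body while it is fresh, otherwise None."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        body, expires_at = entry
        return body if now < expires_at else None


cache = DiscoveryCache()
url = "https://pod.example/.well-known/solid"  # hypothetical discovery URL
cache.store(url, "discovery document", "public, max-age=60", now=0)
fresh = cache.lookup(url, now=59)
stale = cache.lookup(url, now=60)
```

While the entries are fresh, a client that reconnects skips the discovery requests entirely and goes straight to establishing the connection.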

Further, it is not specified yet, if we shall have one channel per resource or one channel per client.

I believe this is a subscription-type-specific question. In the case of Web Push, for example, this is already decided by the standards used and does not pose a question. (It is one channel per client.)

Even in the latter case (one channel per client), if the client wants to add or remove resources and/or change notification features like extend subscription duration, it must re-negotiate a new endpoint first using HTTP.

I believe this is websocket-related? At least I am not aware that "re-negotiating a new endpoint" is a thing, at least in "my subscription type world". I would expect a client to simply open a new subscription with the already discovered endpoint (and optionally to end the old subscription). That's one (or two) requests.

b. The server may re-negotiate features.

When features supported by the server are discovered beforehand, there is no need for feature negotiation.

c. The server starts sending notifications (perhaps the first message contains an 'accept').

This is specific to the websocket subscription type.

One connection per server/pod only; client has fewer connections to manage/recover in case of outage

Again, I believe this question is websocket specific?

@uvdsl already has a symmetry objection to this proposal. While I have hinted at it already, I shall defer to him for a full explanation.

I would prefer a standard flow that is universally used across subscription types to discover and establish a subscription. I would object to the idea of "each subscription type defines discovery and establishing individually because perf".

Things that make me ponder

For me, the main point of discussion here is how discovery and establishing a subscription works. Currently 3 requests, right?

While I can (maybe) imagine that in specific cases it is more performant to use websockets for discovery and establishing a subscription, I currently do not think that it is a good approach (based on my current understanding of Websockets and when to use them).

My understanding of websockets is that they shine when communication is actually bi-directional, i.e., when the server decides to send something without the client asking beforehand. That is, use websockets when there is no request-response pattern. That's why I do think websockets are great for notifications.
Also, opening a websocket connection for only 3 requests (and responses) and then closing it (as saving resources may be important, and one does not want to use websocket notifications maybe?) seems strange. Keeping the websocket connection open seems even stranger if you do not even use it.
I have no idea how large the performance difference between literally 3 HTTP requests and websocket communication is, also considering the additional complexity in the implementation for websockets communication.
Moreover, since we have a request-response pattern here, I would like to point out that HTTP offers standardized method semantics. Upon receiving an HTTP request, the server knows what GET/POST etc. means, what type the payload is and so on. Doing "negotiation" in the websocket connection gets rid of all that nice stuff. It instead has application-specific semantics that you will need to define, and even then I see a deep rabbit hole.

Maybe someone with more experience with websockets has a different perspective on this and I appreciate their wisdom.

Just some minor follow-ups on @uvdsl's detailed reply:

Further, it is not specified yet, if we shall have one channel per resource or one channel per client.

I believe this is a subscription-type-specific question.

It would be very confusing to the consumer to have different sub-protocols with completely different approaches to notifications within the same protocol. I would argue that in cases of such significant divergences, they should really be treated as different protocols under the same branding.

I am not aware that "re-negotiating a new endpoint" is a thing

I think we mean the same thing here. One would have to request the discovery resource for a new endpoint and then open a new connection using that (and optionally close the old connection), which is just more work for the client (than working on a single connection). That's all I meant by renegotiating (cut me some slack since I am not from a CS background).

I would prefer a standard flow that is universally used across subscription types

Ideally, I am with you on this one! But if there is a disproportionate use of one method (and I expect there to be), optimizing for that one method might become more important than symmetry. Perf is important to user experience.

websockets is that they shine when communication is actually bi-directional

I see the real benefit of WebSockets, not with the current protocol, which is read-centric, but when it shall inevitably be extended to read-write (which one will just have to, if the potential of Solid is to be realized). I mean clients notifying servers, instead of using something like PUT, which is much more useful when small changes are being made quickly.

This is a personal view, one that might even be unpopular, but the focus with Solid should be on the long game, not how we can improve the web now, but what the web should actually be in the coming decades.

also considering the additional complexity in the implementation for websockets communication

Another unpopular and biased view: The aim should always be to minimize client complexity, even if it makes life harder for server implementers (the only exception being a cost-prohibitive encumbrance). Ultimately, it is the client experience that drives adoption.

HTTP offers standardized method semantics.

Can we not just use the same semantics for notifications (or even build upon them)?
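One way to read this suggestion is a WebSocket message envelope that borrows HTTP method names and status codes. The sketch below is purely hypothetical: the envelope fields, the mapping of POST/DELETE/GET to subscribe/unsubscribe/list, and the choice of status codes are assumptions, not anything defined by the protocol.

```python
import json

# Hypothetical subset of HTTP methods reused inside WebSocket messages.
ALLOWED_METHODS = {"GET", "POST", "DELETE"}


def make_request(method, target, body=None, content_type="application/ld+json"):
    """Wrap an HTTP-like request in a JSON envelope for the open connection."""
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return json.dumps({
        "method": method,
        "target": target,
        "headers": {"content-type": content_type},
        "body": body,
    })


def handle_request(raw, subscriptions):
    """Server-side dispatch that answers with HTTP-style status codes."""
    request = json.loads(raw)
    method, target = request.get("method"), request.get("target")
    if method == "POST":      # subscribe to the target resource
        subscriptions.add(target)
        return {"status": 201, "target": target}
    if method == "DELETE":    # unsubscribe from the target resource
        if target not in subscriptions:
            return {"status": 404, "target": target}
        subscriptions.discard(target)
        return {"status": 204, "target": target}
    if method == "GET":       # list the current subscriptions
        return {"status": 200, "body": sorted(subscriptions)}
    return {"status": 405, "target": target}


subscriptions = set()
created = handle_request(make_request("POST", "https://pod.example/a"), subscriptions)
listed = handle_request(make_request("GET", "/subscriptions"), subscriptions)
removed = handle_request(make_request("DELETE", "https://pod.example/a"), subscriptions)
missing = handle_request(make_request("DELETE", "https://pod.example/a"), subscriptions)
```

Even in this small form, the mapping from methods to subscription operations has to be defined and agreed upon by every implementation, which is the cost the earlier reply points at.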

@acoburn gave me a lengthy explanation for the design choices, in particular, for why authentication is better handled out of band using the HTTP infrastructure. I am still fuzzy on many issues surrounding the protocol, especially around the access and management of multiple resources simultaneously (which is the bread and butter concern of a client). But at least I understand the motivation behind the design choices.

My only request is that these reasons must be documented alongside the protocol spec (rather than spread across issues and trapped in implementers' minds), so that people who contribute to and use the protocol have an understanding of why these design choices have been made. Once that is done, I shall be happy to close the issue.
