Suggestion: ephemeral application transports

I've been thinking about the interaction between the network and application protocols, and the interaction between OSP agents and origins.

Some properties that I deem valuable:

It is highly likely that agents will want to persist authentication state over longer periods of time to avoid frustrating the user with constant reauthentication. The specification currently hints at this in Persistent State, but it doesn't go into detail on how to do this.
For the same reason, it seems favorable to have authentication happen at the user agent level (with the display device on the other end). This ensures that the user doesn't have to reauthenticate for every origin that wants to cast.
Since the network protocol has been split off, it should (preferably) be usable independently, including providing discovery and authentication services for protocols other than the Open Screen Application Protocol.

To make the network protocol usable for a broad range of protocols, we need a good abstraction. One option would be to mux application transports over the network protocol. However, encapsulation comes with a potential performance impact, and encapsulating different protocols also has a higher implementation cost. Another option is to facilitate opening new TLS connections to transport application protocols. This avoids the performance and implementation costs mentioned above. As an added advantage, TLS is already a popular abstraction used by many protocols.

To open a TLS connection on the local area network you generally need the target address (DNS name or IP address, plus port) and the target TLS certificate (since we can't rely on Public Key Infrastructure). I'll refer to these as "connection materials" going forward. Note: so far I've only been talking about the protocol level, not about browser APIs.

Since we're building a protocol to be used in the context of a browser, supporting a broader range of protocols raises an additional question: how to construct them with a browser API. This requires making sure we don't expose persistent identifiers to the JS Realm, to avoid fingerprinting. One way to do this would be to create new constructors for every transport we want to support. However, this again adds implementation cost per protocol. Another option would be to expose the connection materials to the JS Realm. This would have far broader applicability, but it can only be done if the connection materials contain only ephemeral information.
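To illustrate the second option, here is a minimal sketch of what exposed connection materials might look like, and how page script could turn them into arguments for a TLS-based transport. The `ConnectionMaterials` shape, its field names, and the `toWebTransportArgs` helper are assumptions made for illustration; nothing here is defined by the Open Screen specifications. The sketch also assumes the certificate is conveyed as a SHA-256 fingerprint, since that is the form WebTransport's `serverCertificateHashes` option accepts.

```typescript
// Hypothetical shape of ephemeral connection materials (illustrative names).
interface ConnectionMaterials {
  host: string;           // e.g. a random "<uuid>.local" mDNS name
  port: number;
  certSha256: Uint8Array; // SHA-256 fingerprint of the ephemeral certificate
}

// Arguments in the form expected by `new WebTransport(url, options)`.
interface WebTransportArgs {
  url: string;
  options: {
    serverCertificateHashes: { algorithm: string; value: Uint8Array }[];
  };
}

// Pure helper (hypothetical): validates the materials and maps them to
// transport constructor arguments. It does not open any connection itself.
function toWebTransportArgs(m: ConnectionMaterials): WebTransportArgs {
  if (!Number.isInteger(m.port) || m.port < 1 || m.port > 65535) {
    throw new RangeError(`invalid port: ${m.port}`);
  }
  if (m.certSha256.length !== 32) {
    throw new RangeError("expected a 32-byte SHA-256 fingerprint");
  }
  return {
    url: `https://${m.host}:${m.port}/`,
    options: {
      serverCertificateHashes: [{ algorithm: "sha-256", value: m.certSha256 }],
    },
  };
}
```

In a browser that supports it, the result could be passed along as `new WebTransport(args.url, args.options)`. Whether a given transport accepts local-network targets in this way is a separate question that this sketch does not try to answer.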

So, this is where the above suggestion comes in. We extend the network protocol with the ability to allocate new ephemeral (remote) connection materials. The connection materials can be exposed to the JS Realm to allow any TLS-based transport to be constructed. The overall flow would be:

1. A device (TV) has an advertising agent.
2. The user agent has a listening agent.
3. The user agent finds the device and performs authentication.
4. An origin requests to cast.
5. The user agent prompts for permission and lets the user select the appropriate advertising agent (device/TV).
6. The user agent sends a new network protocol message to the device to request new connection materials. This message should probably contain some indication of the purpose of the connection being allocated. This is to be defined further.
7. The device allocates a new random mDNS address (e.g. UUID.local) and a new ephemeral certificate for this endpoint. The device sends a response to the user agent with the newly created connection materials.
8. The user agent passes the newly created connection materials to the origin.
9. The origin JS code constructs the transport it needs.

Opening a new connection is done by repeating the above steps from step 4.
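The allocation part of this flow (the device handing out a fresh mDNS name and certificate on each request) can be sketched roughly as below. Everything here is hypothetical: the `AllocateRequest` message, the `AdvertisingAgent` class, and its methods are placeholder names, since the actual network protocol message is still to be defined. The sketch also assumes a single fixed port per device and injects the random-ID and certificate generators, so that it stays self-contained.

```typescript
// Hypothetical request message; the real message format is not yet defined.
interface AllocateRequest {
  purpose: string; // some indication of what the connection is for
}

// Hypothetical shape of ephemeral connection materials (illustrative names).
interface ConnectionMaterials {
  host: string;           // random "<id>.local" mDNS name
  port: number;
  certSha256: Uint8Array; // stand-in for the ephemeral certificate
}

class AdvertisingAgent {
  private readonly live = new Map<string, ConnectionMaterials>();
  private readonly randomId: () => string;
  private readonly issueCert: (host: string) => Uint8Array;
  private readonly port: number;

  constructor(randomId: () => string, issueCert: (host: string) => Uint8Array, port: number) {
    this.randomId = randomId;
    this.issueCert = issueCert;
    this.port = port;
  }

  // Allocates a fresh mDNS name and certificate for a single connection.
  // The request's purpose is accepted but not interpreted in this sketch.
  allocate(request: AllocateRequest): ConnectionMaterials {
    const host = `${this.randomId()}.local`;
    if (this.live.has(host)) throw new Error("random id collision");
    const materials = { host, port: this.port, certSha256: this.issueCert(host) };
    this.live.set(host, materials);
    return materials;
  }

  // Forgets the materials once the connection ends, so they are never reused.
  release(host: string): boolean {
    return this.live.delete(host);
  }

  get liveCount(): number {
    return this.live.size;
  }
}

// Usage with deterministic stand-ins. A real device would use a CSPRNG
// (e.g. crypto.randomUUID()) and generate an actual certificate.
let counter = 0;
const tv = new AdvertisingAgent(() => `id-${++counter}`, () => new Uint8Array(32), 4433);
const first = tv.allocate({ purpose: "presentation" });  // host "id-1.local"
const second = tv.allocate({ purpose: "presentation" }); // host "id-2.local"
```

Each repetition of the flow yields materials with a different host name, which is what keeps them unlinkable from the origin's point of view.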

This approach gives us the following properties:

The connection materials are ephemeral and are only used for the scope of a single connection.
The connection materials are never passed to multiple origins.
The connection materials are never passed to the same origin twice, giving the origin no persistent (network) information that could be used for tracking.
It allows constructing any TLS-based protocol.
The connection information (address and TLS certificate) can be passed to any JS constructor that represents a TLS-based transport, including WebRTC or WebTransport (if a P2P version is revived).
Additionally, if this is deemed valuable (e.g. for speed of adoption), the Open Screen Application Protocol could be implemented on top of a generic JS Realm transport, e.g. OSAP over P2P WebTransport.
This fully separates the origin-level connection materials from the agent-level ones. For the latter, we should define how they can be persisted to avoid re-authentication across browser sessions.
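The first three properties could be enforced on the user agent side with a small ledger along these lines. This is only a sketch: the `HandoutLedger` class and its methods are hypothetical names, and it assumes each set of connection materials is identified by its ephemeral host name.

```typescript
// Hypothetical user-agent-side ledger: each set of connection materials
// (keyed by its ephemeral host name) may be handed to one origin, once.
class HandoutLedger {
  private readonly owner = new Map<string, string>();

  // Records the handout. Throws if these materials were ever handed out
  // before, whether to a different origin or to the same origin again.
  handOut(origin: string, host: string): void {
    const previous = this.owner.get(host);
    if (previous !== undefined) {
      throw new Error(`materials for ${host} were already handed to ${previous}`);
    }
    this.owner.set(host, origin);
  }

  // Returns the origin that received the materials, if any.
  ownerOf(host: string): string | undefined {
    return this.owner.get(host);
  }
}
```

A user agent would call `handOut` just before passing materials to the origin, so an accidental reuse fails loudly instead of leaking a stable identifier.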

Open points I can think of:

There is some overhead in creating and keeping track of all these ephemeral connection materials. It doesn't seem like a big problem, but I'd like to hear others' thoughts on this.
Observers on the local network could infer some information based on the number of allocated ephemeral mDNS addresses. This may warrant a more detailed analysis.

I'd love to hear your thoughts @markafoltz, @wangw-1991, @pthatcherg, @anssiko.

w3c / openscreenprotocol

Suggestion: ephemeral application transports #351