w3c / tvcontrol-api

TV Control API specification - https://w3c.github.io/tvcontrol-api/

API surface and REST-based approach #24

Open tidoust opened 7 years ago

tidoust commented 7 years ago

The TAG makes the following comment that touches on (lengthy) discussions the TV Control WG had on whether the API could better be formulated as HTTP request/response exchanges:

[[ We're wondering if new APIs in this style might be phrased in a way that has a smaller API surface area, e.g., HTTP request/response or Service Worker Foreign Fetch.

May be possible to model many of the APIs here as locally-provided HTTP services; e.g. this: https://w3c.github.io/tvapi/spec/#tvsource-interface

Might instead be modeled as a set of requests and responses via fetch():

fetch("https://example.com/services/v1/channels")
   .then((resp) => {
      resp.json().then((channels) => {
         console.log(channels);
      });
   });

Open issues with a design like this involve a standard location for such an RPC interface and a format for the returned data. This seems like a good thing for a TV control interface spec to define, perhaps using a system like JSON Schema

It's also unclear how the event-sending side of this might work, but we're confident that can be overcome in the Foreign Fetch model, perhaps by allowing FF responses to contain MessagePort objects. ]] https://github.com/w3ctag/spec-reviews/issues/111#issuecomment-257746605

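To make the event-sending question above concrete: one conventional way to push events from an HTTP endpoint (short of the MessagePort idea the TAG floats) would be a server-sent event stream that the page parses into application-level events. The sketch below is purely illustrative; the endpoint URL, event names, and payload shapes are all invented, since the spec defines none of this yet.

```javascript
// Hypothetical parser for messages pushed by a TV control endpoint.
// The "channelchange" / "scanningstatechanged" event names and the
// payload fields are invented for illustration.
function parseTvEvent(rawData) {
  const msg = JSON.parse(rawData);
  switch (msg.type) {
    case "channelchange":
      return { type: msg.type, channelId: msg.channel };
    case "scanningstatechanged":
      return { type: msg.type, state: msg.state };
    default:
      return { type: "unknown", raw: msg };
  }
}

// In a page, this might be wired to a (hypothetical) event stream:
//   const es = new EventSource("https://example.com/services/v1/events");
//   es.onmessage = (e) => handleTvEvent(parseTvEvent(e.data));
```

This sidesteps the Foreign Fetch question entirely, so it may not be what the TAG has in mind, but it shows that the event side is solvable with existing web primitives.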

tidoust commented 7 years ago

I'm not entirely clear on how a MediaStream gets represented in that model; probably as a URI. I note early discussions in the TV Control CG on "URIs vs. MediaStream" at: https://lists.w3.org/Archives/Public/public-tvapi/2014Sep/0001.html

tidoust commented 7 years ago

Some thoughts below to capture my understanding of the TAG's comment.

I believe there are 3 main models that may be used to design the TV Control API:

  1. A hardware-centric model. This more or less matches the current spec, and we're now moving away from it. The interfaces are introduced in a way that simplifies the mapping onto the underlying hardware. This simplifies implementation in theory. However, this approach is not flexible in practice. It cannot easily integrate more advanced use cases (e.g. online sources not directly linked to a tuner, constraints of more advanced decoding circuits, tuners that can return more than one stream at once).
  2. An internal source-centric model. This is more or less what we're heading towards in current group discussions. In this model, the TV/Radio box is seen as an internal device, exposed through an API, that can produce one or more media streams. Through parameters, the application can constrain the stream, e.g. to switch to another channel, or to enable/disable particular tracks. This model is close to "getUserMedia" (which deals with cameras and microphones), although the parameters to constrain the tracks are of a different nature in the case of the TV Control API.
  3. An external source-centric model. This is my understanding of what the TAG proposes here. In this model, the TV/Radio box is seen as an external endpoint that can provide one or more media streams. The application can send requests to this external endpoint to apply additional constraints to a media stream, using HTTP requests (WebSockets could work too) with specific command messages. From a media stream perspective, the external endpoint could be represented as a sort of RTCPeerConnection, and the application could perhaps be given an RTCRtpReceiver instance per media stream track. Or perhaps the media stream could be represented as a dereferenceable HTTP URI in that model.

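To make the getUserMedia analogy in model 2 concrete: in that model, the application would pass constraints and the implementation would narrow the stream it produces accordingly. The constraint-matching logic below is a loose, invented sketch of that idea; none of these property names exist in the spec.

```javascript
// Sketch of constraint-based selection in the internal model, loosely
// analogous to how getUserMedia constraints narrow camera/microphone
// capture. Channel properties and constraint names are invented.
function matchChannels(channels, constraints) {
  return channels.filter((ch) =>
    Object.entries(constraints).every(([key, want]) => ch[key] === want)
  );
}

// Example: keep only TV (not radio) channels.
const candidates = matchChannels(
  [
    { name: "Channel A", type: "tv" },
    { name: "Channel B", type: "radio" },
  ],
  { type: "tv" }
);
```

In the internal model, this kind of filtering would happen inside the user agent; the point of the sketch is only to show the shape of the developer-facing contract.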
I would say that the main advantage of the internal source-centric model is that the API exposed to the developer can remain straightforward: the developer will directly interact with the objects exposed by the API. One drawback is that it requires buy-in from browser vendors to implement the different facets of the API.

One advantage of the external source-centric model is that the implementation of the TV/Radio box remains mostly external to the browser runtime. In other words, the conformance class of the specification would be the TV/Radio box in this model, not the user agent. One drawback is that things are a tad more complex for developers who now need to prepare and wrap commands to the external endpoint in HTTP/WebSockets calls. This model also triggers a number of open questions, including how to represent media streams (whether it is a good idea to pull the RTC stack for instance), how to push events from the TV/Radio box onto the client, at which location to expose the endpoint, how to address synchronization/latency issues.
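To illustrate the "prepare and wrap commands" burden mentioned above: in the external model, even a simple channel change becomes the construction and dispatch of an HTTP message. The endpoint URL and command format below are invented, since neither is defined anywhere yet.

```javascript
// Hypothetical command wrapping for the external source-centric model.
// TV_ENDPOINT, the URL path scheme, and the JSON body shape are all
// assumptions for illustration.
const TV_ENDPOINT = "https://tv.local/services/v1";

function buildSetChannelCommand(sourceId, channelNumber) {
  return {
    target: `${TV_ENDPOINT}/sources/${encodeURIComponent(sourceId)}/channel`,
    body: JSON.stringify({ channel: channelNumber }),
  };
}

// In a page, the command would then be dispatched with fetch():
//   const cmd = buildSetChannelCommand("tuner-1", "7.1");
//   fetch(cmd.target, {
//     method: "PUT",
//     headers: { "Content-Type": "application/json" },
//     body: cmd.body,
//   });
```

Compare this with the internal model, where the same operation would presumably be a single method call on an object the API already exposes.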

A couple of possible questions for the TAG:

  1. Does that more or less capture the proposed idea?
  2. Would you be making the same comment to "getUserMedia" if it started over again today?

tidoust commented 7 years ago

One more aspect that I overlooked in my previous comment.

A MediaStream does not directly give access to flows of bytes. It provides a handle to associate a source with one or more sinks (e.g. a `<video>` element).

For instance, in TV/radio sets, I suppose that the decoding pipeline directly connects the tuner to the graphics card for rendering. And I suppose it is a feature.

I'm not sure how to achieve this in the external source-centric model. If the TV box is to be an external endpoint from the perspective of the user agent, then the user agent needs to fetch the bytes somehow before it renders them.

If we want user agents not to fetch the bytes, then we need some way for them to connect directly to the internals of the supposedly external TV box... I believe that defeats the main purpose of the external model. Essentially, we're back to the internal model. Or is there a simple way to achieve this while preserving a clean separation between the TV box and the user agent?

chrisn commented 7 years ago

Note that the external source-centric model opens up new use cases such as viewing of the media on devices connected to the TV receiver over a home network. This is really interesting but not in scope of the current WG charter.

travisleithead commented 7 years ago

This sounds like great progress, and that you are open to making some of these fairly big changes, thank you!

We admit that there are a lot of moving parts here with which we are less familiar, and so coming to a shared understanding of the mental model can be a bit challenging. Having said that, the direction you have decided to go (the internal source-centric model) sounds great (better than approach 1, at any rate). We also wonder if there is room for a hybrid approach (between the 2nd and 3rd approaches) whereby the media might be brought over via MediaStream (case 2), but some of the other configuration done using case 3...

At this point, we will likely want to wait and see how this develops, as it seems the spec hasn't yet incorporated some of the changes you're currently considering.