readium / architecture

📚 Documents the architecture of the Readium projects
https://readium.org/architecture/
BSD 3-Clause "New" or "Revised" License

Media overlays in the streamer #110

Open qnga opened 4 years ago

qnga commented 4 years ago

How should media overlays be implemented in the streamer? The question was first raised here. There, media overlays are introduced as a service provided over HTTP. It's not clear to me whether they should be accessible both in-memory and over HTTP, depending on the format. In any case, the fetcher seems to me to be the right place to convert SMIL media overlays to in-memory Synchronized Narration.
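The conversion mentioned above could look roughly like the following. This is only a sketch in Python, not the streamer's code: it assumes a Synchronized Narration document is a JSON object with a `narration` array of `text`/`audio` pairs, with audio clips expressed as `#t=begin,end` media fragments. The function names `parse_clock`, `format_seconds` and `smil_to_sync_narration` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB 3 Media Overlay (SMIL) documents.
SMIL_NS = "{http://www.w3.org/ns/SMIL}"


def parse_clock(value):
    """Convert a SMIL clock value ("5.5s", "100ms", "0:00:05.500") to seconds."""
    value = value.strip()
    if ":" in value:
        # Full or partial clock value: [hh:]mm:ss[.fraction]
        seconds = 0.0
        for part in value.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    # Timecount value with an optional unit; "ms" must be tested before "s".
    for suffix, factor in (("ms", 0.001), ("min", 60.0), ("h", 3600.0), ("s", 1.0)):
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value)


def format_seconds(seconds):
    """Render seconds with millisecond precision and no trailing zeros."""
    return format(seconds, ".3f").rstrip("0").rstrip(".")


def smil_to_sync_narration(smil_xml):
    """Flatten every <par> of a SMIL document into text/audio pairs.

    The output shape is an assumption about Synchronized Narration,
    not something confirmed in this thread.
    """
    root = ET.fromstring(smil_xml)
    narration = []
    for par in root.iter(SMIL_NS + "par"):
        text = par.find(SMIL_NS + "text")
        audio = par.find(SMIL_NS + "audio")
        if text is None or audio is None:
            continue  # text-only <par> elements are skipped in this sketch
        fragment = "#t=" + format_seconds(parse_clock(audio.get("clipBegin", "0")))
        clip_end = audio.get("clipEnd")
        if clip_end is not None:
            fragment += "," + format_seconds(parse_clock(clip_end))
        narration.append({"text": text.get("src"), "audio": audio.get("src") + fragment})
    return {"narration": narration}
```

Nested `seq` elements are flattened here for brevity; a real converter would probably need to preserve that nesting and the `epub:textref` and `epub:type` attributes.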

HadrienGardeur commented 4 years ago

It mostly depends on the platform.

On the Web, "media overlays" would only be served using HTTPS:

On mobile, apps that use strictly native code would only rely on HTTP to fetch the HTML and audio resources, everything else would be handled in-memory. But apps that rely on a navigator built entirely in JS may behave exactly like Web Apps and fetch the manifest and Synchronized Narration document using HTTP as well.

qnga commented 4 years ago

On mobile, apps that use strictly native code would only rely on HTTP to fetch the HTML and audio resources, everything else would be handled in-memory.

Everything else doesn't include images and CSS, does it? Why this choice? I understand the idea of passing the manifest in memory, but I didn't think that concerned resources too. It seems conceptually awkward to me, since it relies on two separate mechanisms for resource handling and, moreover, prevents remote access for some resource types.

Now, assuming this choice in the Kotlin app, SMIL files may be parsed and their content added as a Kotlin property to the link objects of the overlaid HTML files (and not to dedicated links). Is that your vision?

I thought of yet another way: converting SMIL into the Synchronized Narration format and injecting it right into the relevant HTML files in the fetcher. This is particularly interesting if you plan to use the audio element to play sound in synchronized narration. A JS library might be used to play sound and highlight the active text, though controls could probably still be native. I really don't know how you are planning to implement all of this.
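To make the injection idea more concrete, here is a rough Python sketch of what the fetcher could do. Everything in it is an assumption for illustration: the function name `inject_sync_narration`, the `sync-narration` element id, and the choice of a `script` element placed just before `</head>`. Only the media type comes from this discussion.

```python
import json
import re

# Media type used for Synchronized Narration links in this discussion.
SYNC_NARRATION_TYPE = "application/vnd.syncnarr+json"


def inject_sync_narration(html, narration):
    """Embed a narration dict in an HTML or XHTML resource as a JSON script block.

    Hypothetical helper: the element id and the placement before </head>
    are illustrative choices, not part of any Readium specification.
    """
    payload = json.dumps(narration)
    # Escape markup-significant characters so the payload is safe in both
    # HTML and XHTML. They can only occur inside JSON strings, where
    # \uXXXX escapes decode back to the same characters.
    for char, escape in (("&", "\\u0026"), ("<", "\\u003c"), (">", "\\u003e")):
        payload = payload.replace(char, escape)
    script = '<script type="{}" id="sync-narration">{}</script>'.format(
        SYNC_NARRATION_TYPE, payload
    )
    match = re.search(r"</head\s*>", html, re.IGNORECASE)
    if match is None:
        raise ValueError("no </head> found; cannot inject narration")
    return html[: match.start()] + script + html[match.start():]
```

A JS library running in the page could then read the block with `document.getElementById("sync-narration").textContent` and pass it to `JSON.parse`.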

HadrienGardeur commented 4 years ago

Everything else doesn't include images and CSS, does it? Why this choice?

They're served over HTTPS as well. In my explanation, I only mentioned the resources that are fetched directly, not the ones that are fetched indirectly as HTML gets rendered (images, CSS, JS and fonts).

Now, assuming this choice in the Kotlin app, SMIL files may be parsed and their content added as a Kotlin property to the link objects of the overlaid HTML files (and not to dedicated links). Is that your vision?

I thought of yet another way: converting SMIL into the Synchronized Narration format and injecting it right into the relevant HTML files in the fetcher. This is particularly interesting if you plan to use the audio element to play sound in synchronized narration. A JS library might be used to play sound and highlight the active text, though controls could probably still be native. I really don't know how you are planning to implement all of this.

We'll have to do both and the in-memory model will be based on Synchronized Narration anyway.
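For the in-memory side, the "property on the link object" option could be modelled along these lines. This is a Python sketch rather than the Kotlin API: the `Link` and `Publication` classes are simplified stand-ins, `link_with_href` only mirrors the idea of a lookup by href, and the `syncNarration` property key is a made-up name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Link:
    """Simplified stand-in for a manifest link."""
    href: str
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Publication:
    """Simplified stand-in holding only the reading order."""
    reading_order: List[Link]

    def link_with_href(self, href: str) -> Optional[Link]:
        # Linear lookup by href; returns None when nothing matches.
        for link in self.reading_order:
            if link.href == href:
                return link
        return None


def attach_narration(publication: Publication,
                     narrations: Dict[str, Dict[str, Any]]) -> List[str]:
    """Store each narration on the link of the HTML file it overlays.

    `narrations` maps an HTML href to its Synchronized Narration dict.
    Returns the hrefs that had no matching link. The "syncNarration"
    key is hypothetical.
    """
    missing = []
    for href, narration in narrations.items():
        link = publication.link_with_href(href)
        if link is None:
            missing.append(href)
        else:
            link.properties["syncNarration"] = narration
    return missing
```

With this layout a native navigator reads the narration straight from the link it is about to render, with no HTTP request involved.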

qnga commented 4 years ago

So, as far as I understand, in mobile apps:

Questions:

qnga commented 4 years ago

I think I was mistaken again.

NOT

  • when the navigator or the testapp (I don't know which) encounters a link with type "application/vnd.syncnarr+json", it doesn't fetch the resource over HTTP, but directly uses the Publication.linkWithHref method to retrieve the link containing the synchronization data.

BUT

Questions remain to be answered.

qnga commented 4 years ago

My last interpretation has the drawback of cluttering the manifest with Synchronized Narration data. I suggest instead acting as if Media Overlays were served over HTTP in a dedicated resource, with SyncNarr as the native format, more like my first interpretation.

Concretely: