Revamping the Content Filter architecture

mickael-menu-mantano commented 5 years ago

The current Content Filter architecture (at least Swift and Kotlin) has some issues:

Instead of having well-defined and composable content filters, there's a big class for each publication format which leads to some code repetition and lack of flexibility and clarity. Those classes sometimes do the work directly in it, or delegate some of the filtering to other private classes which are actually in essence content filters (eg. DRMDecoder, FontDecoder)
Unlike with R1, it is entirely private to r2–streamer, so the other modules or the host app can't add additional filters.
- This is particularly problematic because a lot of code related to the navigator is directly put in the streamer (JS and CSS to be injected). This breaks encapsulation and leads to changes in the navigator that are tightly dependent of changes in the streamer (I noticed that a lot working on r2-swift).

I don't think the Content Filter was really specified for R2, but we can take some leads from R1 which has a pretty good working CF architecture. We need (defined in r2-shared instead of r2-streamer):

A ContentFilterChain interface that processes the content through all the registered CF and expose a registerContentFilter() API.
- The Fetcher instance would expose a contentFilterChain property that can be provided to the Navigator to initialize the content filters required to display a particular format.
A ContentFilter interface
- The R1 interface has a priority property that determines the order of processing compared to the other CF. This is interesting because a host app can add a custom CF to be fired at the right time.
Default implementation that are registered by default (eg. DRMContentFilter, FontDecodingContentFilter)

If this is of interest to the team, I can write a spec Markdown doc and implement it in Swift. I expect this to be a small refactoring (~1 day) but it could improve tremendously the streamer and the extensibility of Readium 2.

mickael-menu-mantano commented 5 years ago

A related discussion, if we move the JS/CSS files to the navigator, is that we need a way to serve those files by a server (this is currently directly done in the streamer directly).

A possible solution would be to add a shared interface (eg. ResourcesServer) exposing an API to serve a local directory at a specific URL. The r2-streamer.PublicationServer would implement such an interface and the host app can then initialize the Navigator with it. This is already what I implemented in a PR for Swift, to support WebPub (which are by nature not filtered through the CF chain). This interface could be useful for the host app too, if they need to serve some files for their own extensions.

Another solution could be to add a second private local server in the Navigator dedicated to serving ReadiumCSS. This is simpler on the host app side and would work even if the app doesn't use the PublicationServer at all (eg. only reading webpubs). But it consumes (arguably few) more resources. This solution could be implemented as a fallback mechanism if no ResourcesServer instance is provided to the Navigator.

danielweck commented 5 years ago

Regarding ReadiumCSS assets: in the R2 "JS" implementation (i.e. TypeScript, in the Node / Electron integration context), the "streamer" component of the architecture owns the HTTP server instance and exposes an API function that allows consumer applications to register static HTTP hosting (filesystem path + URL route). This is an extension mechanism internally facilitated by Express and some middleware techniques, but this is otherwise a pretty straightforward solution. The exact same API function could ; for example ; be used to serve MathJax resources, if we decided to inject this MathML rendering lib into EPUB HTML content documents (like we do in R1).

danielweck commented 5 years ago

Regarding DRM / LCP / font de-obfuscation: in the R2 JS implementation, there is undesirable tight-coupling too, notably in the "streamer" component (even though the actual functionality is implemented where it belongs, i.e. r2-lcp-js). The architectural de-coupling would require non-trivial refactoring and the introduction of abstraction layers (i.e. adapters, service factories, etc.), for example by adopting R1's notion of Content Modules and Filters. But due to development priorities and the sole focus on LCP, the current implementation has been relying on a more basic concept of "transformers" which is ; as you know ; an incomplete design approach because it does not encompass the initialization / parameterization phase of a typical DRM scheme, only the decryption part (for which we have to setup the crypto context in a separate, not well-defined module). I am not sure how high a priority it is to implement DRM extensibility the R1 JS implementation, but I suspect quite low at the moment.

danielweck commented 5 years ago

PS: I forgot to mention that the chaining of "transformers" in the R2 JS implementation is of course implemented, in order to feed encrypted resource streams (e.g. HTML files) through the decryption filter first, then into any relevant post-processing unit (e.g. injection of ReadiumCSS, user settings, etc. into HTML documents). However, the current design in R2 JS is rudimentary, for example there is no generalized concept of "transformer priority" (just an ordered list of executable content filters). Also, crucially there is no distinction between media assets that can be decrypted in full (i.e. most publication resources) vs. audio/video (etc.) assets which require "streaming" (i.e. partial buffer decryption depending on HTTP byte range requests). This was important in R1 in order to ensure that post-processing content filters can have access to fully-loaded resources (for example when intercepting and modifying HTML markup as it passes through the chain of transformers).

mickael-menu-mantano commented 5 years ago

Thank you for your feedback on the JS implementation Daniel.

the "streamer" component of the architecture owns the HTTP server instance and exposes an API function that allows consumer applications to register static HTTP hosting (filesystem path + URL route)

This sounds exactly like the ResourcesServer.serve(URL, at path: String) interface I added, that's great to see that we are heading to the same solution. Would you mind linking to this particular API to see if we can get closer with the API names?

an incomplete design approach because it does not encompass the initialization / parameterization phase of a typical DRM scheme, only the decryption part (for which we have to setup the crypto context in a separate, not well-defined module)

Yes it's similar on Swift. I think the DRMDecoder, while a nice attempt at making a generic DRM content filter, makes things a bit more complicated on the host app parsing side. The implementer has to handle a complicated workflow by filling a half-initialized DRM model object in a two-step parsing. The Content Module approach could make things run a bit more seamlessly. I don't think it's a pressing issue though since LCP is working well now, and this approach could at least work with Adobe too.

Also, crucially there is no distinction between media assets that can be decrypted in full (i.e. most publication resources) vs. audio/video (etc.) assets which require "streaming" (i.e. partial buffer decryption depending on HTTP byte range requests).

That's a good point, R1 used to have an OperatingMode for CFs:

/**
  enum class OperatingMode
  This enum class defines which way a given ContentFilter operates, in regards
  of how it makes use of the bytes that it is given as input. More details are
  given on each enum member.
*/
enum class OperatingMode
{
    Standard,  /** < This ContentFilter does not require the full range of bytes, nor it does work on specific byte ranges. */
    RequiresCompleteData,  /** < This ContentFilter requires the full range of bytes of a given resource to operate. */
    SupportsByteRanges  /** < This ContentFilter can operate on specific byte ranges. */
};

And the LcpContentFilter had it set toSupportsByteRanges.

But on R2 it looks like a resource is fully read and decrypted, regardless of whether it's a byte range request or not.

I don't think we need an OperatingMode for R2 (Swift), because we are always manipulating streams (which can be lazy-read or just a container to the full data post-processing). So a given CF might return a new stream subclass manipulating the data while it's being read. However, there might be an issue with the DRMDecoder implementation ignoring the byte range requests. I will check with the CC shared culture EPUB.

mickael-menu commented 4 years ago

A proposal to solve this issue: https://github.com/readium/architecture/pull/132

ContentFilter was renamed in Resource.Transformer.

Transformer priorities are not needed anymore because we have different levels where a transformation is done.

fetchers

readium / architecture

Revamping the Content Filter architecture #103