readium / architecture

📚 Documents the architecture of the Readium projects
https://readium.org/architecture/
BSD 3-Clause "New" or "Revised" License
177 stars 33 forks source link

Locator Fragment(s), processing model #98

Closed danielweck closed 5 years ago

danielweck commented 5 years ago

I have expressed reservations (see https://github.com/readium/architecture/issues/95 ) about the suboptimal processing model proposed for the locations.fragments array of string values (see https://github.com/readium/architecture/blob/master/locators/README.md#fragments ).

For example, the current proposal requires producer/consumer code to generate/parse ad-hoc fragment schemes for CSS Selectors and (partial)CFIs which lack precise definitions of escaping rules (see for example the (full)CFI border cases https://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-epubcfi-escaping ).

Consumer code must parse each string value in the locations.fragments array in order to extract meaningful information (i.e. to differentiate CFI from CSS Selectors, and even time Media Fragments, etc.). This introspection step seems unnecessary and costly, in the context of R2 implementations which ultimately do not rely on native browser engine support for the intended URI fragments. Even time Media Fragments will be deconstructed in order to extract timestamps and feed those values to underlying playback APIs.

Such "fragments" in their linearized / flattened form are certainly useful at the point at which the information gets encoded into actual URI fragments (which usually involves syntactic escaping that can be quite error-prone, see for example encodeURI vs. encodeURIComponent), for instance in src or href properties with JSON Schema type=string and format=uri-reference. However, I fail to see the advantages of prematurely serializing high-level constructs such as CSS Selectors and (partial)CFI into a format compatible with URI fragments, when consumers in the R2 architecture are likely to make use of them directly (e.g. CSS Selector document.querySelector() API).

Finally, in light of the DOMRange discussion (see https://github.com/readium/architecture/issues/95 ) which suggests a "first class citizen" JSON property in order to express the DOMRange construct in a structured manner (i.e. non-flattened/linearized for the purpose of URI encoding), I am wondering where the line in the sand lies between "locator/fragment" constructs that belong in the generalized model (i.e. array of generic strings that need to be parsed in order to extract meaning, or otherwise collated together into a URI fragment), versus other constructs that warrant a directly-identifiable / discoverable data type (e.g. locations.cssSelector="body div#id.class p#paragraph" + locations.partialCfi="/4/6/2[paragraph]" + locations.fragments=["#paragraph"] instead of locations.fragments=["#paragraph", "partialcfi(/4/6/2[paragraph])", "cssselector(body div#id.class p#paragraph)"]).

HadrienGardeur commented 5 years ago

If anything, this group has shown that there's a total lack of consistency in which fragment will be used by various apps.

If we had common agreement on this issue, I would be open to that argument, but I don't think that's the case at all.

I'll also quote a comment I've made in the other issue:

Overall, I'm not in favor of extending our current model with anything too specialized, I think the mix of position, progression, overallProgression and fragments is good enough:

  • it's as media-generic as we can be
  • it provides both context and the ability to express very specific locations
  • there's built-in extensibility through fragments (that can be defined per media type)

We had something more specialized elements before (with dedicated elements for CFI and CSS Selectors among other things) and I think it would be a step back to go back in that direction.

Also to answer another of your comment:

For example, the current proposal requires producer/consumer code to generate/parse ad-hoc fragment schemes for CSS Selectors and (partial)CFIs which lack precise definitions of escaping rules

I don't think that's true:

danielweck commented 5 years ago

precise definitions of escaping rules have absolutely nothing to do with fragments vs a specialized element

If the fragment syntax introduces string tokens with special significance (e.g. partialCfi(xxx), then both generator and parser code must account for them when handling the xxx value)

danielweck commented 5 years ago

we can provide helpers for this

Could you please explain what you mean by this? I am not sure I understand. Are you talking about code that generates/parses the special URI fragment scheme syntax? If so, that is my point exactly: this processing step should only be necessary at the point at which such URI fragment is necessary, for example to pass into a consumer agent such as a browser engine / web API that can interpret such full URL, constructed from its constituent parts which are expressed in exploded form inside the Locator payload (href + relevant "fragments").

mickael-menu-mantano commented 5 years ago

For the helpers, and taking JavaScript as an example, it could be to add functions/properties to the Locator JS protocol that parse behind the scene the string fragments and return the value as a true object, like you would need it on the platform. Using that, it shouldn't really matter how the information is stored in the JSON, the usage in the app should be the same.

I don't have a strong preference for any of the two proposed models, both would work pretty well at least on mobile. However, after pondering your arguments, I don't think that DOM range should be a "first class citizen". IMHO we should either have all the locations as objects, or everything as a string fragment with property helpers to parse it behind the scene.

HadrienGardeur commented 5 years ago

IMHO we should either have all the locations as objects, or everything as a string fragment with property helpers to parse it behind the scene.

+1 to that, consistency is always better.

"Everything as object" though would be harder to achieve IMO and would require:

These are all quite similar to how metadata are handled in RWPM.

The main difference compared to RWPM is that these locators will need to be stored in a DB most of the time, which means that each app will most likely have to serialize/deserialize JSON as a string.

I don't expect the "everything as object" route to require less work behind the scene, the outcome will be mostly similar with some additional complexity.

danielweck commented 5 years ago

Consistency is good. So, locatorInstance.locations.progression is a number/float value that represents a percentage offset into the resource. Should this also be a fragment in locatorInstance.locations.fragments?

Full example: locatorInstance.locations.fragments=[ "#paragraph-id", "cssSelector(body.topClass div:nth-child\\(4\\) p#paragraph-id)", "resourceProgression(0.9452)", "partialCfi(/4/2/8/6[paragraph-id])" ] Side note: pay attention to the escaped parentheses characters inside the string value of the cssSelector() fragment scheme ... I actually made that up, I do not know the exact escaping rules, and what the border cases are. This would have to be further defined, also for partialCfi().

Evidently; based on the overall state of the discussion, mainly between Mickael and Hadrien, and myself; my opinion falls into the minority bucket, so naturally I will follow the group consensus :) I do however maintain my objection, which I have tried to articulate in previous comments. Let me try to synthesize my thoughts once again:

In recent discussions we have been dealing with several tangential issues such as: DOMRange specificities (structured object), the overall unclear processing model for when several kinds of fragment resolutions are expressed in a single Locator payload (e.g. CSS selectors / fragment-ID vs. progression vs. CFI character-level position, etc.), and the notion of text (before/after/highlight). But here I am expressing reservations specifically about forcing roundtrip encoding/decoding steps (so-called API "helpers") because of the proposed fragments-as-array-of-string approach.

I do very much appreciate the need to strike a reasonable balance in the Locator model in terms of generalization vs. specialization. However, a an implementor I will be disappointed to follow a design pattern that seems to run counter to the otherwise pragmatic architectural principles that underpin most of the Readium2 models / JSON serialization.

If fragments is as an array of arbitrary strings that need to be parsed+unespaced in order extract their true meaning and potentially-structured values, and if in fact all known Locator consumers (iOS/Android/desktop/web R2 implementations) end-up invoking the suggested API "helpers" back-and-forth, then this is symptomatic that the proposed design is detached from real-world applications, and that the fragments array-of-string encoding format is unnecessary.

This approach not only introduces some degree of runtime overhead (i.e. in terms of performance and memory allocations), but there is also increased code complexity and testing costs due to the additional parsing / escaping rules. These downsides would be absolutely fine if the API consumers actually made use of the URI-fragment-friendly syntax, but none of the known implementations do. To borrow from the idiom "premature optimization": to me this feels like "premature encoding". If there is indeed a real, identified use-case for URI-friendlyness (like there is in the TOC href time Media Fragments, for example), then we have a solid justification for this type of encoding, and the proposed solution has merits beyond purist generalization vs. specialization considerations. But instead, all R2 implementation will end-up relying on expensive "helpers" / conversion functions.

For example, in the desktop implementation Locator objects are central to several critical features, and although the volume of JSON traffic during event messaging is not huge, the processing cycles are not trivial (i.e. it "all adds up"). The reading location is emmited at a low rate from the "navigator" component (i.e. at every detected user interaction with the rendered document), but handling bookmarks, highlights/annotations, search results, etc. represents a more significant execution trace to debug into. Right now, the cssSelector field is expressed "natively", and the string value can be consumed directly by the Web API document.querySelector(). With the proposed fragments model, the "helpers" layer of indirection will have to be implemented correctly (character escaping, border cases, etc.) both for generation and parsing purposes, and will have to be invoked in a roundtrip manner for every message passed (the parsed value/object can of course be cached, but JSON payloads are by nature transient across the messaging layer, so conversions actually occur every time). Effectively, a simple locatorInstance.locations.cssSelector will have to become (behind the scenes, through "helper" functions) a lookup routine that scans an array in order to look-ahead (parse + discover) the desired value type.

HadrienGardeur commented 5 years ago

Consistency is good. So, locatorInstance.locations.progression is a number/float value that represents a percentage offset into the resource. Should this also be a fragment in locatorInstance.locations.fragments?

Progression as a concept works across all the different types of publication that we've considered so far (EPUB, PDF, audiobooks, comics) and isn't defined in any specification (it would have been a good candidate for Media Fragments URI 1.0 IMO).

This is why having progression as a top-level location makes sense.

You can hardly make the same assessment for DOM ranges or CSS selectors, which are extremely specialized.

[...] the overall unclear processing model for when several kinds of fragment resolutions are expressed in a single Locator payload (e.g. CSS selectors / fragment-ID vs. progression vs. CFI character-level position, etc.), and the notion of text (before/after/highlight)

IMO this has nothing to do with the discussion about fragments, since you could have locations that contradict one another. For many reasons that we've listed before (we need context along with multiple fallback options), we can't live in the naïve world of "CFI only" anymore, therefore there is always a risk of having a location that contradicts either text or a different location.

However, a an implementor I will be disappointed to follow a design pattern that seems to run counter to the otherwise pragmatic architectural principles that underpin most of the Readium2 models / JSON serialization.

I don't think that's true. The current situation with fragments is IMO identical to what we have with rel in a Link Object: a rel or a fragment can only be understood in the context of a specific media type.

This approach not only introduces some degree of runtime overhead (i.e. in terms of performance and memory allocations), but there is also increased code complexity and testing costs due to the additional parsing / escaping rules.

As I've said before, I think that the "everything is an object" approach would be every bit as costly in terms of code complexity and parsing/escaping rules but it would also make the overall model more complex and would put the burden on maintaining a registry of locations entirely on us (vs the fragment approach where each media type is responsible for registering fragments).

if there is indeed a real, identified use-case for URI-friendlyness (like there is in the TOC href time Media Fragments, for example), then we have a solid justification for this type of encoding

In the context of Readium Web, there is: this could be a way to reference specific portions of a publication or jump to a location that doesn't have an id. It won't be as powerful as what we can do when passing a Locator to a navigator, but at least we'll be able to do something.

A Publication Viewer will be able to use the exact same API that we use to jump to a locator.

danielweck commented 5 years ago

I think that the "everything is an object" approach would be every bit as costly in terms of code complexity and parsing/escaping rules

Let's take a concrete example - locator is a Locator instance, here represented in its JSON serialization, for illustration purposes (additional JSON properties omitted, for brevity):

{
   "locations": {
      "cssSelector": "body.rootClass div:nth-child(2) p#para-id"
   }
}

Example Javascript consumer code running inside the web browser runtime that displays an HTML document (publication document):

const paraElement = document.querySelector(locator.locations.cssSelector);

Now with the fragments encoding (once again, the scheme and escaping rules are made-up for illustration purposes, not a solid proposal):

{
   "locations": {
      "fragments": [
         "cssSelector(body.rootClass div:nth-child\\(2\\) p#para-id)"
      ]
   }
}

Example crappy Javascript code (sanity checks removed, for brevity):

const cssSelector = helperExtractCssSelector(locator);
const paraElement = document.querySelector(cssSelector)
function helperExtractCssSelector(locator) {
   for (const fragment of locator.locations.fragments) {

      // the "fragmentIsCssSelector" piece of code returns true
      // if the fragment string matches the `cssSelector(xxx)` syntax
      if (fragmentIsCssSelector(fragment)) {

         // the "removeCssSelectorScheme" piece of code
         // removes the `cssSelector(xxx)` scheme in the string,
         // returns the `xxx` value which may contain escaped characters
         const escapedCssSelector = removeCssSelectorScheme(fragment);

         // the "unescapeCssSelector" piece of code ensures escaped characters
         // are un-escaped (e.g. double-backlashed parentheses for `:nth-child(yyy)` pseudo-class)
         // and returns the ready-to-use CSS Selector value
         return unescapeCssSelector(escapedCssSelector);
      }
   }
}

Same logic for CFI expressions.

HadrienGardeur commented 5 years ago

I've already replied to that example @danielweck, it's only relevant if we have a known quantity of locations, which is definitely not the case.

Readium Desktop seems to primarily use CSS Selectors and DOM Ranges (neither of which have a standardized representation as a fragment), but other implementations may use completely different locations (they can even be media specific, as we've seen with PDF, which is extremely useful).

In practice, we won't be able to use locator.locations.cssSelector without writing some helper code, no matter which direction we're taking (the object approach would need to inspect something like locator.locations.otherLocations and you've already illustrated the situation for fragments).

My recommendation in such cases would be to simply write something something specific to your own implementation (Readium Desktop), which is a bit of a shame but would be the only way to deal with the issue that you're facing. I don't believe that changing the model will solve the problem that you're facing.

danielweck commented 5 years ago

It is absolutely fine to have "helper" code that inspects objects for available fields, i.e. to test the presence of specific discoverable properties.

Conversely, "helper" code that parses arbitrary strings in order to match supported tokens and to parse out meaningful values, is neither trivial nor efficient.

danielweck commented 5 years ago

My recommendation in such cases would be to simply write something something specific to your own implementation (Readium Desktop), which is a bit of a shame but would be the only way to deal with the issue that you're facing.

If the iOS/Android implementation also have to produce/consume the flat linearized fragments syntax via "helpers" in order to generate/extract timestamps and PDF references, then the nature of the problem is not specific to R2 desktop/Electron/Chromium. I expect that one day iOS/Android implementation will need to process CFI locations too (or maybe even CSS Selectors, as this is a core web feature), in which case I fail to see how the unnecessary roundtrip conversions can be ignored. And of course, there is still the pending issue of encoding DOMRanges for highlights/annotations (object destructuring).

mickael-menu-mantano commented 5 years ago

We are discussing this subject right now on the call, are you available to join @danielweck ?

mickael-menu-mantano commented 5 years ago

Regarding the escaping, maybe we don't need to do it at all? This unofficial spec doesn't do it: http://simonstl.com/articles/cssFragID.html

css(div[class~="content"]:nth-child(2))

jccr commented 5 years ago

If I understand some of your points correctly, @danielweck

You don't want to have to build Classifiers/Declassifiers for each "raw" value you'll get. You don't want to have to execute these for each value that comes your way.

danielweck commented 5 years ago

"Unofficial Draft 02 March 2012" 'nough said ;)

mickael-menu-mantano commented 5 years ago

"Unofficial Draft 02 March 2012" 'nough said ;)

Quite a bad omen... 😄 But maybe still relevant if we go down this road.

HadrienGardeur commented 5 years ago

I fail to understand the counter-proposal though.

Let me try to figure where this might go:

danielweck commented 5 years ago

@jccr

"Classifiers/Declassifiers"

Why not :) I used the terms "Producers/Consumers" for the Locator objects (and locations/fragments contained therein), but I think we are both referring to the same class of technical considerations.

The information about reading-locations/bookmarks/search-results/highlights-annotations is transported in Locator objects (for example, wrapped in marshalled event payloads that transit across the boundaries of webviews and platform-specific application layers, e.g. via postMessage / IPC). Producing/consuming this data occurs at frequent intervals. The runtime performance hit is of course not crippling in the context of a typical reading system, but there is however an impact that programmers should not neglect when implementing Locator logic, especially inside the webview runtime that hosts publication documents. My main concern however is the added debugging complexity and error-prone-ness related to generating/parsing/escaping. If there is a consumer component that natively interprets "fragments":["cssSelector(body.rootClass div:nth-child\\(2\\) p#para-id)"] (e.g. a webview / web browser engine), then great, this is a really useful/powerful feature, and any potential tradeoffs are therefore justified. But instead, the Web API document.querySelector() natively supports body.rootClass div:nth-child(2) p#para-id, so the round-tripping is redundant and becomes detrimental.

danielweck commented 5 years ago

Regarding extensibility and discovery of the various "types" of locations / fragments, the elepphant in the room is https://www.w3.org/TR/annotation-model/#selectors so let's lay down a concrete example, for illustration / discussion purposes.

Here is a sample Locator object's JSON serialization (ouch, verbose and ugly, but follows the existing W3C spec.):

{
   "locations": {
      "progression": 0.8,
      "fragments": [
         {
            "type": "CssSelector",
            "value": "body.rootClass div:nth-child(2) p#paragraph-id"
         },
         {
            "type": "FragmentSelector",
            "conformsTo": "http://tools.ietf.org/rfc/rfc3236",
            "value": "paragraph-id"
         },
         {
            "type": "FragmentSelector",
            "conformsTo": "http://www.idpf.org/epub/linking/cfi/epub-cfi.html",
            "value": "/4/2/8/6[paragraph-id]"
         },
         {
            "type": "FragmentSelector",
            "conformsTo": "http://www.w3.org/TR/media-frags/",
            "value": "t=15"
         },
      ]
   }
}
danielweck commented 5 years ago

Instead, we could cherry-pick the CssSelector and use our own PartialCfi:

{
   "locations": {
      "progression": 0.8,
      "fragments": [
         {
            "type": "CssSelector",
            "value": "body.rootClass div:nth-child(2) p#paragraph-id"
         },
         "#paragraph-id",
         {
            "type": "PartialCfi",
            "value": "/4/2/8/6[paragraph-id]"
         },
         "t=15"
      ]
   }
}
danielweck commented 5 years ago

But, as mentioned on the conference call as a thought exercise (not an actual proposal), why encode Media Fragment URI timestamps with "t=15" (or possibly other valid clock syntaxes) when this could in fact be extremely simplified with a float / number of seconds (as done with duration in other ReadiumWebPubManifest cases). For example:

{
   "locations": {
      "progression": 0.8,
      "fragments": [
         {
            "type": "CssSelector",
            "value": "body.rootClass div:nth-child(2) p#paragraph-id"
         },
         "#paragraph-id",
         {
            "type": "PartialCfi",
            "value": "/4/2/8/6[paragraph-id]"
         },
         {
            "type": "ClockTimeStampSecondsWhatever",
            "value": 15
         }
      ]
   }
}

...this potentially opens a whole can of worms because the value property can now be different types, or even an object (e.g. the DOMRange start which is a tuple)

danielweck commented 5 years ago

I fail to understand the counter-proposal though.

I don't have one yet :)

Like I said, I am happy to go with the group consensus to use fragments as described now (albeit slightly underspecified with respect to the cssSelector() and partialCfi() schemes and escaping rules), with the caveat that implementations where Locator producers/consumers do not need URI fragments syntax will be very tempted to internally skip the fragments notation altogether in order to avoid the unnecessary overhead.

danielweck commented 5 years ago

Let me try to figure where this might go:

* are you suggesting to add `css` (CSS Selectors) as a string, `cfi` (Partial CFI) as a string and `dom` (DOM Ranges) as an object to `locations`

* keeping `fragments` for all fragments

* and adding an extensibility point in `locations` for other non-fragment extensions?

This makes logical sense, but I would like to hear from other implementors too. I think that we need to strike a healthy balance between generalization and specialization, and this would be a step in the right direction. Rationale:

Encoding information in a fragments fashion (i.e. https://www.w3.org/TR/media-frags/ ) is useful in a context where data needs to be linearized / flattened into a single piece of string that fits inside a URI (and naturally, where there is an expectation that such URI will be consumed/deconstructed at the receiving end). If implementation X needs to cover this use-case (e.g. a client/server web app that transports data via URL), then such implementation would certainly benefit from the Media Fragments approach.

However, I am under the impression that most R2 apps do in fact transmit information in structured form, so linearizing parts of it seems counter-productive. Therefore I like the idea of enabling a well-recognized set of constructs (like CSS Selectors and Partial CFIs ... perhaps even timestamp in seconds?), elevating them as first-class citizen.

jccr commented 5 years ago

linearizing parts of it seems counter-productive.

Having first class linearization plus haiving cssSelector( and partialCfi( accepted as a solution is very useful for a "standard" way of generating bookmark URIs in Daniel's use cases and mine. (@danielweck do you agree?)

danielweck commented 5 years ago

Purely from an abstract data model point of view, the generalized locator.locations.fragments array of URI-Media-Fragments strings is inherently extensible (i.e. without any additional JSON Schema and/or vocabulary registry trickery, just matching fragment-schemes with resource media-types).

Also, as suggested by Hadrien, there could be an extension point in locator.locations for adding specialized properties akin to the first-class locator.locations.cssSelector and locator.locations.partialCfi fields. For these particular two properties, the data type is string (values are ready-to-use, i.e. no character escaping other than JSON stuff, as there is no fragment scheme such as cssSelector(xxx) or ?cssSelector=xxx).

However, the data type could also be object, for example to encode the 3 distinct fields of locator.locations.domRangePos (start or end Locators): cssSelector (string) + index (zero-based integer) + offset (zero-based integer).

Personally, I would even push this reasoning as far as locator.locations.timeInSeconds (positive float number) to avoid unnecessary ?t=12.345 roundtrip string conversions of time Media Fragments (unless of course there is a concrete use within a given implementation for using the string t=12.345 as-is). However I realize that this "optimization" would be a step too far with respect to the fragments approach, which naturally promotes using the schemes defined in the Media Fragments specs.

So, in practice, I don't see a valid reason for an implementation to produce the equivalent of locator.locations.[cssSelector | partialCfi | domRangePos] (or the extended ones) into the generalized fragments array (either instead of the equivalent native first-class specialized properties, or in addition to). If indeed an implementation needs to transport this kind of information via a URI towards a consumer endpoint that supports a particular encoding scheme, then by all means such implementation should linearize/flatten the data using a suitable Media Fragment method, either ad-hoc or standardized (if any).

Thoughts?

HadrienGardeur commented 5 years ago

I'll rephrase things a bit differently:

This means that the core model will remain the same and those three will be covered as part of our extensibility.

danielweck commented 5 years ago

Closing. See: https://github.com/readium/architecture/issues/99