Closed danielweck closed 5 years ago
If anything, this group has shown that there's a total lack of consistency in which fragment will be used by various apps.
If we had common agreement on this issue, I would be open to that argument, but I don't think that's the case at all.
I'll also quote a comment I've made in the other issue:
Overall, I'm not in favor of extending our current model with anything too specialized, I think the mix of
position
,progression
,overallProgression
andfragments
is good enough:
- it's as media-generic as we can be
- it provides both context and the ability to express very specific locations
- there's built-in extensibility through fragments (that can be defined per media type)
We had something more specialized elements before (with dedicated elements for CFI and CSS Selectors among other things) and I think it would be a step back to go back in that direction.
Also to answer another of your comment:
For example, the current proposal requires producer/consumer code to generate/parse ad-hoc fragment schemes for CSS Selectors and (partial)CFIs which lack precise definitions of escaping rules
I don't think that's true:
fragments
vs a specialized element, let's address this concern separatelyprecise definitions of escaping rules have absolutely nothing to do with fragments vs a specialized element
If the fragment syntax introduces string tokens with special significance (e.g. partialCfi(xxx)
, then both generator and parser code must account for them when handling the xxx
value)
we can provide helpers for this
Could you please explain what you mean by this? I am not sure I understand.
Are you talking about code that generates/parses the special URI fragment scheme syntax? If so, that is my point exactly: this processing step should only be necessary at the point at which such URI fragment is necessary, for example to pass into a consumer agent such as a browser engine / web API that can interpret such full URL, constructed from its constituent parts which are expressed in exploded form inside the Locator payload (href
+ relevant "fragments").
For the helpers, and taking JavaScript as an example, it could be to add functions/properties to the Locator
JS protocol
that parse behind the scene the string fragments and return the value as a true object, like you would need it on the platform.
Using that, it shouldn't really matter how the information is stored in the JSON, the usage in the app should be the same.
I don't have a strong preference for any of the two proposed models, both would work pretty well at least on mobile. However, after pondering your arguments, I don't think that DOM range should be a "first class citizen". IMHO we should either have all the locations as objects, or everything as a string fragment with property helpers to parse it behind the scene.
IMHO we should either have all the locations as objects, or everything as a string fragment with property helpers to parse it behind the scene.
+1 to that, consistency is always better.
"Everything as object" though would be harder to achieve IMO and would require:
These are all quite similar to how metadata are handled in RWPM.
The main difference compared to RWPM is that these locators will need to be stored in a DB most of the time, which means that each app will most likely have to serialize/deserialize JSON as a string.
I don't expect the "everything as object" route to require less work behind the scene, the outcome will be mostly similar with some additional complexity.
Consistency is good. So, locatorInstance.locations.progression
is a number/float value that represents a percentage offset into the resource. Should this also be a fragment in locatorInstance.locations.fragments
?
Full example: locatorInstance.locations.fragments=[ "#paragraph-id", "cssSelector(body.topClass div:nth-child\\(4\\) p#paragraph-id)", "resourceProgression(0.9452)", "partialCfi(/4/2/8/6[paragraph-id])" ]
Side note: pay attention to the escaped parentheses characters inside the string value of the cssSelector()
fragment scheme ... I actually made that up, I do not know the exact escaping rules, and what the border cases are. This would have to be further defined, also for partialCfi()
.
Evidently; based on the overall state of the discussion, mainly between Mickael and Hadrien, and myself; my opinion falls into the minority bucket, so naturally I will follow the group consensus :) I do however maintain my objection, which I have tried to articulate in previous comments. Let me try to synthesize my thoughts once again:
In recent discussions we have been dealing with several tangential issues such as: DOMRange specificities (structured object), the overall unclear processing model for when several kinds of fragment resolutions are expressed in a single Locator payload (e.g. CSS selectors / fragment-ID vs. progression vs. CFI character-level position, etc.), and the notion of text
(before/after/highlight). But here I am expressing reservations specifically about forcing roundtrip encoding/decoding steps (so-called API "helpers") because of the proposed fragments-as-array-of-string approach.
I do very much appreciate the need to strike a reasonable balance in the Locator model in terms of generalization vs. specialization. However, a an implementor I will be disappointed to follow a design pattern that seems to run counter to the otherwise pragmatic architectural principles that underpin most of the Readium2 models / JSON serialization.
If fragments
is as an array of arbitrary strings that need to be parsed+unespaced in order extract their true meaning and potentially-structured values, and if in fact all known Locator consumers (iOS/Android/desktop/web R2 implementations) end-up invoking the suggested API "helpers" back-and-forth, then this is symptomatic that the proposed design is detached from real-world applications, and that the fragments
array-of-string encoding format is unnecessary.
This approach not only introduces some degree of runtime overhead (i.e. in terms of performance and memory allocations), but there is also increased code complexity and testing costs due to the additional parsing / escaping rules. These downsides would be absolutely fine if the API consumers actually made use of the URI-fragment-friendly syntax, but none of the known implementations do. To borrow from the idiom "premature optimization": to me this feels like "premature encoding". If there is indeed a real, identified use-case for URI-friendlyness (like there is in the TOC href
time Media Fragments, for example), then we have a solid justification for this type of encoding, and the proposed solution has merits beyond purist generalization vs. specialization considerations. But instead, all R2 implementation will end-up relying on expensive "helpers" / conversion functions.
For example, in the desktop implementation Locator objects are central to several critical features, and although the volume of JSON traffic during event messaging is not huge, the processing cycles are not trivial (i.e. it "all adds up"). The reading location is emmited at a low rate from the "navigator" component (i.e. at every detected user interaction with the rendered document), but handling bookmarks, highlights/annotations, search results, etc. represents a more significant execution trace to debug into. Right now, the cssSelector
field is expressed "natively", and the string value can be consumed directly by the Web API document.querySelector()
. With the proposed fragments
model, the "helpers" layer of indirection will have to be implemented correctly (character escaping, border cases, etc.) both for generation and parsing purposes, and will have to be invoked in a roundtrip manner for every message passed (the parsed value/object can of course be cached, but JSON payloads are by nature transient across the messaging layer, so conversions actually occur every time). Effectively, a simple locatorInstance.locations.cssSelector
will have to become (behind the scenes, through "helper" functions) a lookup routine that scans an array in order to look-ahead (parse + discover) the desired value type.
Consistency is good. So,
locatorInstance.locations.progression
is a number/float value that represents a percentage offset into the resource. Should this also be a fragment inlocatorInstance.locations.fragments
?
Progression as a concept works across all the different types of publication that we've considered so far (EPUB, PDF, audiobooks, comics) and isn't defined in any specification (it would have been a good candidate for Media Fragments URI 1.0 IMO).
This is why having progression
as a top-level location makes sense.
You can hardly make the same assessment for DOM ranges or CSS selectors, which are extremely specialized.
[...] the overall unclear processing model for when several kinds of fragment resolutions are expressed in a single Locator payload (e.g. CSS selectors / fragment-ID vs. progression vs. CFI character-level position, etc.), and the notion of text (before/after/highlight)
IMO this has nothing to do with the discussion about fragments, since you could have locations that contradict one another. For many reasons that we've listed before (we need context along with multiple fallback options), we can't live in the naïve world of "CFI only" anymore, therefore there is always a risk of having a location that contradicts either text
or a different location.
However, a an implementor I will be disappointed to follow a design pattern that seems to run counter to the otherwise pragmatic architectural principles that underpin most of the Readium2 models / JSON serialization.
I don't think that's true. The current situation with fragments
is IMO identical to what we have with rel
in a Link Object: a rel or a fragment can only be understood in the context of a specific media type.
This approach not only introduces some degree of runtime overhead (i.e. in terms of performance and memory allocations), but there is also increased code complexity and testing costs due to the additional parsing / escaping rules.
As I've said before, I think that the "everything is an object" approach would be every bit as costly in terms of code complexity and parsing/escaping rules but it would also make the overall model more complex and would put the burden on maintaining a registry of locations entirely on us (vs the fragment approach where each media type is responsible for registering fragments).
if there is indeed a real, identified use-case for URI-friendlyness (like there is in the TOC href time Media Fragments, for example), then we have a solid justification for this type of encoding
In the context of Readium Web, there is: this could be a way to reference specific portions of a publication or jump to a location that doesn't have an id. It won't be as powerful as what we can do when passing a Locator to a navigator, but at least we'll be able to do something.
A Publication Viewer will be able to use the exact same API that we use to jump to a locator.
I think that the "everything is an object" approach would be every bit as costly in terms of code complexity and parsing/escaping rules
Let's take a concrete example - locator
is a Locator instance, here represented in its JSON serialization, for illustration purposes (additional JSON properties omitted, for brevity):
{
"locations": {
"cssSelector": "body.rootClass div:nth-child(2) p#para-id"
}
}
Example Javascript consumer code running inside the web browser runtime that displays an HTML document (publication document):
const paraElement = document.querySelector(locator.locations.cssSelector);
Now with the fragments
encoding (once again, the scheme and escaping rules are made-up for illustration purposes, not a solid proposal):
{
"locations": {
"fragments": [
"cssSelector(body.rootClass div:nth-child\\(2\\) p#para-id)"
]
}
}
Example crappy Javascript code (sanity checks removed, for brevity):
const cssSelector = helperExtractCssSelector(locator);
const paraElement = document.querySelector(cssSelector)
function helperExtractCssSelector(locator) {
for (const fragment of locator.locations.fragments) {
// the "fragmentIsCssSelector" piece of code returns true
// if the fragment string matches the `cssSelector(xxx)` syntax
if (fragmentIsCssSelector(fragment)) {
// the "removeCssSelectorScheme" piece of code
// removes the `cssSelector(xxx)` scheme in the string,
// returns the `xxx` value which may contain escaped characters
const escapedCssSelector = removeCssSelectorScheme(fragment);
// the "unescapeCssSelector" piece of code ensures escaped characters
// are un-escaped (e.g. double-backlashed parentheses for `:nth-child(yyy)` pseudo-class)
// and returns the ready-to-use CSS Selector value
return unescapeCssSelector(escapedCssSelector);
}
}
}
Same logic for CFI expressions.
I've already replied to that example @danielweck, it's only relevant if we have a known quantity of locations, which is definitely not the case.
Readium Desktop seems to primarily use CSS Selectors and DOM Ranges (neither of which have a standardized representation as a fragment), but other implementations may use completely different locations (they can even be media specific, as we've seen with PDF, which is extremely useful).
In practice, we won't be able to use locator.locations.cssSelector
without writing some helper code, no matter which direction we're taking (the object approach would need to inspect something like locator.locations.otherLocations
and you've already illustrated the situation for fragments).
My recommendation in such cases would be to simply write something something specific to your own implementation (Readium Desktop), which is a bit of a shame but would be the only way to deal with the issue that you're facing. I don't believe that changing the model will solve the problem that you're facing.
It is absolutely fine to have "helper" code that inspects objects for available fields, i.e. to test the presence of specific discoverable properties.
Conversely, "helper" code that parses arbitrary strings in order to match supported tokens and to parse out meaningful values, is neither trivial nor efficient.
My recommendation in such cases would be to simply write something something specific to your own implementation (Readium Desktop), which is a bit of a shame but would be the only way to deal with the issue that you're facing.
If the iOS/Android implementation also have to produce/consume the flat linearized fragments
syntax via "helpers" in order to generate/extract timestamps and PDF references, then the nature of the problem is not specific to R2 desktop/Electron/Chromium. I expect that one day iOS/Android implementation will need to process CFI locations too (or maybe even CSS Selectors, as this is a core web feature), in which case I fail to see how the unnecessary roundtrip conversions can be ignored. And of course, there is still the pending issue of encoding DOMRanges for highlights/annotations (object destructuring).
We are discussing this subject right now on the call, are you available to join @danielweck ?
Regarding the escaping, maybe we don't need to do it at all? This unofficial spec doesn't do it: http://simonstl.com/articles/cssFragID.html
css(div[class~="content"]:nth-child(2))
If I understand some of your points correctly, @danielweck
You don't want to have to build Classifiers/Declassifiers for each "raw" value you'll get. You don't want to have to execute these for each value that comes your way.
"Unofficial Draft 02 March 2012" 'nough said ;)
"Unofficial Draft 02 March 2012" 'nough said ;)
Quite a bad omen... 😄 But maybe still relevant if we go down this road.
I fail to understand the counter-proposal though.
Let me try to figure where this might go:
css
(CSS Selectors) as a string, cfi
(Partial CFI) as a string and dom
(DOM Ranges) as an object to locations
fragments
for all fragmentslocations
for other non-fragment extensions?@jccr
"Classifiers/Declassifiers"
Why not :) I used the terms "Producers/Consumers" for the Locator objects (and locations/fragments contained therein), but I think we are both referring to the same class of technical considerations.
The information about reading-locations/bookmarks/search-results/highlights-annotations is transported in Locator objects (for example, wrapped in marshalled event payloads that transit across the boundaries of webviews and platform-specific application layers, e.g. via postMessage / IPC).
Producing/consuming this data occurs at frequent intervals. The runtime performance hit is of course not crippling in the context of a typical reading system, but there is however an impact that programmers should not neglect when implementing Locator logic, especially inside the webview runtime that hosts publication documents.
My main concern however is the added debugging complexity and error-prone-ness related to generating/parsing/escaping. If there is a consumer component that natively interprets "fragments":["cssSelector(body.rootClass div:nth-child\\(2\\) p#para-id)"]
(e.g. a webview / web browser engine), then great, this is a really useful/powerful feature, and any potential tradeoffs are therefore justified. But instead, the Web API document.querySelector()
natively supports body.rootClass div:nth-child(2) p#para-id
, so the round-tripping is redundant and becomes detrimental.
Regarding extensibility and discovery of the various "types" of locations / fragments, the elepphant in the room is https://www.w3.org/TR/annotation-model/#selectors so let's lay down a concrete example, for illustration / discussion purposes.
Here is a sample Locator object's JSON serialization (ouch, verbose and ugly, but follows the existing W3C spec.):
{
"locations": {
"progression": 0.8,
"fragments": [
{
"type": "CssSelector",
"value": "body.rootClass div:nth-child(2) p#paragraph-id"
},
{
"type": "FragmentSelector",
"conformsTo": "http://tools.ietf.org/rfc/rfc3236",
"value": "paragraph-id"
},
{
"type": "FragmentSelector",
"conformsTo": "http://www.idpf.org/epub/linking/cfi/epub-cfi.html",
"value": "/4/2/8/6[paragraph-id]"
},
{
"type": "FragmentSelector",
"conformsTo": "http://www.w3.org/TR/media-frags/",
"value": "t=15"
},
]
}
}
Instead, we could cherry-pick the CssSelector
and use our own PartialCfi
:
{
"locations": {
"progression": 0.8,
"fragments": [
{
"type": "CssSelector",
"value": "body.rootClass div:nth-child(2) p#paragraph-id"
},
"#paragraph-id",
{
"type": "PartialCfi",
"value": "/4/2/8/6[paragraph-id]"
},
"t=15"
]
}
}
But, as mentioned on the conference call as a thought exercise (not an actual proposal), why encode Media Fragment URI timestamps with "t=15"
(or possibly other valid clock syntaxes) when this could in fact be extremely simplified with a float / number of seconds (as done with duration
in other ReadiumWebPubManifest cases). For example:
{
"locations": {
"progression": 0.8,
"fragments": [
{
"type": "CssSelector",
"value": "body.rootClass div:nth-child(2) p#paragraph-id"
},
"#paragraph-id",
{
"type": "PartialCfi",
"value": "/4/2/8/6[paragraph-id]"
},
{
"type": "ClockTimeStampSecondsWhatever",
"value": 15
}
]
}
}
...this potentially opens a whole can of worms because the value
property can now be different types, or even an object (e.g. the DOMRange start
which is a tuple)
I fail to understand the counter-proposal though.
I don't have one yet :)
Like I said, I am happy to go with the group consensus to use fragments
as described now (albeit slightly underspecified with respect to the cssSelector()
and partialCfi()
schemes and escaping rules), with the caveat that implementations where Locator producers/consumers do not need URI fragments syntax will be very tempted to internally skip the fragments
notation altogether in order to avoid the unnecessary overhead.
Let me try to figure where this might go:
* are you suggesting to add `css` (CSS Selectors) as a string, `cfi` (Partial CFI) as a string and `dom` (DOM Ranges) as an object to `locations` * keeping `fragments` for all fragments * and adding an extensibility point in `locations` for other non-fragment extensions?
This makes logical sense, but I would like to hear from other implementors too. I think that we need to strike a healthy balance between generalization and specialization, and this would be a step in the right direction. Rationale:
Encoding information in a fragments
fashion (i.e. https://www.w3.org/TR/media-frags/ ) is useful in a context where data needs to be linearized / flattened into a single piece of string that fits inside a URI (and naturally, where there is an expectation that such URI will be consumed/deconstructed at the receiving end). If implementation X needs to cover this use-case (e.g. a client/server web app that transports data via URL), then such implementation would certainly benefit from the Media Fragments approach.
However, I am under the impression that most R2 apps do in fact transmit information in structured form, so linearizing parts of it seems counter-productive. Therefore I like the idea of enabling a well-recognized set of constructs (like CSS Selectors and Partial CFIs ... perhaps even timestamp in seconds?), elevating them as first-class citizen.
linearizing parts of it seems counter-productive.
Having first class linearization plus haiving cssSelector(
and partialCfi(
accepted as a solution is very useful for a "standard" way of generating bookmark URIs in Daniel's use cases and mine. (@danielweck do you agree?)
Purely from an abstract data model point of view, the generalized locator.locations.fragments
array of URI-Media-Fragments strings is inherently extensible (i.e. without any additional JSON Schema and/or vocabulary registry trickery, just matching fragment-schemes with resource media-types).
Also, as suggested by Hadrien, there could be an extension point in locator.locations
for adding specialized properties akin to the first-class locator.locations.cssSelector
and locator.locations.partialCfi
fields. For these particular two properties, the data type is string (values are ready-to-use, i.e. no character escaping other than JSON stuff, as there is no fragment scheme such as cssSelector(xxx)
or ?cssSelector=xxx
).
However, the data type could also be object, for example to encode the 3 distinct fields of locator.locations.domRangePos
(start
or end
Locators): cssSelector
(string) + index
(zero-based integer) + offset
(zero-based integer).
Personally, I would even push this reasoning as far as locator.locations.timeInSeconds
(positive float number) to avoid unnecessary ?t=12.345
roundtrip string conversions of time Media Fragments (unless of course there is a concrete use within a given implementation for using the string t=12.345
as-is). However I realize that this "optimization" would be a step too far with respect to the fragments
approach, which naturally promotes using the schemes defined in the Media Fragments specs.
So, in practice, I don't see a valid reason for an implementation to produce the equivalent of locator.locations.[cssSelector | partialCfi | domRangePos]
(or the extended ones) into the generalized fragments
array (either instead of the equivalent native first-class specialized properties, or in addition to). If indeed an implementation needs to transport this kind of information via a URI towards a consumer endpoint that supports a particular encoding scheme, then by all means such implementation should linearize/flatten the data using a suitable Media Fragment method, either ad-hoc or standardized (if any).
Thoughts?
I'll rephrase things a bit differently:
fragments
locations
also provide an extensibility point for locations that are neither covered by our core model or by fragmentsThis means that the core model will remain the same and those three will be covered as part of our extensibility.
Closing. See: https://github.com/readium/architecture/issues/99
I have expressed reservations (see https://github.com/readium/architecture/issues/95 ) about the suboptimal processing model proposed for the
locations.fragments
array of string values (see https://github.com/readium/architecture/blob/master/locators/README.md#fragments ).For example, the current proposal requires producer/consumer code to generate/parse ad-hoc fragment schemes for CSS Selectors and (partial)CFIs which lack precise definitions of escaping rules (see for example the (full)CFI border cases https://www.idpf.org/epub/linking/cfi/epub-cfi.html#sec-epubcfi-escaping ).
Consumer code must parse each string value in the
locations.fragments
array in order to extract meaningful information (i.e. to differentiate CFI from CSS Selectors, and even time Media Fragments, etc.). This introspection step seems unnecessary and costly, in the context of R2 implementations which ultimately do not rely on native browser engine support for the intended URI fragments. Even time Media Fragments will be deconstructed in order to extract timestamps and feed those values to underlying playback APIs.Such "fragments" in their linearized / flattened form are certainly useful at the point at which the information gets encoded into actual URI fragments (which usually involves syntactic escaping that can be quite error-prone, see for example encodeURI vs. encodeURIComponent), for instance in
src
orhref
properties with JSON Schematype
=string
andformat
=uri-reference
. However, I fail to see the advantages of prematurely serializing high-level constructs such as CSS Selectors and (partial)CFI into a format compatible with URI fragments, when consumers in the R2 architecture are likely to make use of them directly (e.g. CSS Selectordocument.querySelector()
API).Finally, in light of the DOMRange discussion (see https://github.com/readium/architecture/issues/95 ) which suggests a "first class citizen" JSON property in order to express the DOMRange construct in a structured manner (i.e. non-flattened/linearized for the purpose of URI encoding), I am wondering where the line in the sand lies between "locator/fragment" constructs that belong in the generalized model (i.e. array of generic strings that need to be parsed in order to extract meaning, or otherwise collated together into a URI fragment), versus other constructs that warrant a directly-identifiable / discoverable data type (e.g.
locations.cssSelector="body div#id.class p#paragraph"
+locations.partialCfi="/4/6/2[paragraph]"
+locations.fragments=["#paragraph"]
instead oflocations.fragments=["#paragraph", "partialcfi(/4/6/2[paragraph])", "cssselector(body div#id.class p#paragraph)"]
).