readium / readium-cfi-js

BSD 3-Clause "New" or "Revised" License

Support for OPF spine item indexing: initial CFI path prefix using even numbers for XML nodes (OPF), before indirection into DOM tree (XHTML) #12

Open danielweck opened 10 years ago

danielweck commented 10 years ago

Does this library support "full" CFI expressions starting in the OPF (spine items), or only "partial" CFI syntax (local to the DOM of an HTML document)? My impression is the latter, not the former. In other words, it looks like some additional code is needed in readium-shared-js and/or readium-js to preserve some information at OPF / XML parsing time, so that each spine item can be associated with a well-defined CFI path "prefix". Otherwise, full CFI expressions cannot reliably be matched against particular XHTML documents.

As a reference, see the C++ readium-sdk code:
https://github.com/readium/readium-sdk/blob/develop/UnitTests/cfi_tests.cpp
https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/cfi.h

More specifically, see how CFISubpathForManifestItemWithID() uses an index value stored at XML parsing time to remember the spine "location" in the OPF (_spineCFIIndex): https://github.com/readium/readium-sdk/blob/develop/ePub3/ePub/package.cpp#L151 (note that the IndexOfSpineItemWithIDRef() function simply computes the list index and maps it to an even number, which is trivial)

However, note that a typical full CFI path "prefix" is

/6/n[ID]!

where 6 is the even index (XML node position) expected as per the OPF XML schema, and n is the spine item index normalised into an even number ((i+1)*2), and ID is the mandatory spine item idref (again, as per the OPF specification).

So, one could rightfully argue that from an implementation perspective, only n and ID need to be computed / preserved.
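
To make the (i+1)*2 mapping concrete, here is a minimal sketch of how such a prefix could be built. The function name spineItemCfiPrefix is hypothetical (it is not part of readium-cfi-js), the leading /6 is hard-coded as in the typical case described above, and the bracketed ID is passed through without any CFI escaping.

```javascript
// Hypothetical helper, not part of readium-cfi-js.
// Maps a zero-based spine index i to the even-numbered CFI step (i + 1) * 2
// and builds the "/6/n[ID]!" package-document prefix.
function spineItemCfiPrefix(spineIndex, id) {
  if (!Number.isInteger(spineIndex) || spineIndex < 0) {
    throw new RangeError("spineIndex must be a non-negative integer");
  }
  const step = (spineIndex + 1) * 2;
  // The ID assertion is omitted when no ID is supplied (assumption of this sketch).
  const assertion = id ? "[" + id + "]" : "";
  return "/6/" + step + assertion + "!";
}
```

For example, the second spine item (index 1) would map to step 4.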

danielweck commented 10 years ago

Follow-up: there should probably be an API to convert a full CFI to a combination of: corresponding spine/manifest item (real object pointer in the readium-shared-js data model, or a zero-based list index), plus a partial HTML-local DOM-specific CFI. Conversely, there should be an API to do the reverse operation: obtain the CFI "prefix" for a given spine/manifest item, so that it can be concatenated with the HTML-local CFI (thereby resulting in a full CFI). This way, systems that require EPUB-wide CFI comparison/sorting (e.g. annotation, bookmarks, etc.) can rely on the "canonical" properties of CFI expressions, without resorting to Readium-specific tricks such as using a "fake" CFI prefix.

See this helper function: https://github.com/readium/readium-shared-js/blob/develop/js/views/internal_links_support.js#L30
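
A rough sketch of what such a pair of conversion functions might look like follows. The names splitFullCfi and joinFullCfi are hypothetical and do not exist in readium-cfi-js or readium-shared-js. The sketch assumes the /6/n[ID]! prefix shape discussed earlier, and it does not handle escaped characters inside the bracketed ID.

```javascript
// Hypothetical API sketch; names and return shape are illustrative only.
// Matches epubcfi(/6/n[ID]!<content path>) with an optional [ID] assertion.
const FULL_CFI = /^epubcfi\(\/6\/(\d+)(?:\[([^\]]*)\])?!(\/.*)\)$/;

// Full CFI -> zero-based spine index + HTML-local (partial) CFI path.
function splitFullCfi(fullCfi) {
  const m = FULL_CFI.exec(fullCfi);
  if (!m) {
    throw new SyntaxError("Expected epubcfi(/6/n[ID]!/...)");
  }
  const step = Number(m[1]);
  if (step < 2 || step % 2 !== 0) {
    throw new RangeError("Spine step must be an even number >= 2");
  }
  return { spineIndex: step / 2 - 1, id: m[2] || null, contentCfi: m[3] };
}

// Reverse operation: spine index (+ optional ID) + partial CFI -> full CFI.
function joinFullCfi(spineIndex, id, contentCfi) {
  const step = (spineIndex + 1) * 2;
  const assertion = id ? "[" + id + "]" : "";
  return "epubcfi(/6/" + step + assertion + "!" + contentCfi + ")";
}
```

Returning a zero-based list index rather than an object pointer keeps the sketch independent of the readium-shared-js data model; a real API could resolve the index to a spine item afterwards.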

danielweck commented 10 years ago

Now, example of the impact on the "deep linking" implementation:

ReadiumSDK.Views.InternalLinksSupport.processDeepLink() https://github.com/readium/readium-shared-js/blob/develop/js/views/internal_links_support.js#L90

EPUBcfi.Interpreter.getContentDocHref() https://github.com/readium/readium-cfi-js/blob/master/src/models/cfi_interpreter.js#L27

Note how, each time a "deep" link is activated by the user, the OPF XML is fetched via Ajax, XML-parsed, and DOM-traversed, only to obtain the manifest item's href (via its referencing spine item).

This seems overkill, and could probably be improved?
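
One possible improvement, sketched under assumptions: load and parse the OPF once, keep the resulting list of hrefs in spine order, and answer later lookups from that cached list. Everything here is hypothetical, including createHrefResolver and the loadSpineHrefs callback, which stands in for whatever code fetches and parses the package document.

```javascript
// Hypothetical cache, not existing Readium code.
// loadSpineHrefs: function returning (a promise for) an array of manifest
// hrefs in spine order; it is invoked at most once.
function createHrefResolver(loadSpineHrefs) {
  let cached = null; // promise for the href array, created on first use

  return function resolveHref(spineIndex) {
    if (cached === null) {
      cached = Promise.resolve(loadSpineHrefs());
    }
    return cached.then(function (hrefs) {
      if (spineIndex < 0 || spineIndex >= hrefs.length) {
        throw new RangeError("No spine item at index " + spineIndex);
      }
      return hrefs[spineIndex];
    });
  };
}
```

Note that this sketch also caches a failed load; a production version would probably clear the cached promise on rejection so that the next deep link can retry.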

dmitrym0 commented 10 years ago

Hey Daniel,

I think you've pretty much answered your initial question. The CFI library does support full CFIs; however, it requires access to the manifest. Unfortunately, a lot of the functionality relies on jQuery and on having access to the full DOM of the OPF. This could be improved by moving away from direct DOM manipulation and representing the manifest as some sort of intermediate construct. At the time we opted to use partial CFIs, but my preference has always been for CFIs as specified in the spec.

danielweck commented 10 years ago

Thank you @dmitrym0 (and sorry for the rather verbose self-discussion thread, that's why I marked the issue as "question" :) )

danielweck commented 9 years ago

Just a quick heads-up regarding client-side usage of the 99! fake OPF spine item index + indirection, for example in the document selection highlighting feature, in the “annotation module” (search for arbitraryPackageDocCFI):

https://github.com/readium/readium-shared-js/blob/develop/lib/annotations_module.js

At its core, the CFI library should support comparison/sorting of “full” CFI references. This functionality is necessary for: TOC context matching, bookmark/annotation ordering, multiple rendition mapping, etc.
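
As an illustration of why full CFIs sort naturally, here is a deliberately simplified comparator. It is a sketch, not the library's implementation: compareCfi and parseCfi are hypothetical names, and only element steps, the "!" indirection, and an optional ":offset" are handled. Bracketed assertions are ignored, and range CFIs as well as temporal or spatial offsets are not supported.

```javascript
// Simplified sketch; not the readium-cfi-js implementation.
function parseCfi(cfi) {
  const body = cfi.replace(/^epubcfi\(/, "").replace(/\)$/, "");
  // Drop [assertions]; they do not affect document order.
  const bare = body.replace(/\[[^\]]*\]/g, "");
  const parts = bare.split(":");
  // "/" separates steps and "!" marks indirection; both act as boundaries here.
  const steps = parts[0].split(/[\/!]+/).filter(Boolean).map(Number);
  return { steps: steps, offset: parts.length > 1 ? Number(parts[1]) : 0 };
}

// Returns a negative number, zero, or a positive number (Array.prototype.sort style).
function compareCfi(a, b) {
  const pa = parseCfi(a);
  const pb = parseCfi(b);
  const n = Math.min(pa.steps.length, pb.steps.length);
  for (let i = 0; i < n; i++) {
    if (pa.steps[i] !== pb.steps[i]) {
      return pa.steps[i] - pb.steps[i]; // numeric comparison, so /6/4 precedes /6/10
    }
  }
  if (pa.steps.length !== pb.steps.length) {
    return pa.steps.length - pb.steps.length; // shorter path (ancestor) first
  }
  return pa.offset - pb.offset;
}
```

Because the spine step is part of the path, locations in different content documents order correctly without a fake prefix such as 99!, for example via bookmarks.sort(compareCfi).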

danielweck commented 8 years ago

Heads-up: So far, the lack of support for "full" CFI expressions (i.e. XML index of OPF spine-item + path inside HTML document) has not been a blocking problem (this issue is >1 year old!). Readium internally uses a combination of spine-item IDREF (i.e. ID of manifest item) and "partial" CFI rooted at the (X)HTML EPUB3 Content Documents. This two-field object (arbitrary JSON syntax) is used to store the reading "bookmark" / last opened location, which is fine as interoperability is not a requirement. There is some preliminary (experimental) code that computes full CFI expressions in order to match locations across EPUB3 Multiple Renditions.

Search for "itemRefIndex" in: https://github.com/readium/readium-js/compare/develop...feature/epub3MultipleRenditions#diff-cee2e2bcb45c3d563b9226a65df3a819R198

There's also some code in shared-js to compare CFIs (sortability), and to map locators across OPF renditions: https://github.com/readium/readium-shared-js/compare/feature/epub3MultipleRenditions#diff-629a5e2ad23a45424177cc6c5d2468bbR79
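
For illustration, a two-field bookmark of the kind described above could be upgraded to a full CFI roughly as follows. This is a sketch under assumptions: the field names idref and contentCFI, the spineIdrefs array (idrefs in spine order), and the function bookmarkToFullCfi are all hypothetical, and the optional bracketed ID is left out.

```javascript
// Hypothetical conversion; field and function names are assumptions.
// bookmark: { idref: string, contentCFI: string } as parsed from JSON.
// spineIdrefs: array of itemref idref values in spine order.
function bookmarkToFullCfi(bookmark, spineIdrefs) {
  const index = spineIdrefs.indexOf(bookmark.idref);
  if (index === -1) {
    throw new Error("Unknown spine idref: " + bookmark.idref);
  }
  const step = (index + 1) * 2; // zero-based index -> even CFI step
  return "epubcfi(/6/" + step + "!" + bookmark.contentCFI + ")";
}
```

The lookup by idref assumes each manifest item is referenced at most once in the spine, so that the idref identifies a single position.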