readium / architecture

📚 Documents the architecture of the Readium projects
https://readium.org/architecture/
BSD 3-Clause "New" or "Revised" License

Calculating the Publication.positionList #101

Closed by mickael-menu-mantano 3 years ago

mickael-menu-mantano commented 5 years ago

We need to specify for each format:

Related issue: Total progression in a publication for locators

CBZ and PDF

These formats are straightforward: we can directly read the number of pages for a PDF, or the number of files for a CBZ, to build the positionList. To retrieve the current position, we just need the index of the page.

PDF can be a bit less efficient because we need to open the file (potentially loading it entirely in memory, e.g. with Swift) to read its number of pages.

LCPDF

An LCPDF contains encrypted PDFs, so we can't really get the positionList until the license is unlocked. It might also contain several PDFs, which is inefficient if we have to open all of them to calculate the positionList.

An alternative would be to store the number of pages as a link property for each resource in the RWPM. Building the positionList would then be really efficient and wouldn't require the publication's passphrase.

The positionList is built by adding the number of pages of each PDF in the readingOrder. Here's an example implementation in Swift: https://github.com/readium/r2-navigator-swift/blob/839e0c4900a84b9e337e7a3d836f0b78c7d9c28b/r2-navigator-swift/PDF/PDFNavigatorViewController.swift#L50

We can find out the current position easily by keeping a separate array of positions for each resource href, and using the page index of the currently visible resource (eg. https://github.com/readium/r2-navigator-swift/blob/839e0c4900a84b9e337e7a3d836f0b78c7d9c28b/r2-navigator-swift/PDF/PDFNavigatorViewController.swift#L221).
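To make the two steps above concrete, here is a minimal TypeScript sketch (not the linked Swift code). The `PdfResource` and `PdfPosition` shapes and both function names are hypothetical, and the page count of each resource is assumed to be already known, for example from the PDF itself or from a link property.

```typescript
// Hypothetical shapes for illustration only; this is not the Readium API.
interface PdfResource {
  href: string;
  pageCount: number; // assumed known (read from the PDF or from a link property)
}

interface PdfPosition {
  href: string;
  pageIndex: number; // 0-based page index inside the resource
  position: number;  // 1-based position across the whole publication
}

// Builds one position per page, keeping a separate array for each resource href.
function buildPdfPositions(readingOrder: PdfResource[]): { [href: string]: PdfPosition[] } {
  const byHref: { [href: string]: PdfPosition[] } = {};
  let next = 1;
  for (const resource of readingOrder) {
    const positions: PdfPosition[] = [];
    for (let pageIndex = 0; pageIndex < resource.pageCount; pageIndex++) {
      positions.push({ href: resource.href, pageIndex: pageIndex, position: next });
      next += 1;
    }
    byHref[resource.href] = positions;
  }
  return byHref;
}

// Looks up the current position from the visible resource and its page index.
function currentPdfPosition(
  byHref: { [href: string]: PdfPosition[] },
  href: string,
  pageIndex: number
): PdfPosition | undefined {
  const positions = byHref[href];
  return positions === undefined ? undefined : positions[pageIndex];
}

const pdfPositions = buildPdfPositions([
  { href: "part1.pdf", pageCount: 3 },
  { href: "part2.pdf", pageCount: 2 },
]);
// Second page of the second PDF: 3 pages come before it, so this is position 5.
const visiblePdfPosition = currentPdfPosition(pdfPositions, "part2.pdf", 1);
```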

EPUB

The tricky part that needs to be discussed...

How to create the positionList?

Among the solutions discussed to split a resource into pages:

Both the characters and bytes methods are pretty reliable to express the relative size between reading order resources and publications, as long as the chapters are not image based.

How to find out reliably the currently displayed position from positionList?

I think we agreed on a call that there's no way to accurately find the current position in an EPUB. The DOM displayed in a web view is dynamic and might not be equivalent to the one parsed from the static XHTML files. We can however approximate it:

Fixed layout vs reflowable

There's the added difficulty that an EPUB can contain both fixed layout and reflowable resources. Fixed layout is straightforward, one resource = one page. But we need to take it into account when calculating the positionList instead of only splitting by characters/bytes.
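A rough TypeScript sketch of how a mixed publication could be counted, under stated assumptions: the `EpubResource` shape and function names are hypothetical, the split size is left as a parameter, and giving an empty reflowable resource one position is a choice made for this sketch, not something settled in this issue.

```typescript
// Hypothetical resource description; the field names are assumptions.
interface EpubResource {
  href: string;
  byteLength: number;   // size of the resource in the container
  fixedLayout: boolean; // true for pre-paginated resources
}

// Number of positions contributed by a single resource.
function resourcePositionCount(resource: EpubResource, bytesPerPosition: number): number {
  if (resource.fixedLayout) {
    return 1; // fixed layout: one resource = one page
  }
  // Reflowable: split by size. Assumption: at least one position per resource.
  return Math.max(1, Math.ceil(resource.byteLength / bytesPerPosition));
}

// Total number of positions for the whole reading order.
function totalPositionCount(readingOrder: EpubResource[], bytesPerPosition: number): number {
  let total = 0;
  for (const resource of readingOrder) {
    total += resourcePositionCount(resource, bytesPerPosition);
  }
  return total;
}

const mixedReadingOrder: EpubResource[] = [
  { href: "cover.xhtml", byteLength: 50000, fixedLayout: true },   // 1 position
  { href: "chapter1.xhtml", byteLength: 2500, fixedLayout: false }, // ceil(2500 / 1024) = 3
  { href: "blank.xhtml", byteLength: 0, fixedLayout: false },       // 1 position (assumption)
];
const mixedTotal = totalPositionCount(mixedReadingOrder, 1024); // 5
```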

Side discussion

Calculating the positionList might be slow and memory/CPU-intensive (eg. for LCPDF we have to load all the PDFs in memory). I don't think that it's necessary to expose an asynchronous API for Publication.positionList. The caller can wrap it in a background process if it doesn't need the positionList synchronously.

However, we could benefit from having a cache in the streamer to store the calculated positionList (eg. as JSON).
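As an illustration of the JSON cache idea, here is a small TypeScript sketch. Everything in it is hypothetical: the `StoredPosition` shape, the `loadOrComputePositions` name, and the in-memory object that stands in for whatever storage the streamer would actually use.

```typescript
// Hypothetical position shape for this sketch.
interface StoredPosition {
  href: string;
  position: number;
}

// Stand-in for a file or database: JSON strings keyed by publication identifier.
const positionJsonCache: { [publicationId: string]: string } = {};

// Returns the cached list when present; otherwise computes it once and stores it as JSON.
function loadOrComputePositions(
  publicationId: string,
  compute: () => StoredPosition[]
): StoredPosition[] {
  const cached = positionJsonCache[publicationId];
  if (cached !== undefined) {
    return JSON.parse(cached) as StoredPosition[];
  }
  const positions = compute();
  positionJsonCache[publicationId] = JSON.stringify(positions);
  return positions;
}
```

The expensive `compute` callback only runs on a cache miss, so a slow calculation (such as opening every PDF of an LCPDF) would happen once per publication.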

mickael-menu-mantano commented 5 years ago

After today's call, the group consensus is:

mickael-menu-mantano commented 5 years ago

I implemented the solution for Swift: https://github.com/readium/r2-shared-swift/pull/66 https://github.com/readium/r2-streamer-swift/pull/118 https://github.com/readium/r2-navigator-swift/pull/65

This adds two properties to Publication: positionList and positionListFactory. The factory is a closure provided by the parsers to be able to build the positionList lazily.

positionListFactory is overwritable by the host app if needed. For example, using a different closure, a host app can implement a cache system by pulling the cached positionList from a database.
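The lazy pattern described above can be sketched in TypeScript as follows. This is not the Swift implementation from the PRs: the `LazyPublication` class is hypothetical, and positions are reduced to plain numbers to keep the example short.

```typescript
// A factory builds the position list on demand (positions simplified to numbers).
type PositionListFactory = () => number[];

// Hypothetical stand-in for Publication, reduced to the two properties discussed.
class LazyPublication {
  // Can be replaced by the host app, e.g. to return a cached list.
  positionListFactory: PositionListFactory;
  private cachedPositionList: number[] | undefined = undefined;

  constructor(factory: PositionListFactory) {
    this.positionListFactory = factory;
  }

  // The factory is only called on first access; the result is then reused.
  get positionList(): number[] {
    if (this.cachedPositionList === undefined) {
      this.cachedPositionList = this.positionListFactory();
    }
    return this.cachedPositionList;
  }
}

// The parser provides a default factory...
const lazyPublication = new LazyPublication(() => [1, 2, 3, 4]);
// ...and the host app may swap it before the list is first read,
// here with a list it would have pulled from its own database.
lazyPublication.positionListFactory = () => [1, 2, 3, 4, 5];
const lazyCount = lazyPublication.positionList.length; // 5
```

Note that in this sketch, replacing the factory after the first access has no effect, because the list has already been built.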

Regarding the EPUB implementation, I used a size of 3500 bytes for splitting a resource into a number of positions. This amounts to about one page of text on an iPad. Of course, this value needs to be the same on every platform, so we might need to experiment a bit to find the sweet spot.

llemeurfr commented 4 years ago

Note that the size of a "page" for this calculation should preferably be compatible with the notion of "printed page" in LCP. The LCP spec contains:

"For the print right, a page is defined as follows: The page as defined in the Publication, if it is pre-paginated (fixed layout) OR The page as defined by the page-list nav element of the EPUB Navigation Document, if this exists OR 1024 Unicode characters for all other cases"

... 1024 Unicode characters (not bytes).
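To show why the two units are not interchangeable, here is a TypeScript sketch comparing both measures on the same strings. It assumes that "Unicode characters" means code points and that the byte size is measured on UTF-8 content; both function names are made up for this example.

```typescript
// Counts Unicode code points: a valid surrogate pair counts as one character.
function countCodePoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate of the pair
      }
    }
    count++;
  }
  return count;
}

// Computes the UTF-8 encoded length in bytes (a lone surrogate is counted as 3 bytes).
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (
      unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length &&
      text.charCodeAt(i + 1) >= 0xdc00 && text.charCodeAt(i + 1) <= 0xdfff
    ) {
      bytes += 4; // one code point outside the BMP
      i++;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

const latinSample = "h\u00e9llo";            // 5 characters, 6 bytes
const cjkSample = "\u65e5\u672c\u8a9e";      // 3 characters, 9 bytes
const latinChars = countCodePoints(latinSample);
const latinBytes = utf8ByteLength(latinSample);
const cjkChars = countCodePoints(cjkSample);
const cjkBytes = utf8ByteLength(cjkSample);
```

For mostly ASCII text the two counts stay close, while for CJK text a fixed number of bytes covers roughly a third as many characters.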

Two solutions:

Note also that if a page-list is provided with an EPUB file, Publication.positionList should IMO reflect it, with no need for computed values.

mickael-menu commented 4 years ago

... 1024 Unicode characters (not bytes).

Since we already agreed that the positionList would be an approximation (because we don't have an easy way to map the current Unicode character, or byte for that matter, in the web view), I think that keeping bytes is the best approach. It's how the RMSDK used to calculate the page numbers, and it makes creating the positionList much more efficient (right now, we don't have a cache). Using the resources' content would require opening all of them, with decryption if needed, upon Publication parsing.

Retrieving the current position in the web view is particularly difficult for dynamic books, because the rendered DOM might not be the same as the one in the Container that we use to construct the positionList.

Note also that if a page-list is provided with an EPUB file, Publication.positionList should IMO reflect it, with no need for computed values.

I agree that it would be better to reflect the actual page-list, unless we recognize that positions are not in practice equivalent to pages.

There are some technical difficulties in retrieving the current position in the web view if we use the page-list, though: its entries can point to particular DOM elements, so we would need to figure out which one we are currently in.

mickael-menu commented 4 years ago

Is there any discussion or specification on how to calculate the positionList for audiobooks yet? Does it make any sense to have it at all?

cc @HadrienGardeur

HadrienGardeur commented 4 years ago

Is there any discussion or specification on how to calculate the positionList for audiobooks yet? Does it make any sense to have it at all?

I don't think that it makes sense to calculate positions for audiobooks.

We can use the temporal media fragments and progression/totalProgression, that's enough.
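For illustration, a TypeScript sketch of what such an audiobook locator could carry. The `AudioTrack` and `AudioLocator` shapes are hypothetical, durations are assumed to be known in seconds, and the fragment is written in the `t=` form of temporal media fragments.

```typescript
// Hypothetical shapes; durations are assumed to be known, in seconds.
interface AudioTrack {
  href: string;
  duration: number;
}

interface AudioLocator {
  href: string;
  fragment: string;         // temporal media fragment, e.g. "t=150"
  progression: number;      // 0..1 inside the current track
  totalProgression: number; // 0..1 across the whole reading order
}

// Builds a locator for a time offset (in seconds) inside the given track.
function audioLocator(tracks: AudioTrack[], trackIndex: number, time: number): AudioLocator {
  const track = tracks[trackIndex];
  if (track === undefined) {
    throw new Error("track index out of range");
  }
  let before = 0; // total duration of the tracks preceding the current one
  let total = 0;  // total duration of the audiobook
  tracks.forEach((t, i) => {
    if (i < trackIndex) {
      before += t.duration;
    }
    total += t.duration;
  });
  return {
    href: track.href,
    fragment: "t=" + time,
    progression: track.duration > 0 ? time / track.duration : 0,
    totalProgression: total > 0 ? (before + time) / total : 0,
  };
}

// 150 s into the second track of a 400 s audiobook:
// progression = 150 / 300 = 0.5, totalProgression = (100 + 150) / 400 = 0.625.
const sampleAudioLocator = audioLocator(
  [
    { href: "track1.mp3", duration: 100 },
    { href: "track2.mp3", duration: 300 },
  ],
  1,
  150
);
```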

llemeurfr commented 4 years ago

If we agree on 1024 bytes as the distance between two positions, we'd better update the LCP spec quickly (both the Readium spec and the ISO draft).

Please add a thumb up if you agree with 1024 bytes.

llemeurfr commented 4 years ago

I agree that it would be better to reflect the actual page-list, unless we recognize that positions are not in practice equivalent to pages.

It seems we all agree that positions are defined as an approximation of the notion of page, when this notion is not expressed clearly in an ebook. Please comment if you disagree.

There are some technical difficulties in retrieving the current position in the web view if we use the page-list, though: its entries can point to particular DOM elements, so we would need to figure out which one we are currently in.

I didn't think about this one. But do we need to compute a current position? We need the current locator, which can be expressed with sufficient precision as a progression (plus a specific DOM-related measure in the case of Readium Desktop).

HadrienGardeur commented 4 years ago

If we agree on 1024 bytes as the distance between two positions, we'd better update the LCP spec quickly (both the Readium spec and the ISO draft).

Please add a thumb up if you agree with 1024 bytes.

We've already agreed on this during a call, but I've added a 👍 anyway.

Note also that if a page-list is provided with an EPUB file, Publication.positionList should IMO reflect it, with no need for computed values.

I disagree about that statement, I think that in every EPUB we absolutely need to compute a position list.

A page-list is not the equivalent of a position list:

mickael-menu commented 4 years ago

I didn't think about this one. But do we need to compute a current position? We need the current locator, which can be expressed with sufficient precision as a progression (plus a specific DOM-related measure in the case of Readium Desktop).

IMO not providing a position (hence a positionList) consistently will break the API because host apps won't be able to rely on a consistent interface, for example to present a page scroller.

I don't think that it makes sense to calculate positions for audiobooks. We can use the temporal media fragments and progression/totalProgression, that's enough.

While I agree that it makes less sense for audiobooks, I think it's still worthwhile to generate a positionList. For the same reason I mentioned above, we want to provide a consistent API so that the host apps can rely on it with generic uses. For example, a page scroller could be used to scroll through an audiobook as well.

llemeurfr commented 4 years ago

A page-list is not the equivalent of a position list:

it's based on strings and not integers, which makes it complicated to provide an affordance for jumping to them

that is right.

pages in a page list can be spread very far apart from one another, which would not provide a usable reference that users can share between them

pages in a page list can completely skip resources in the reading order, which would make them impossible to reference between users

Reading this part, I'm wondering why page lists would be explicitly added by publishers if they don't allow users to share and reference them in a proper way.

If there is a notion of position list on one side and a notion of page list on the other side, I don't see how we can design a good UX in reading apps. We are told that exposing page lists like we expose ToC is bad, and that they should be accessed via "go to" actions ... like position lists.

HadrienGardeur commented 4 years ago

We are told that exposing page lists like we expose ToC is bad, and that they should be accessed via "go to" actions ... like position lists.

With strings, the only way we can expose a page-list is like a ToC. You can't build an affordance with a text field, that would be a usability nightmare.

mickael-menu commented 4 years ago

Here are the PRs adding Publication.positionList for Kotlin:

llemeurfr commented 4 years ago

You can't build an affordance with a text field, that would be a usability nightmare.

A best practice should be discussed among app developers. In Thorium, we have planned a "go to page x" affordance and need to map it to the proper locator. If there is a page-list in the publication, this seems to be the proper list to use. If there is none, the position-list seems to be the proper fallback. Having a "go to" plugged into positions and an additional ugly screen of page numbers is a bad solution.

HadrienGardeur commented 4 years ago

In Thorium, we have planned a "go to page x" affordance and need to map it to the proper locator.

Which type of field do you plan on using for that affordance?

danielweck commented 4 years ago

The current "go to page" affordance in Thorium is a simple text field where users type in an arbitrary single-line string of characters. This is sufficient to meet accessibility requirements for the classroom scenario: "teacher asks students to open page '45' (or 'IX' in Roman numerals) in their printed publications, or in the digital equivalent provided by the EPUB3 @epub:type=pagebreak mechanism". Obviously, this is a "naive" string match on the typed characters, with minimal cleanup/normalization (i.e. left-right trimming of insignificant whitespace, to match the syntactical rules of XML/XHTML NavDoc nav@epub:type=page-list).
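A minimal TypeScript sketch of such a naive match, for illustration only: this is not Thorium's code, the `PageListEntry` shape is hypothetical, and the exact, case-sensitive comparison after trimming is an assumption of this example.

```typescript
// Hypothetical page-list entry: a label as written by the publisher and a target href.
interface PageListEntry {
  label: string;
  href: string;
}

// Naive lookup: trim both sides, then compare the strings exactly.
function findPageByLabel(pageList: PageListEntry[], input: string): PageListEntry | undefined {
  const wanted = input.trim();
  for (const entry of pageList) {
    if (entry.label.trim() === wanted) {
      return entry;
    }
  }
  return undefined;
}

const samplePageList: PageListEntry[] = [
  { label: "IX", href: "front.xhtml#page-ix" },
  { label: "45", href: "chapter3.xhtml#page-45" },
];
const typedPage = findPageByLabel(samplePageList, "  45 "); // matches the "45" entry
const unknownPage = findPageByLabel(samplePageList, "46");  // undefined: no such label
```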

The notion of "position" discussed here (i.e. fragmentation of publication resources in the readingOrder by arbitrary units of 1024 bytes) is different. Apples and pears.

llemeurfr commented 4 years ago

The notion of "position" discussed here (i.e. fragmentation of publication resources in the readingOrder by units of 1024 bytes) is different.

Which affordance should be associated with such data then, if any?

HadrienGardeur commented 4 years ago

Which affordance should be associated with such data then, if any?

A similar one where the field doesn't accept a string but simply an integer. You can also add +/- buttons or a SeekBar.
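The logic behind such an integer field with +/- buttons could look like this TypeScript sketch. The function names are hypothetical, and positions are assumed to be numbered from 1 to the total number of positions.

```typescript
// Parses the field content; returns undefined unless it is an integer in 1..positionCount.
function parsePositionInput(input: string, positionCount: number): number | undefined {
  const trimmed = input.trim();
  if (!/^\d+$/.test(trimmed)) {
    return undefined; // rejects "IX", "4.5", "-3" and empty input
  }
  const value = parseInt(trimmed, 10);
  return value >= 1 && value <= positionCount ? value : undefined;
}

// Applies a +/- button press (or a seek bar delta), clamped to the valid range.
function stepPosition(current: number, delta: number, positionCount: number): number {
  return Math.min(positionCount, Math.max(1, current + delta));
}

const typedPosition = parsePositionInput(" 12 ", 40); // 12
const rejectedInput = parsePositionInput("IX", 40);   // undefined
const afterPlus = stepPosition(40, 1, 40);            // stays at 40
const afterMinus = stepPosition(1, -1, 40);           // stays at 1
```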