w3c / epub-specs

Shared workspace for EPUB 3 specifications.

Virtual Page numbers when not present from publisher #1542

Closed GeorgeKerscher closed 2 years ago

GeorgeKerscher commented 3 years ago

Distributors of EPUB and end users are requesting that a best practice be developed for inserting a "virtual page number." One distributor said they are considering inserting virtual page numbers into all ingested EPUB titles that do not already have page numbers. Of course, where the EPUB is based on an existing physical book, the page numbers associated with the book must be used. However, the virtual page number can be very useful for:

Publishers may also consider providing this as a feature in their titles that do not have a print counterpart.

Specifically, what has been discussed is establishing a standardized algorithm for inserting page break locators in the text and in the page list in the NavDoc. The algorithm would look something like:

  1. Count a number of characters on an average page, perhaps 1,000.
  2. Find a natural break point, e.g. the closest block element.
  3. Make sure that the previous element is not a heading, and if it is, place the marker before the heading. We don't want to end a page with a dangling heading.
  4. Increment the number and insert the proper markup, noting that it is a virtual page number, e.g. "virtual page 95".
  5. Insert a note in the AccessibilitySummary that virtual page numbers have been inserted.
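
A minimal sketch of steps 1 to 4 above, in Python, might look like the following. It is only an illustration under stated assumptions, not an agreed algorithm: the content is modeled as a flat list of `(tag, text)` blocks, the names `virtual_page_breaks`, `pagebreak_markup`, and `page_list_item` and the `vpage-N` id scheme are hypothetical, and the 1,000-character threshold is the tentative figure from step 1. The marker uses the existing `doc-pagebreak` role, and step 5 (the accessibility summary note) is left out.

```python
from typing import List, Tuple

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
Block = Tuple[str, str]  # (tag name, text content): a simplified, assumed model


def virtual_page_breaks(blocks: List[Block],
                        chars_per_page: int = 1000) -> List[Tuple[int, int]]:
    """Return (page_number, block_index) pairs.

    Each virtual page break sits immediately before blocks[block_index].
    """
    if not blocks:
        return []
    breaks = [(1, 0)]  # virtual page 1 starts before the first block
    page, count = 1, 0
    for i, (_tag, text) in enumerate(blocks):
        count += len(text)  # step 1: count characters
        if count < chars_per_page or i + 1 == len(blocks):
            continue
        pos = i + 1  # step 2: nearest block boundary after the threshold
        # Step 3: never leave a dangling heading at the end of a page.
        while pos > 0 and blocks[pos - 1][0] in HEADINGS:
            pos -= 1
        if pos <= breaks[-1][1]:
            continue  # would coincide with the previous break; keep counting
        page += 1  # step 4: increment the number
        breaks.append((page, pos))
        # Blocks pushed onto the new page already count toward it.
        count = sum(len(t) for _, t in blocks[pos:i + 1])
    return breaks


def pagebreak_markup(page: int) -> str:
    """Marker for the content document, labelled as a virtual page."""
    return (f'<span role="doc-pagebreak" id="vpage-{page}" '
            f'aria-label="virtual page {page}"></span>')


def page_list_item(page: int, href: str) -> str:
    """Matching entry for the page list in the NavDoc."""
    return f'<li><a href="{href}#vpage-{page}">{page}</a></li>'
```

Counting characters rather than rendered screens keeps the result independent of font size and reading system, which is the point of baking the numbers into the content.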

MyDK commented 3 years ago

We - as a distributor of ebooks - will soon demand that all ebooks have a page number associated in the metadata. If none is provided by the publisher, we will calculate a virtual page number:

1 A4 page = 1,800 characters, e.g. for novels and textbooks; for children's books we are not fully decided yet 😀

But if the community decides on a common page number calculation algorithm, whatever the language, we are all ears.
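
The 1,800-characters-per-page rule above amounts to a single division. A rough sketch in Python follows; the function name `estimated_page_count` is hypothetical, and whether whitespace and markup count as characters is not stated in the comment, so here the caller decides what goes into `char_count`.

```python
import math

CHARS_PER_A4_PAGE = 1800  # figure quoted above for novels and textbooks


def estimated_page_count(char_count: int,
                         chars_per_page: int = CHARS_PER_A4_PAGE) -> int:
    """Round up so that a partially filled last page still counts as a page."""
    if char_count <= 0:
        return 0
    return math.ceil(char_count / chars_per_page)
```

For example, a title with 450,000 characters would be reported as 250 virtual pages.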

baldwin47 commented 3 years ago

Missing page numbers are frequently cited by our academic customers as a barrier to adopting EPUB. For us the most common complaint is what George mentions around citations. This would help close that gap. We are also considering an in-house virtual page number algorithm, but would prefer to base this off of something from the specification.

TzviyaSiegman commented 3 years ago

Wiley creates virtual page numbers for e-only books. Customers (especially for textbooks) expect page numbers.

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-03-11

View the transcript

### 4. pages related issues

_See github issue [#1503](https://github.com/w3c/epub-specs/issues/1503), [#1502](https://github.com/w3c/epub-specs/issues/1502), [#1501](https://github.com/w3c/epub-specs/issues/1501), [#1500](https://github.com/w3c/epub-specs/issues/1500), [#1542](https://github.com/w3c/epub-specs/issues/1542)._

**Matt Garrish:** Updated EPUB accessibility for addressing a part of the following static pages issues. Some requirements can be addressed by EPUB Accessibility while others should go to best practices.

> *Avneesh Singh:* [https://w3c.github.io/epub-specs/epub33/a11y/index.html#sec-page-nav](https://w3c.github.io/epub-specs/epub33/a11y/index.html#sec-page-nav)

**Matt Garrish:** we had different questions about pagelists
… one proposal is to structure the requirements for pagelists
… e.g. level A: some pages in the pagelist
… level AA: all content pages linked in pagelist
… level AAA: all pages (no exclusion, even blank pages)
… I didn't move over with my requirements

**Avneesh Singh:** I don't think these issues can be added in the guidelines, maybe the best place is best practices

**Tzviya Siegman:** we had similar in PDFs with "this page intentionally left blank", where we left blank pages for starting the chapters on page right
… I think that marking blank pages can be really confusing
… maybe we can use ARIA

**Avneesh Singh:** yes, but then someone will say that ARIA is not only for AT

**Charles LaPierre:** I'm not sure if Read Aloud by the Reading system can get that information

**Matt Garrish:** for sure we can investigate the use of ARIA label or something similar

**Avneesh Singh:** I think this task force should focus on the requirements Matt put in the issues
… we can go through the issue tracker
… and discuss there

**Matt Garrish:** we have an issue about the ordering of the pagelist
… if I have the pages moved in the digital version, what happens to the pagelist?

**Tzviya Siegman:** I have several examples of this

**Avneesh Singh:** I think for AT users it would be useful to have an alert

**Charles LaPierre:** maybe we can put it in the accessibility summary (metadata), but do we need to have some requirements for reading systems?

**Gregorio Pellegrino:** maybe something like play order attribute in NCX tocs

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-03-12

View the transcript

### 3. virtual page numbers

_See github issue [#1542](https://github.com/w3c/epub-specs/issues/1542)._

**Dave Cramer:** this issue is about virtual page numbers. most reading systems make something up when they are absent. is lack of standardization a problem? location things in kindle are wack.

**Gregorio Pellegrino:** investigate how users with print disabilities as well as RS developers use the page list. How can we improve the user experience?

**Hadrien Gardeur:** page list is not very useful for RS developers. It's a string so it's problematic, so building UI off that is problematic. Best we can do is provide a list so you can jump to a page. Everyone invents their own system. Readium shows progress and "locations."
… everyone has been doing their own thing for years so it will be difficult to try to move everyone to same approach. also shouldn't rely on authoring.

**Tzviya Siegman:** what about pagebreak? publishers are using that and it goes hand in hand with pagelist

**Hadrien Gardeur:** you have to render the whole content behind the scenes. anytime you have to examine the content is problematic to do live. there is a complexity cost if you need to calculate.

**George Kerscher:** I did not envision this as something the RS had to do. I envisioned we would determine an algorithm and the pagelist in the nav doc would have the virtual page numbers.
… also have heard that bibliographic references and citations are more dependable with page numbers, would like to afford the same with virtual page numbers but have it baked into the content and not rely on algorithms.

**Charles:** if we could provide publishers with a standalone tool to search for pagebreaks we could provide a pagelist for the nav doc.

**Brady Duga:** not sure how we would get something like this adopted. RS implement how they like, might be hard to dissuade. Also there would be processing issues.

**Avneesh Singh:** +1 to Brady. More suitable for best practices, not spec. Looking at spec made me think about non-western languages.
… we should try to minimize special processing. Imagine if a browser could open our spec. less burden put on RS the better.

> *Deborah Kaplan:* avneesh++

**Dave Cramer:** print publishers should number paragraphs instead of pages, lol.

**Ken Jones:** problem is no markers in output from InDesign.

**Dave Cramer:** generating a PDF from marked up content and trying to port pagebreaks back to markup file is tricky.

**George Kerscher:** Word to EPUB tool does pretty well with page numbers. Want to incorporate data visualization for pagelist in the checker that flags inconsistencies or problems. Re: internationalization, not sure how that would work. Trying to bridge print to web publishing; this is one of those areas that help us bridge between analog and digital.

**Ivan Herman:** do we need a resolution that says the page numbering doesn't need to be addressed in the spec, but rather somewhere else? Is this the direction we are headed?

**Dave Cramer:** This is a multifaceted issue that affects authoring and reading systems and tools. Not quite comfortable with a resolution at this time.

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-03-18

View the transcript

### 1. Virtual Page Numbers

_See github issue [#1542](https://github.com/w3c/epub-specs/issues/1542)._

**Dave Cramer:** we discussed this issue on the last call without reaching a resolution
… we recognize the utility of page numbers, esp. in educational environments
… but different RS all do their own thing when faced with epub without pagelist
… but that's not really something our WG can standardize
… any further comments to bring us up to speed?

**Wendy Reid:** we've had a lot of feedback from RS side, i.e. from Hadrien
… but does the utility of standardizing outweigh the difficulty?

**Rob Smith:** we have our own RS as part of our web platform, so we understand the difficulty
… in our market, page nums and addressable locations are critical for academic citations
… our buyers prefer PDF because of the predictable page numbers
… if you're using epub without page numbers, the risk of plagiarism is higher because it's harder to precisely locate citations

> *Matthew Chan:* shiestyle addresses our Japanese members in Japanese

**Shinya Takami (高見真也):** toshiakikoike from Voyager (a jp RS) says that it is difficult to implement these features
… TOC serves a similar use case

**Rob Smith:** I don't think a toc is a suitable substitution
… the toc is too sparse for a book of, say, 300 pages
… the technical limitations are clear, but maybe we could have some sort of a non-normative recommendation?

**Brady Duga:** we already have pagelists so the publishers can implement that
… the question is what the RS should do in the absence of pagelist
… to provide consistent virtual page numbers

**Dave Cramer:** yes that's it, but what steps should we take to get there?

**Wendy Reid:** i think one of the struggles is we technically have pagelist as an established method of doing this, but obviously it's either not enough or there is something about it not meeting the need
… maybe we could use something else as a kind of standardized, periodic locator?
… this may fall under best practices, but it seems there is a clear desire for a standardized practice

**Dan Lazin:** it seems that page numbers specifically are an artifact of printed books
… in a lot of cases we want to preserve that correspondence to print
… but there are other ways to locate inside a book
… but also RSes have implemented their own solutions
… so, as an alternative, we could use, for example, paragraph numbering (i.e. what legal texts do)
… this wouldn't interfere with the existing page number system

**Matt Garrish:** what George was looking at was an algorithm for inserting page numbers automatically
… but maybe this is a task for a separate task force, with the goal of creating a note or something
… rather than using our call time to do this

**Dave Cramer:** it doesn't feel realistic to me to expect RS to change their existing way of doing things
… e.g. ibooks will count the number of screens and repaginate for font size
… and that goes against what a lot of other people want
… it seems possible that we could come up with a tool to do a thing to epubs and produce a pagelist from that in the absence of a pre-prepared one
… maybe a TF or the CG would like to take a shot at this
… changing RS behaviour is kind of outside spec
… but we could experiment with that tool as something to help out content creators

**Wendy Reid:** i think a TF is a good idea
… agreed that we probably won't change behaviour for major RSes
… it's an industry thing
… trade publishers probably won't be as interested in this as educational side (VitalSource, Ebsco, etc.)
… we could well get implementors of this from the right crowd of publishers

**Dave Cramer:** does anyone want to try to put together a little script?
… okay, we'll put together a task force to do these experiments

swickr commented 3 years ago

We've long discussed (citations needed; sorry, insufficient time) the need for finer-granularity references in addition to page numbers -- or virtual page numbers -- in EPUB as well as in other uses of CSS and HTML. But as we're seeing even with web browsers there's not yet an established UI metaphor for "show me the URI to this bit of what I'm seeing (or hearing) on the screen". Page numbers, for better or worse, don't require any extra effort on the part of the reader to find and cite.

I really want my web browser and my reading system to make it easy to copy and paste a fine-grained and resolvable identifier to a specific portion of a work.

mattgarrish commented 3 years ago

Google's links to text fragments are interesting, though, as they appear to operate on the same text locator basis as we've discussed in the past around web annotations. They add in some of the surrounding context to improve the reliability of the link.

It doesn't solve how to cite into a packaged epub, and isn't an easy mechanism for keeping pace in a print environment, but is perhaps some promising proof that text locators are a viable way of getting to a consistent destination (in text-based works, of course).
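
To make the text-fragment idea concrete, here is a small Python sketch that builds such a link from a quoted passage plus optional surrounding context. It assumes the `#:~:text=prefix-,start,-suffix` form of the fragment directive; the helper names and the example URL are made up for illustration, and it does not address citing into a packaged EPUB.

```python
from typing import Optional
from urllib.parse import quote


def _encode_term(term: str) -> str:
    # Percent-encode reserved characters, then the dash, which the
    # directive syntax uses to mark the prefix and suffix context.
    return quote(term, safe="").replace("-", "%2D")


def text_fragment_url(base_url: str, start: str,
                      prefix: Optional[str] = None,
                      suffix: Optional[str] = None) -> str:
    """Build a link to `start`, disambiguated by the text around it."""
    parts = []
    if prefix:
        parts.append(_encode_term(prefix) + "-")
    parts.append(_encode_term(start))
    if suffix:
        parts.append("-" + _encode_term(suffix))
    return base_url + "#:~:text=" + ",".join(parts)


url = text_fragment_url("https://example.org/book.html", "virtual page",
                        prefix="insert a", suffix="number")
# url == "https://example.org/book.html#:~:text=insert%20a-,virtual%20page,-number"
```

The prefix and suffix are the "surrounding context" part: they are not highlighted themselves but help pick the right occurrence when the quoted text appears more than once.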

Having multiple ways to locate a position in a work isn't necessarily a bad thing, either, so it's not like we're in a zero-sum game anyway where we can only end up with authored page breaks or something else.