w3c / publishingcg

Repository of the Publishing Community Group
https://www.w3.org/community/publishingcg/

Virtual Pages Once and for All #73

Open GeorgeKerscher opened 5 months ago

GeorgeKerscher commented 5 months ago

Description

EPUB titles that do not have print page numbers inserted have been a problem for a long, long time. We suggest that the Publishing Community Group address this problem.

There are library systems that are willing to ingest EPUB and insert page numbers into the text, but they insist that there be an accepted mechanism for doing this. Reading systems should also agree to fall back to virtual page numbers when none exist. In this way, we can resolve the many issues that the lack of print page numbers presents. This is essential for citations and collaboration.

This issue exists in the Publishing CG issue tracker

Digital Only Page Breaks #19

rickj commented 5 months ago

How we apply this:

  1. For EPUB titles, on ingestion...
  2. If the title is missing page break elements...
  3. If the content character count is greater than 8500 characters (aggregating up the length of the most granular text nodes in a parsed XML tree from the XHTML file... so markup elements are not counted)...
  4. Insert a page break element at the top of each content document, and every 5500 characters of content (after the DOM element where it crosses that much content)
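
The steps above can be sketched roughly as follows. This is only an illustration of one reading of the list, not the actual ingestion code: it assumes the 8500-character threshold is applied per content document, that "the DOM element where it crosses" means a direct child of `body`, and that the break is written as a `span` with `epub:type="pagebreak"` and `role="doc-pagebreak"`. The function names and the `vpage-` id prefix are made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"
MIN_DOC_CHARS = 8500  # step 3: shorter documents are left alone (assumed per document)
PAGE_CHARS = 5500     # step 4: amount of text per virtual page


def text_length(elem):
    """Sum the lengths of all text nodes under elem; markup is not counted."""
    return sum(len(t) for t in elem.itertext())


def has_pagebreak(root):
    """Step 2: report whether the document already carries page break elements."""
    for elem in root.iter():
        if "pagebreak" in elem.get(f"{{{EPUB_NS}}}type", "").split():
            return True
        if elem.get("role") == "doc-pagebreak":
            return True
    return False


def make_break(number):
    """Build a page break span (markup choice is an assumption of this sketch)."""
    span = ET.Element(f"{{{XHTML_NS}}}span")
    span.set(f"{{{EPUB_NS}}}type", "pagebreak")
    span.set("role", "doc-pagebreak")
    span.set("id", f"vpage-{number}")  # hypothetical id scheme
    span.set("aria-label", str(number))
    return span


def insert_virtual_breaks(xhtml, next_page=1):
    """Return (root element, next unused page number) for one content document."""
    root = ET.fromstring(xhtml)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None or has_pagebreak(root) or text_length(body) <= MIN_DOC_CHARS:
        return root, next_page

    children = list(body)                  # snapshot before inserting anything
    body.insert(0, make_break(next_page))  # break at the top of the document
    next_page += 1
    position, seen = 1, 0
    for child in children:
        position += 1                      # index just after this child
        seen += text_length(child)
        # Break after the element where the running count crosses the limit,
        # but not after the last element of the document.
        if seen >= PAGE_CHARS and child is not children[-1]:
            body.insert(position, make_break(next_page))
            next_page += 1
            position += 1
            seen = 0
    return root, next_page
```

Passing the returned page number into the call for the next content document keeps the numbering continuous across the whole title. Resetting the running count to zero after each break, rather than carrying over the excess, is another choice the list leaves open.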

wareid commented 5 months ago

Draft of the Virtual Locators work: https://docs.google.com/document/d/11GypOjE9xOTaINATl5bxVIA3Mc9jzNBGCr6GT_KNaQ4/edit?pli=1

P5music commented 5 months ago

@wareid Please, no.

ePub has no pages because it is reflowable. If editors put pagination information in an ePub book, which is possible, it will either refer to the printed book or follow a pagination they decide on for the ePub itself. Sometimes the pagination relies on a particular device, with different versions if necessary (they can do this because the ePub is not an official product but is generated for the particular reader model by back-end systems).

But the technical issues that have been pointed out several times in discussions on this repository (and even in the "Virtual Locators" document itself) make clear what ePub producers have simply decided to assume: that they are better off ignoring further ePub proposals. They avoid bothering too much with the ePub format at all. To them it is just HTML code, easy to render with the open-source WebKit library. All the rest is over-engineering, from their point of view. When people are obliged to cite pages, they cite real pages of a real printed or PDF edition. Those are still the standard nowadays.

About annotations, it would be nice to have real research among users. I think we would discover that their real needs are far from what we devise here, and are mainly satisfied by the big tech companies, which simply perform A/B testing and check when the income figures go up. Users nowadays use "second-brain" note-taking apps, which are ahead of any ePub standard. Regards

dazrand commented 4 months ago

@P5music

The addition of virtual locators as page breaks is effectively an invisible adjustment, used only when print page numbers have not been added and when pseudo page breaks have not been created for the reflowable format. There is a distinct need for page numbers in academic and research publications, especially when making citations. Educators and institutions will continue to prefer, whether it is right or wrong, pages for citations in publications; this is just filling that need when the publishing chain has failed to do so.

P5music commented 4 months ago

@dazrand Hello. I do not think that page numbers are always missing from ePub publications because the publishing chain fails to include them. Publishers simply exploit the ePub features, the main one being reflowability, so if they fully embrace it there is no need to include pagination information. Page information could even be misleading from a certain point of view. Those are publishing choices. If other editions with pages exist and the publisher does want to include the page information, to have a sort of compatibility among editions (for citations, notes or whatever), then the ePub pagination feature will be used. But it is not mandatory.

When the publisher simply fails to include that information and ignores cross-edition compatibility, I think that introducing an arbitrary (although sensibly devised) algorithm to calculate it would be wrong, because the result would be inconsistent with the other editions, and with possible new versions of the ePub itself in which the official page breaks and numbers are included to fill the gap. So the only solution would be to insert page breaks mimicking the other editions of the book, maybe the PDF version, provided that the content is identical. Of course that is not feasible for third parties other than the publisher.

You are right, publishers should not fail to include the page information. But I do not think they are unaware of this issue. Maybe it is risky for them to put pagination into the ePub, for whatever reason, maybe even for the opposite reason to the one mentioned above, that is, they do not want the ePub version to have any kind of official status. Regards