Reading system handling of non-Arabic page numbers in page list

mattgarrish commented 3 years ago

As originally asked in #1471:

> are there any rules about how RSs should deal with non-Arabic page numbers?

mattgarrish commented 3 years ago

Removing the accessibility label from this issue as we don't define reading system requirements in the accessibility spec.

Can you think of any specific guidance you'd want to give here @gregoriopellegrino ?

We could note to be aware that page numbering across publications and even within a publication may not follow a consistent scheme, but I'm not sure what else we could tell reading system developers to do about that.

jenstroeger commented 3 years ago

@mattgarrish and @gregoriopellegrino, I’m still a little unclear on this: if my publication uses roman numerals in its frontmatter and Arabic numbers in main & backmatter, what’s the recommended way to manage these different page numbers in an EPUB?

mattgarrish commented 3 years ago

> what’s the recommended way to manage these different page numbers in an EPUB

There isn't anything to do differently from an authoring perspective. This is, I believe, only about whether we should provide guidance to reading systems that they may not encounter a single numbering scheme.

The NCX's pageTarget element was much richer in terms of being able to solve these problems (value separate from label and play order for an alternate sequencing), but I don't know that we could replicate much of that in HTML.

gregoriopellegrino commented 3 years ago

I don't think we can use specific attributes for HTML elements for this purpose (unfortunately).

I would suggest adding a note saying that a single publication can use several numbering schemes, e.g. Roman numerals, Arabic numerals, and letters of the Latin alphabet (upper or lower case). Reading systems should therefore not rely on automatic string ordering, because it may lead to unexpected results.
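A minimal sketch of the string-ordering pitfall, assuming Python and a made-up set of labels (the `labels` list is illustrative, not taken from a real publication). It shows how a plain lexicographic sort scrambles a mixed sequence, and how keeping the position from the page list restores it.

```python
# Page labels in reading order; the values are illustrative only.
labels = ["ix", "x", "xi", "1", "2", "10", "A-1", "A-2"]

# Plain lexicographic sorting compares code points, not page positions.
sorted_labels = sorted(labels)
print(sorted_labels)
# ['1', '10', '2', 'A-1', 'A-2', 'ix', 'x', 'xi']

# Remembering each label's position in the page list is enough to get
# the author's sequence back, whatever the numbering scheme is.
order = {label: index for index, label in enumerate(labels)}
restored = sorted(sorted_labels, key=order.__getitem__)
print(restored == labels)
# True
```

The front matter ends up after the appendix and "10" lands before "2", which is the kind of unexpected result described above.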

I have no experience in the case of other types of numbering, e.g. for publications in Japanese, Chinese, etc.

jenstroeger commented 3 years ago

> I have no experience in the case of other types of numbering, e.g. for publications in Japanese, Chinese, etc.

In some Middle Eastern print books I’ve seen, and been told of, page numbers in Eastern Arabic numerals or Abjad numerals, and then there are Hebrew numerals as well.

I suspect that we’d want to use those for publications in those languages… 🤔

xfq commented 3 years ago

FYI - related text in jlreq: https://w3c.github.io/jlreq/#h-note-37

In addition, I just looked at the page number formats in Word (in Chinese):

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-04-15

no resolutions were taken


### 7. Handling non-Arabic page numbers

_See github issue [#1505](https://github.com/w3c/epub-specs/issues/1505)._

**Dave Cramer:** what are the rules for RS when faced with non-Arabic page numbers?

**Brady Duga:** are there any rules for handling Arabic page numbers?

**Dave Cramer:** right, we haven't really said anything about this

**Brady Duga:** Arabic or otherwise

**Dave Cramer:** we've left a lot of these UI decisions up to RS
… some RS will use the content of the pagelist if it exists instead of internal page numbering
… assume these would just use whatever string is in there

**Dan Lazin:** that's true for Apple

**Brady Duga:** true in Play Books

**Dave Cramer:** yep, backmatter in educational books often use letters

**Brady Duga:** we often see a mix of Arabic numerals and other stuff

**Dave Cramer:** not sure I see the problem here
… is there any guidance in the pagelist section on this?

**Wendy Reid:** don't think so
… the context of this is really how RS communicate non-Arabic page numbers, but if RSes already have a solution...

**Dan Lazin:** there is an issue with the string-based solution with TTS and AT
… cannot pronounce arbitrary string
… maybe possible to solve it with ARIA label?

**Dave Cramer:** maybe have informative statement that if there is a pagelist, then RS should use strings embedded in pagelist?
… if they want to present that to user?

**Wendy Reid:** maybe refer this back to mgarrish? The a11y spec does not tell RS what to do, which is how it ended up here
… maybe we need him to explain why this came up in a11y in the first place

**Dave Cramer:** okay, so no resolution for now
… further discussion to come
… okay, thanks everyone!

> *Dave Cramer:* rrsagent: bye

---

r12a commented 3 years ago

Unfortunately, i don't have a lot of information to hand about page numbering practices, but i assume that people will sometimes want to use non-ASCII digits.

Is there a role for CSS counter-styles here? (https://www.w3.org/TR/predefined-counter-styles/)

Gecko supports these author-defined styles, and it has just been implemented (though pending release) in Blink.

Is it relevant to discuss how to manage publications that have page numbers that run from both ends ? eg. i have a dual-language Japanese in-flight magazine here that starts with page 1 at each of the covers and increases the numbers towards the point in the middle of the document where the two translations meet.

(Btw, the term 'arabic numbers' is a little confusing. The Unicode Consortium recommends talking about 'European digits' for 1234... and 'arabic-indic digits' for ٠ ١ ٢ ٣ ... which is slightly better. Personally, i tend to talk about ASCII digits when i want to refer to 1234...)

jenstroeger commented 3 years ago

I think there are two sides to the problem:

1. Which numbers a RS should use to display page numbers; and
2. How to express the use of more than one kind of numbering within the same publication, e.g. in traditional Western publications it’s common to use roman numerals for the frontmatter of the book, and Arabic/European numbers for the remainder. This goes back to @mattgarrish’s comment.

One could imagine using e.g. ASCII numbers throughout the book’s markup (just like English is used for XML tag names or CSS keywords) and inform the RS to map the numbers to the number system identified by the book’s primary language.
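As a rough illustration of that idea, here is a sketch in Python of mapping ASCII digits in a page label to another decimal digit set chosen by language tag. The `ZERO_DIGIT` table and the `localize_digits` helper are hypothetical names invented for this example, and the language-to-digit choices are assumptions rather than anything a specification defines.

```python
# Hypothetical table: primary language subtag -> zero digit of a decimal
# digit set. Only a few entries, chosen purely for illustration.
ZERO_DIGIT = {
    "ar": "\u0660",  # ARABIC-INDIC DIGIT ZERO
    "fa": "\u06F0",  # EXTENDED ARABIC-INDIC DIGIT ZERO
    "hi": "\u0966",  # DEVANAGARI DIGIT ZERO
}


def localize_digits(label: str, lang: str) -> str:
    """Replace ASCII digits in a page label with the digits for `lang`.

    Characters that are not ASCII digits (roman numerals, letters,
    hyphens) pass through unchanged.
    """
    zero = ZERO_DIGIT.get(lang.split("-")[0].lower())
    if zero is None:
        return label  # no mapping known: keep the label as authored
    table = {ord("0") + i: chr(ord(zero) + i) for i in range(10)}
    return label.translate(table)


print(localize_digits("387", "ar"))   # ٣٨٧
print(localize_digits("xxiv", "ar"))  # xxiv
```

This only covers positional decimal digit sets. Schemes such as Abjad or Hebrew numerals are not digit-for-digit substitutions, so a mapping like this would not reach them.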

@r12a —

> eg. i have a dual-language Japanese in-flight magazine here that starts with page 1 at each of the covers and increases the numbers towards the point in the middle of the document where the two translations meet.

Nice 🤓 So there’s not even a primary language for the publication?

r12a commented 3 years ago

> So there’s not even a primary language for the publication?

Correct. If you open the magazine in the middle and lay it on a table, the Japanese articles are on the right and the English articles on the left. The Japanese content page numbering runs from 1 to 135, starting at the right-hand cover and progressing inwards. The English content numbering starts with the left-hand cover and runs to 69. There are 10 pages in the middle without page numbers that contain maps and arrival/transit information written in both languages. The right-hand cover is in English, and the left-hand cover is in Japanese, and they are not recognisably for the same publication (different text, different images, different background colours).

mattgarrish commented 3 years ago

> Is there a role for CSS counter-styles here?

This issue is only about the page list, not styling of page numbers in the content, so I'm not sure that counters would be helpful. The page list is like the table of contents (i.e., a big list of links).

Layout and styling of page numbers in the content, which is probably only realistic in fixed layouts, is entirely in the author's domain to do.

These questions arise in the context of creating an interface for jumping to the pages. What we need to know is whether the text value of the links in the page list matters to reading systems, and if so, why.

If a reading system, at best, is only going to text match what a user inputs against the values, then there's arguably nothing to be done here. If reading systems re-sequence the page list for users (e.g., offering descending or ascending order), however, then having schemes that mix text and numerals is problematic. But without a separate sequencing attribute, what can we even do about this?

> Is it relevant to discuss how to manage publications that have page numbers that run from both ends ?

A good case for why following the order of page breaks in the text doesn't always make sense. It's probably a case for allowing multiple page lists, too, although how a reading system would identify or use them would be a challenge. Otherwise, as I understand it, you'd have two page 1s, 2s, etc. Even manual navigation of these would be problematic. It would probably require separate publications in English and Japanese.

iherman commented 3 years ago

On the dual-language magazine: the way to represent that publication in EPUB would be to use two content documents in the spine, one for the Japanese content and one for the English content. Each would be properly annotated: the Japanese one as Japanese with (I presume) vertical, right-to-left settings, and the English one as English with the usual settings.

r12a commented 3 years ago

Thanks for the explanation @mattgarrish .

Then, if i understand correctly, it seems to me that the title of this issue isn't really "handling of non-Arabic page numbers", but instead how to handle separate runs of numbering (regardless of what the presented form of the page numbers is). It seems to me that relying on the visual appearance of the page number at all would be problematic in several ways. I guess that a key question is how to know where a particular range of numbers starts and ends.
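One way to picture the "where does a run start and end" question is a heuristic that groups consecutive page labels by the scheme they appear to use. The sketch below is in Python; the `scheme` and `runs` helpers and the three scheme names are assumptions made up for this example, not behaviour any reading system is known to implement.

```python
import re
from itertools import groupby

# Matches labels made only of roman-numeral letters. This is a loose
# test: a single-letter label such as "c" or "i" is ambiguous.
ROMAN = re.compile(r"^[ivxlcdm]+$", re.IGNORECASE)


def scheme(label: str) -> str:
    """Guess which numbering scheme a page label belongs to."""
    if label.isdecimal():  # also true for non-ASCII decimal digits
        return "decimal"
    if ROMAN.match(label):
        return "roman"
    return "other"


def runs(labels):
    """Group consecutive labels that share a guessed scheme."""
    return [(kind, list(group)) for kind, group in groupby(labels, key=scheme)]


for kind, group in runs(["xxiii", "xxiv", "1", "2", "387", "A-1", "A-2"]):
    print(kind, group)
# roman ['xxiii', 'xxiv']
# decimal ['1', '2', '387']
# other ['A-1', 'A-2']
```

A guess based on appearance cannot separate two adjacent runs that use the same scheme, such as the magazine whose two halves both start at page 1, which supports the point that the visual form of the label is not a reliable signal.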

jenstroeger commented 3 years ago

@iherman —

> the representation of that publication in EPUB would be to use two content documents in the spine

Are you referring to the spine element in the OPF? How would you go about representing two documents there?

@r12a, that’s what I tried to say in my comment above, too 🤓

mattgarrish commented 3 years ago

> how to handle separate runs of numbering

Right, we traditionally have front, body and backmatter runs which may or may not use different schemes so the sequence may restart for each.

You often have roman numerals for front, switch to Arabic numerals for body and then appendixes in the back might have pages numbered using the appendix letter joined to an Arabic numeral.

But I'm not sure there's a problem we need to solve here, as we don't define resequencing of the page list. I think simple string matching of user input is the most we'd expect.

> The Unicode Consortium recommends talking about 'European digits' for 1234... and 'arabic-indic digits' for ٠ ١ ٢ ٣ ... which is slightly better.

I don't doubt you at all, but "roman" and "arabic" are so ingrained in publishing page numbering speak that it's hard to break from!

The specification doesn't currently say anything about the numbering, but we'll certainly have to keep this in mind if we do get into more detail. I've also seen "Western Arabic numerals" in use while edifying myself on this, which is close to the publishing term, but is that any better in your opinion?

aphillips commented 3 years ago

I have a little experience in this space but am playing catch-up on this thread. In addition to some of the numbering schemes mentioned here, some I don't see mentioned are section-based numbering (1-1, 1-2, 1-3..., 2-1, 2-2, etc.), plus all manner of lettered schemes. I've seen examples of local digit systems being used for page numbering in print but can't speak to the potential richness of alternative publishing traditions beyond what others have mentioned.

I think the key here is to think of sequentially numbered portions of a book and then having a style assigned to that "section". (Even pages without numbers printed on them generally have a number assigned.) Generally the assignment needs to be sticky (not subject to the user changing it) because of cross-referencing (think TOC, index, or generated internal referencing) as noted by @jenstroeger.

It might be useful to think to some degree in terms of counter-style type behavior, since, like CSS, you might not want to maintain all of the potential schemes over time nor will you want to exclude publishing traditions that turn up later.

mattgarrish commented 3 years ago

Maybe I should give a more detailed explanation of this issue (I'm guilty of only linking to #1471 when I split this out).

The page list is a nav element in the navigation document that is included by the author to allow reading systems to provide users access to specific static page locations in the text.

Each list item in the page list identifies the page number, so there will be transitions like this:

<nav epub:type="page-list" role="doc-pagelist">
   <ol>
      ...
      <li><a href="intro.html#pgxxiv">xxiv</a></li>
      <li><a href="c01.html#pg001">1</a></li>
      ...
      <li><a href="c45.html#pg387">387</a></li>
      <li><a href="appa.html#pga-1">A-1</a></li>
      ...
   </ol>
</nav>

These links typically go to static page break locations in the text like this:

   <span epub:type="pagebreak" role="doc-pagebreak" aria-label="31"/>

A reading system may present the page list to the user as the simple ordered list it is so that the user can select the page they want to go to, but it's also possible to use the list as input to create a "go to" page function, for example, where the user types in the number they want to jump to.
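A "go to" page function of that kind could look roughly like the following Python sketch, which reads the page list from a navigation document and matches the user's input against the link text. The `NAV_DOC` sample, the `read_page_list` and `go_to` helpers, and the case-insensitive comparison are all assumptions made for illustration; the namespace URIs are the standard XHTML and EPUB ones.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"

# A small navigation document invented for this example.
NAV_DOC = f"""<html xmlns="{XHTML}" xmlns:epub="{EPUB}">
  <body>
    <nav epub:type="page-list" role="doc-pagelist">
      <ol>
        <li><a href="intro.html#pgxxiv">xxiv</a></li>
        <li><a href="c01.html#pg001">1</a></li>
        <li><a href="c45.html#pg387">387</a></li>
        <li><a href="appa.html#pga-1">A-1</a></li>
      </ol>
    </nav>
  </body>
</html>"""


def read_page_list(nav_xml: str) -> list:
    """Return (label, href) pairs from the page-list nav, in document order."""
    root = ET.fromstring(nav_xml)
    for nav in root.iter(f"{{{XHTML}}}nav"):
        if "page-list" in nav.get(f"{{{EPUB}}}type", "").split():
            return [
                ("".join(a.itertext()).strip(), a.get("href", ""))
                for a in nav.iter(f"{{{XHTML}}}a")
            ]
    return []


def go_to(pages: list, user_input: str) -> Optional[str]:
    """Match user input against link text; the first matching entry wins."""
    wanted = user_input.strip().casefold()
    for label, href in pages:
        if label.casefold() == wanted:
            return href
    return None


pages = read_page_list(NAV_DOC)
print(go_to(pages, "XXIV"))  # intro.html#pgxxiv
print(go_to(pages, "500"))   # None
```

Because the lookup is plain text matching, the numbering scheme does not matter to it. It does mean that a publication with two entries labelled "1" would only ever reach the first one.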

The question is what issues do these transitions in numbering present for reading systems, if any, in providing these interfaces?

Reading systems don't use these page numbers or the page list to present the numbering of pages in the viewport. The pagination presented in reading systems is solely the domain of the developers of those reading systems at this time as it depends on the virtual pagination of the content.

If we were to try and define a way for authors to influence this numbering, then we certainly get into issues of counters and their formatting, but right now that's a separate problem we'd have to develop from the ground up.

We also can't rely on reading systems cascading styles onto the nav element and then using the computed values. The navigation document has to be looked at as raw input to the reading system, where only the HTML will get processed.

mattgarrish commented 3 years ago

One thing we have discussed in the past, though, is using epub:type in the navigation document as hints to the content. So, in this case, it would be possible to do:

<nav epub:type="page-list" role="doc-pagelist">
   <ol>
      ...
      <li><a epub:type="frontmatter" href="intro.html#pgxxiv">xxiv</a></li>
      <li><a epub:type="bodymatter" href="c01.html#pg001">1</a></li>
      ...
      <li><a epub:type="bodymatter" href="c45.html#pg387">387</a></li>
      <li><a epub:type="backmatter" href="appa.html#pga-1">A-1</a></li>
      ...
   </ol>
</nav>

But this perpetuates epub:type as a catch-all solution for everything, even though it makes no real sense on link elements.

Adding semantics only solves part of the problem, too, as the issue of ordering within each group would remain. We'd still need a sequencing attribute.

And that's assuming there is a problem that needs fixing, which we'd need to hear is true from reading system developers.

dauwhe commented 3 years ago

<nav epub:type="page-list" role="doc-pagelist">
   <ol>
      ...
      <li><a href="intro.html#pgxxiv">xxiv</a></li>
      <li><a href="c01.html#pg001">1</a></li>
      ...
      <li><a href="c45.html#pg387">387</a></li>
      <li><a href="appa.html#pga-1">A-1</a></li>
      ...
   </ol>
</nav>

Given the information is already in the pagelist, I don't see the need for worrying about counters. Should we just provide basic guidance to the reading system that if it uses the pagelist, it should use the text value, regardless of numbering system?

> If a Reading System uses information from the pagelist to provide page numbers, it MUST use the text value of the a element.

mattgarrish commented 3 years ago

> If a Reading System uses information from the pagelist to provide page numbers, it MUST use the text value of the a element.

Ya, I don't know what to say. I haven't seen evidence that it's used to provide page numbers, so this would probably turn out to not be testable in any reading systems.

Absent a clear problem in current reading systems that needs solving, my inclination is to leave things as they are in the reading system specification. We may want to make a statement about authoring instead -- for example, that the value of each link must represent the page number regardless of the numbering scheme, which would also make clear that there should be no "page " prefix or similar additions. That should make clear to any devs looking to use the page list what they will encounter.
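An authoring-side check along those lines might be sketched as follows in Python. The `suspicious_labels` helper and the prefix pattern are hypothetical, covering only a few English prefixes; no such rule exists in the specification or in any validator that this thread mentions.

```python
import re

# Hypothetical pattern for labels that carry a prefix in front of the
# page number, e.g. "page 12", "pg. 12" or "p. 12". English only.
PREFIX = re.compile(r"^\s*(?:page|pg\.?|p\.)\s*\S", re.IGNORECASE)


def suspicious_labels(labels):
    """Return labels that are empty or look like they include a prefix."""
    return [
        label for label in labels
        if not label.strip() or PREFIX.match(label)
    ]


print(suspicious_labels(["xxiv", "page 12", "1", "p. 3", "A-1", "", "P"]))
# ['page 12', 'p. 3', '']
```

Labels in any numbering scheme pass through untouched, including a bare letter such as "P", since the check only looks for a prefix followed by something else.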

An actual example of a page list in the authoring section would probably also help.

iherman commented 3 years ago

The issue was discussed in a meeting on 2021-04-29

List of resolutions:

Resolution No. 2: Close issue 1505


### 4. RS handling of non-Arabic page numbers in pagelist

_See github issue [#1505](https://github.com/w3c/epub-specs/issues/1505)._

**Dave Cramer:** this came up because gregorio asking about rules for non-Arabic page numbers
… answer seemed to be that there are no rules
… pagelist is just list of links, authors choose how to label those links
… issue discussion started to revolve around counter systems used around the world
… but this seems like an aside

**Brady Duga:** also, we have no rules for Arabic page numbers either
… we have no rules for page numbers, period

**Dave Cramer:** maybe we could recommend that if RS choose to display numbers from pagelist, that RS should display the values chosen by author
… e.g. where author has carefully chosen the type and sequence of page numbers

**Brady Duga:** agree (also, it seems obvious that RS shouldn't do stuff like that)

**Dave Cramer:** yes, and also, the original concern seems somewhat theoretical

> **Proposed resolution: Close issue 1505** *(Wendy Reid)*

> *Dave Cramer:* +1
> *Wendy Reid:* +1
> *Ben Schroeter:* +1
> *Toshiaki Koike:* +1
> *Matthew Chan:* +1
> *Brady Duga:* +1
> *Shinya Takami (高見真也):* +1
> *Dan Lazin:* +1

> ***Resolution #2: Close issue 1505***

dauwhe commented 3 years ago

I'm closing this issue based on the working group resolution.
