w3c / epubcheck

The conformance checker for EPUB publications
https://www.w3.org/publishing/epubcheck/
BSD 3-Clause "New" or "Revised" License

Sudden errors by different order of toc/spine by EPUBCheck v4.2.0-rc #1036

Closed: shiestyle closed this issue 5 years ago

shiestyle commented 5 years ago

We found that EPUBCheck v4.2.0-rc reports errors when the order of the toc nav differs from the spine order. We have since learned that such an order was already prohibited in EPUB 3.0, but EPUBCheck v4.2.0-rc may be the first version to check it properly. (https://github.com/w3c/epubcheck/issues/888)

Unfortunately, we may have already generated quite a few EPUB files whose toc order differs from the spine order, because the printed books use such a toc order.

For example, our Manga content is structured as shown below.

[spine order]

[toc order]

Such an out-of-order toc may be prepared because chapters and columns carry different weight, and this kind of toc makes it easy for users to reach the prioritized chapters. I'm not sure, but other publishers in Japan may also have Manga content whose toc order differs from the reading order.

Although we will fix our EPUB files whenever we find such an out-of-order toc, I'd like to discuss reconsidering this requirement so that printed books can be reproduced faithfully. Alternatively, I'd like to propose changing EPUBCheck's behavior to report a WARNING (or another message level) so that existing EPUB files remain usable.

As has already been reported, mainly from Japan, EPUBCheck errors have a critical impact because they prevent EPUB files from being delivered to eBook stores in Japan.

I'd appreciate any interest in this topic.

Doktorchen commented 5 years ago

Looking at my own books, this rule can cause trouble for multilingual books (sections in different languages in each chapter document, but a separate sublist in the toc for each language) or for non-linear books (a document can be the next document for multiple documents in the toc).

The way out might be not to put everything in this specific nav element with ops:type 'toc' if it does not fit into the order. The other option would be to accept that digital books can be quite different from printed books: they do not always have exactly one spine or exactly one ordered navigation concept, and this is one of the big advantages of digital books over printed books. However, EPUB does not seem to reflect such advantages of digital books very much; it seems to prefer reducing digital books to a simplified or second-choice type of book (most internet bookstores seem to follow the same strategy, forcing authors to publish only minimalistic, simplified EPUBs compared to the printed editions), and it is not obvious why.

Often, to provide such advantages to the audience, authors have to work around these restrictions. Maybe this is an option for these types of manga/comics as well? Another option could be to extend EPUB in version 4 to support digital books that go beyond the possibilities of printed books.

rdeltour commented 5 years ago

Thank you for the feedback @ShinyaTakami. I think this needs to be escalated to the CG, @dauwhe @RachelComerford @mattgarrish.

mattgarrish commented 5 years ago

It's a useful indicator of a possible authoring error - that you shifted your content around but forgot to update your toc to match it - but that doesn't warrant a must/error.

There's also a case that it's confusing to the reader if we loosen the restriction, especially for persons with cognitive disabilities, but that's something better addressed by the accessibility specification/techniques.

I wonder, though, if it was an attempt to make the toc more compatible with translation to an ncx. Again, not sure that warrants a must, but would better explain the rationale for having the requirement.

nekennedy commented 5 years ago

It also flags out-of-order page-lists, which I'm seeing a lot of; I think flagging those is a good idea.

Are there any use cases for out-of-order page-lists?

Doktorchen commented 5 years ago

Use cases are, in general, books that do not really have (only) one reading order. Such books have a different structure than the simple spine model. The spine is more or less the model of an ordered list (XHTML ol); an encyclopedia, for example, typically follows no list model at all, or, in a printed book, an unordered list (XHTML ul).

In non-linear books, readers may have a choice at the end of each document/chapter between alternative ways to continue: parts of the book may have some order, but at some points there is an unordered list of alternatives to choose from. Such books have an additional list (toc) of the current alternatives at the end of each chapter, so why not combine all of this into the main toc to provide an alternative way into the content? Some chapters may appear multiple times in such a combined list.

Books with more than one language may have such a selection as well. Comics or other books whose main content is graphics (SVG inside XHTML) may have the texts for the graphics in the same document, in several alternative languages. Such a book may fit the spine model, but in the toc one may want to provide one ordered list per language, each list item linking to a fragment in a document that matches the language.

Without graphics: if one provides the text in two languages, readers can have different reasons or motivations. Some may want to read only the version in the first language, others only the second, and a third group may want to compare them for educational purposes. Therefore it is useful to provide three (or four) sublists in the toc, one for each group. With the CSS pseudo-class :target one can even adjust the styling of the document depending on the fragment targeted from the sublist.

If one puts everything strictly in one order, the result can be a lot of duplicated content. For some books (especially non-linear ones), one will simply reduce the toc to frontmatter, backmatter, and the beginning of the text, plus perhaps an additional document with an extended toc (free of the EPUB restrictions that apply to the ops:toc) for details and everything else that does not fit into an ordered list at all. This might be the general workaround for avoiding the error message in books with more complexity and with options for readers to choose between alternatives (for accessibility or by design of the book).

If EPUB wants to cover those use cases, the spine could contain other structures: not just an ordered list, but also an unordered list and a choice between alternatives. The same applies to a toc.

rdeltour commented 5 years ago

The EPUB 3 CG decided to keep this an ERROR during the April 11, 2019 con call.

I'll consequently close this issue, but feel free to keep on discussing it here. If you disagree with the CG decision, please open an issue on the CG repository.

shiestyle commented 5 years ago

Let me reopen this issue, because the impact on the Japanese eBook market will be more critical than we expected.

dauwhe commented 5 years ago

Note that this is not a change made in EPUB 3.2. The spec is clear in EPUB 3.0.1:

The order of li elements contained within the toc nav element must match the order of the targeted elements within each targeted EPUB Content Document, and must also follow the order of Content Documents in the Rendition's spine.
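The rule above combines two conditions: toc entries must follow the spine order, and entries pointing into the same document must follow that document's element order. The following is a minimal Python sketch of how such a check could work, assuming the spine paths and each document's element ids have already been extracted (OPF and XHTML parsing is omitted, and every toc target is assumed to be a document in the spine). `check_toc_order` and `Violation` are hypothetical names; this is not EPUBCheck's code. Each entry gets a key of (spine position, position within its document), so one comparison against the previous entry catches both kinds of violation, while the `kind` label keeps them apart.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    entry: str  # the toc href that moved backwards
    kind: str   # "spine-order" or "document-order"


def check_toc_order(spine, doc_ids, toc_hrefs):
    """Report toc entries that break the EPUB 3.0.1 ordering rule (sketch).

    spine:     content document paths, in spine order
    doc_ids:   path -> element ids in that document, in document order
    toc_hrefs: toc nav link targets ("path" or "path#id"), in nav order
    """
    spine_index = {path: i for i, path in enumerate(spine)}
    violations = []
    prev = None  # key of the previous toc entry
    for href in toc_hrefs:
        path, _, frag = href.partition("#")
        # A link without a fragment targets the start of its document (-1).
        pos = doc_ids[path].index(frag) if frag else -1
        key = (spine_index[path], pos)
        # Tuples compare element-wise: spine position first, then position
        # inside the document, so one test covers both conditions of the rule.
        if prev is not None and key < prev:
            kind = "spine-order" if key[0] < prev[0] else "document-order"
            violations.append(Violation(href, kind))
        prev = key
    return violations
```

With spine `["ch1.xhtml", "column1.xhtml", "ch2.xhtml"]` and toc `["ch1.xhtml", "ch2.xhtml", "column1.xhtml"]` (a column listed after a later chapter), the sketch returns one `spine-order` violation for `column1.xhtml`. Comparing only with the immediately preceding entry flags each backward step once; a stricter variant could compare against the furthest position reached so far.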

shiestyle commented 5 years ago

We understand that the current EPUB 3.x spec requires this and that EPUBCheck 4.2.x will follow it, but in Japan, EPUB files violating this requirement were already generated when converting paper books to eBooks.

We will collect examples of such files from several publishers in the Japanese market.

rdeltour commented 5 years ago

@ShinyaTakami I hear your concerns. However, as noted above, reporting this issue as a mere WARNING or even disabling it would be inconsistent with the spec. I'm not saying it's totally impossible, but the EPUBCheck team cannot make this decision; this needs to be approved by the community (EPUB CG, possibly requiring backing from the Publishing BG). The last time we discussed this (April 11, 2019 call), the CG decided to keep it an ERROR.

mattgarrish commented 5 years ago

I still agree with @ShinyaTakami that given evidence Japanese publishers have taken different approaches, it's too late to put this cat back in the bag. It's been almost ten years of not enforcing the rule, so there should be a compelling reason to change course now.

It feels arbitrary that we're going to let HTML in iframes fly under the radar, for example, but be draconian about this.

dauwhe commented 5 years ago

I know it sounds crazy, but what about having a path in epubcheck saying that if lang=jp then we skip this check?

clapierre commented 5 years ago

Either that, or have a command-line option to skip that check.

laudrain commented 5 years ago

@ShinyaTakami, will existing, already-distributed EPUBs ever go through EPUBCheck 4.2?

But since EPUBCheck 4.2 will be used on new titles entering distribution, production specs can be revised now to comply with the EPUB 3 spec.

Then do we have an issue?

mattgarrish commented 5 years ago

The nav document requirements were a translation of the NCX rules into HTML, but this particular rule has no corollary in EPUB 3 that I know of since the navigation document isn't defined for playback -- media overlays took on that functionality.

Before breathing life into this unused requirement, we should have a solid case for why it is even needed. It seems like a more useful check for Ace to enforce.

laudrain commented 5 years ago

@mattgarrish do you mean it is only an accessibility requirement?

mattgarrish commented 5 years ago

> do you mean it is only an accessibility requirement?

As far as I can tell, ordering doesn't serve any vital function within the specification itself, unlike the other requirements for being able to process and extract the links to display. Like I wrote above, it looks like a bit of an over-zealous translation of NCX rules, where playback depended on the order of elements.

Accessibility is one important reason for having ordering, regardless of the technical arcana of how the rule came about, but we don't enforce those kinds of best practices through normative statements in the core specs. I think this could be better checked by Ace for those who are concerned about ordering.

rdeltour commented 5 years ago

I don't think this is heavily impacting accessibility. It is not totally unrelated to WCAG 2.4.3 (focus order) or 3.2.3 (consistent navigation), but neither of those can be explicitly interpreted as requiring the ToC to be consistent with the document order.

But anyways, we're facing the old question of whether EPUBCheck should strictly follow the spec, or be lenient about widespread invalid content. As @mattgarrish said, we did loosen the checks for iframes and can very well do the same for this requirement too, if the community decides it's the right approach.

We should however be wary about asking EPUBCheck to willfully ignore the spec; if the logic is pushed to the extreme it would mean that we can never fix old wrongs or implement previously-unchecked spec rules. Ideally, these issues should be really fixed at the spec level. It's too bad we missed the opportunity to prune this requirement from EPUB 3.2 😞, if it is really unused and not necessary!

mattgarrish commented 5 years ago

> I don't think this is heavily impacting accessibility.

I believe it helps readers cognitively follow a document, which is part of the Multiple Ways success criterion. If you look at the technique for including a table of contents, one of the procedures for determining compliance is:

Check that the values and order of the entries in the table of contents correspond to the names and order of the sections of the document.

But to your point...

> if the logic is pushed to the extreme it would mean that we can never fix old wrongs or implement previously-unchecked spec rules

Yes, this drives me crazy when we deviate!!

But in this case the rule itself is flawed and hasn't been enforced. In the face of evidence that people weren't aware of it and are constructing their content in different ways, I just don't see the need to rush and implement an error. Leave the status quo alone and let's take this up next time.

Doktorchen commented 5 years ago

As long as this is in the specification, it is OK to check it. For authors of new books it is no problem to work around it where required, or where useful for non-linear content with no preferred order.

However, it is still surprising that this requirement applies to non-linear content at all when that content is listed in the navigation. Effectively, this means authors have to provide an additional navigation for any relevant non-linear content they want to list. I have already started updating the first of my books with this problem, adding navigation files beyond a basic navigation that follows this specific rule.

Accessibility concerns for linear content are understandable; however, for non-linear content, authors should know better than scripts which arrangement provides good access, and whether it is useful to provide additional alternative arrangements for people with different approaches to understanding a text or with different capabilities.

shiestyle commented 5 years ago

Thanks for the active discussion, and sorry for my late reply; I was on an overseas business trip last week.

Let me summarize this issue from my point of view.

[What problem in Japan?]

In Japan, EPUBCheck is used both by publishers when generating EPUB files and by eBook stores (including Apple) when accepting them. So it is a problem for us if existing EPUB files are flagged as ERROR when we have to redeliver them to eBook stores (for example, because a phone number in the colophon changed) or when we have to deliver all of our EPUB files to a new eBook store.

So, in answer to @laudrain's question, I have to say that the problem does not concern new EPUB files.

Most EPUB files in Japan are generated manually and have to be fixed manually. Thus it is not easy to find out how many existing EPUB files violate the TOC requirement, and that is why Japanese publishers are strongly concerned about EPUBCheck's error.

KADOKAWA decided to fix their incorrect EPUB files when found, but I heard that some major publishers in Japan may not accept the new behavior of EPUBCheck 4.2 and will ask eBook stores not to use EPUBCheck 4.2 or later.

That's why I asked to reopen this issue.

[Specification in EPUB 3.x]

We are very sorry, but for a long time we were not aware that a toc order differing from the spine order is not allowed in EPUB 3.0.

As some people have already commented, I also think the current requirement can and should be changed to expand the EPUB format's capabilities, because paper books can provide any kind of TOC but EPUB cannot due to this requirement.

EPUB 3.2 is already finalized, and we Japanese publishers hope to discuss this issue as a topic for EPUB 3.2.1 or later.

[Behavior in EPUBCheck 4.2.x]

I understand the policy that EPUBCheck 4.2.x follows the EPUB specification, and EPUBCheck's behavior is correct, because a toc order differing from the spine order is not allowed even in the latest EPUB 3.2.

I also know, as an engineer, that adding irregular or locale-specific behavior is not an appropriate approach.

However, would it be an option for EPUBCheck to adopt features of a future EPUB version in advance, if a change to the TOC specification is agreed on in the PBG and EPUB3-CG?

[My suggestion]

I think the steps below may be one reasonable solution for this issue.

  1. Japanese publishers will propose changing the TOC spec in a future version of EPUB 3.
  2. The W3C PBG and EPUB3-CG will discuss this issue and agree on a change in EPUB 3.2.1.
  3. EPUBCheck 4.2.2 will integrate the TOC spec change in advance.

laudrain commented 5 years ago

@ShinyaTakami thanks for your detailed answer.

Your proposed plan has to be considered by the EPUB3-CG and PBG.

Meanwhile, I'm wondering whether reporting this issue as a mere WARNING would help in Japan. @rdeltour mentioned this as a possibility in his comment above.

shiestyle commented 5 years ago

Yes, having EPUBCheck output a WARNING would be another solution for the Japanese market.

rdeltour commented 5 years ago

> Yes, having EPUBCheck output a WARNING would be another solution for the Japanese market.

If this is indeed reasonable for the Japanese market, I personally believe this could be the best solution:

Let's propose this to the CG and PBG.

shiestyle commented 5 years ago

I have gathered samples and use cases of EPUBs in Japan that violate the TOC requirement.

We found many samples in magazines and specialized books.

[Low-priority Content such as Columns]

Many cases were reported involving columns (small pieces of content or additional content in the book).

Such low-priority content is sometimes placed at the end of the nav document, because that makes it easy to reach the main chapters by skipping it.

A similar case was found in a travel guidebook, in which small maps on the page were treated the same way as columns.

On the other hand, we found a case where bonus content is listed right after the TOC in the nav document (while the actual content is located at the end) in order to advertise that such special content is bundled.

[Another Index in Cooking Books]

We found a case in cooking books where the first TOC, sorted by genre (meat, fish, soup, etc.), followed the chapter order, and a second TOC, sorted by ingredient (tomato, egg, milk, etc.), was added to make searching for recipes easier.

I think this multiple-index feature should be discussed in the future to enhance the capabilities of eBooks.

[Manga Magazines]

Manga magazines are popular in Japan. Their TOC page tends to be located in the last chapter, after the manga chapters, and we sometimes move it to the top when generating eBooks to make the TOC easy to find.

[Novel Magazines]

See the attached TOC image (modified for content protection).

[Image: toc_of_magazine_sample]

In this case, chapters are grouped by type, and the content starting on page 022 is located in the columns section.

This is an example from a paper book, and we have to generate an out-of-order nav document in EPUB to reproduce the paper book.

dauwhe commented 5 years ago

I've opened an issue in the EPUB repo to discuss the spec question: https://github.com/w3c/publ-epub-revision/issues/1283

dauwhe commented 5 years ago

Cross-posting from EPUB 3 CG repo... there are two separate issues here, one on nav order vs document order within a single content document, and one on nav order vs spine order. Would both of these need to be changed to WARNING to meet the needs of Japanese EPUBs? Or only the latter?

shiestyle commented 5 years ago

Does EPUBCheck check the former as well?

If so, we should change both to WARNING.

rdeltour commented 5 years ago

> Does EPUBCheck check the former as well?

EPUBCheck currently checks both using the same logic. It's not impossible to change that, but that's how it's currently implemented.

laudrain commented 5 years ago

@all Following this thread and the confirmation from @ShinyaTakami, I urge the EPUB3 CG to approve an immediate EPUBCheck change from ERROR to WARNING for those checks.

As @mattgarrish said:

> I still agree with @ShinyaTakami that given evidence Japanese publishers have taken different approaches, it's too late to put this cat back in the bag. It's been almost ten years of not enforcing the rule, so there should be a compelling reason to change course now.

Then, in issue 1283, which Dave started in the EPUB3 CG, we can discuss the spec's evolution, taking all aspects into account, particularly accessibility.

rdeltour commented 5 years ago

Resolution: the EPUB CG decided to make this check (NAV_011) a WARNING (on the 2019-07-11 call, minutes to come).
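Downgrading a check like this is a reporting decision rather than a change to the check logic itself. The following is a minimal sketch, assuming a hypothetical table that maps message IDs to severities; only the ID NAV_011 comes from this thread, and `Severity`, `SEVERITIES`, `report`, and `is_valid` are illustrative names, not EPUBCheck's API. The hypothetical `is_valid` rule treats only ERROR messages as failures, which is why the downgrade matters for the store-acceptance workflow described earlier in the thread.

```python
from enum import Enum


class Severity(Enum):
    ERROR = "ERROR"
    WARNING = "WARNING"


# Hypothetical severity table keyed by message ID. Only "NAV_011" comes from
# the thread; WARNING reflects the 2019-07-11 CG resolution.
SEVERITIES = {"NAV_011": Severity.WARNING}


def report(msg_id: str, detail: str) -> str:
    """Format a message; IDs missing from the table default to ERROR."""
    severity = SEVERITIES.get(msg_id, Severity.ERROR)
    return f"{severity.value}({msg_id}): {detail}"


def is_valid(messages: list[str]) -> bool:
    """Hypothetical pass/fail rule: only ERROR-level messages fail a publication."""
    return not any(m.startswith("ERROR(") for m in messages)


msg = report("NAV_011", "toc nav entry does not follow spine order")
print(msg)              # WARNING(NAV_011): toc nav entry does not follow spine order
print(is_valid([msg]))  # True
```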

dauwhe commented 5 years ago

Minutes: https://www.w3.org/2019/07/11-epub3cg-minutes.html