w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

Dual-approach for the TOC #350

Closed HadrienGardeur closed 6 years ago

HadrienGardeur commented 6 years ago

As a follow-up to our discussions at TPAC, I think it's worth re-opening the case of the dual-approach for the TOC.

While this idea was proposed before (https://github.com/w3c/wpub/issues/291#issuecomment-416344052, https://github.com/w3c/wpub/issues/291#issuecomment-416352888, https://github.com/w3c/wpub/issues/291#issuecomment-416356123), it still seems very relevant following comments from Penguin Random House and Hachette Livre (respectively the largest trade publishers in the US and in France) that this is how they're producing EPUB files today.

If my understanding is correct (cc @laudrain and @liisamk) this is how they handle their EPUB production today:

Our current attempt in EPUB and in this group to use the same document for rendering and machine readable info still feels like a dead end IMO. No matter how we define an algorithm for extracting the TOC, we always reach a limit where things eventually fail. Publishers might also decide that they want a radically different navigation (for example a map as an SVG) or add a lot of additional information that should not be processed (@liisamk mentioned such examples at TPAC).

This makes me wonder if we're not trying to use the term TOC for two separate concepts:

While some publications may be able to use the same document for both purposes, I don't think that's true for all publications, far from it (as we can see with content produced by some of the largest trade publishers).

I've updated an earlier example to further illustrate how this could work:

The idea is fairly basic:

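The UAs could potentially do the following:

  • provide the ability to jump to the "visually rich" navigation document
  • parse and use the content of the machine readable document in its own affordances
  • if the machine readable TOC is the only one present but can't be parsed, it would be rendered instead
  • if both are present and the machine readable document can't be parsed, the visually rich document would be rendered instead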

iherman commented 6 years ago

Is there (should there be?) an equivalent to the doc-toc ARIA attribute that we would use for the visually rich navigation document?

HadrienGardeur commented 6 years ago

@iherman I haven't looked into it yet but it's not clear that something other than doc-toc would be needed.

laudrain commented 6 years ago

The answer is no, neither in Digital Publishing roles nor in ARIA roles. But that is normal, as role is for accessibility, not for semantics.

liisamk commented 6 years ago

Hadrien-

You are correct that Hachette and PRH are both creating two docs with somewhat different purposes. This summary explains the goals/issues we have at PRH and I suspect is similar to what the experience is for Hachette or other Trade and Educational publishers.

I think it is rare that you would want to use the same doc for both of these purposes.

Let me know if it would help to have examples to see how this works.

Thanks!

Liisa

(Sent by email in reply to the notification "[w3c/wpub] Dual-approach for the TOC (#350)" from Hadrien Gardeur, Tuesday, October 23, 2018 at 3:42 AM.)

HadrienGardeur commented 6 years ago

Thanks @liisamk this is very helpful.

@laudrain @liisamk do you have examples that could be publicly shared with this group? This would be very helpful to further illustrate and understand this use case.

[...] It is typically identified in the guide with a type="toc"

There's no equivalent currently for the guide and its various types in WP. If some of them are useful (which seems to be the case here), we would need to either identify or define equivalent rel values.

iherman commented 6 years ago

@HadrienGardeur,

I haven't looked into it yet but it's not clear that something other than doc-toc would be needed.

At the moment, the draft uses a two-step approach for the TOC (and the page list). First, the (PublicationLink) entry in the reading order or the resources identifies a resource using the rel value, but the URL in the PublicationLink does not contain a fragment. As a second step, the TOC or pagelist is identified through a required doc-toc (resp. doc-pagelist) value for the role attribute.
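The two steps can be sketched as follows. This is only an illustration, not the draft's algorithm: the manifest shape is borrowed from the example quoted later in this thread, while the "readingOrder" key, the helper names (find_link, RoleFinder), and the sample HTML are assumptions made for the sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Hypothetical manifest; rel="contents" is the value discussed in this thread.
manifest = {
    "resources": [
        {"type": "PublicationLink",
         "url": "http://www.example.org/nav.html",
         "rel": "contents"},
    ]
}

def find_link(manifest, rel):
    """Step 1: return the URL of the first link carrying the given rel value."""
    # "readingOrder" is an assumed key name for the reading order.
    for section in ("readingOrder", "resources"):
        for link in manifest.get(section, []):
            rels = link.get("rel", [])
            if isinstance(rels, str):
                rels = [rels]
            if rel in rels:
                url, fragment = urldefrag(link["url"])
                if fragment:
                    # Mirrors the restriction described above: no fragment in the URL.
                    raise ValueError("fragment found in PublicationLink URL")
                return url
    return None

class RoleFinder(HTMLParser):
    """Step 2: collect (tag, id) for elements whose role includes a given token."""
    def __init__(self, role):
        super().__init__()
        self.role = role
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        roles = (attributes.get("role") or "").split()
        if self.role in roles:
            self.matches.append((tag, attributes.get("id")))

# Hypothetical content of nav.html: one plain nav and one marked as the TOC.
html = (
    '<nav id="rich"><a href="c1.html">Map</a></nav>'
    '<nav id="toc" role="doc-toc"><ol><li><a href="c1.html">Chapter 1</a></li></ol></nav>'
)

toc_url = find_link(manifest, "contents")
finder = RoleFinder("doc-toc")
finder.feed(html)
```

In this sketch the rel value only selects the resource; it is the role attribute that singles out one element inside it, which is why a second, unmarked nav element in the same file does not interfere.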

Ideally, a similar mechanism should be used to identify the rich navigation element, but, as @laudrain said, there is currently no ARIA value to be used. There may be two ways to move forward:

  1. we relax the current mechanism and allow fragment URLs to be used in PublicationLink (which may generate issues elsewhere, because we lose the feature whereby the same resource cannot be repeated in the union of the resource list and the reading order).
  2. we use a different identification mechanism to find the navigation element (e.g., use of a special class attribute value, brrrrrr :-( ). However, I would argue that the same approach should be used for all three such 'identifications', i.e., TOC, pagelist, and navigation; in other words, such changes may have to be retrofitted to the other two.

(I do not have a very strong opinion on which of these two should be used, just listing them.)
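To make the trade-off in option 1 concrete, here is a minimal sketch of the two constraints mentioned there: no fragment in a link URL, and no resource repeated across the union of the reading order and the resource list. The function name check_links and the sample URLs are hypothetical; the actual checks in the draft may be worded differently.

```python
from urllib.parse import urldefrag

def check_links(reading_order, resources):
    """Return a list of problems: URLs with fragments and URLs listed more than once."""
    problems = []
    seen = set()
    for link in list(reading_order) + list(resources):
        url, fragment = urldefrag(link["url"])
        if fragment:
            problems.append(f"fragment not allowed: {link['url']}")
        if url in seen:
            problems.append(f"repeated resource: {url}")
        seen.add(url)
    return problems

# Hypothetical resource list pointing twice at the same file, once with a fragment.
resources = [
    {"url": "http://www.example.org/nav.html", "rel": "contents"},
    {"url": "http://www.example.org/nav.html#visuallyrichnavigation", "rel": "navigation"},
]

problems = check_links([], resources)
```

With these inputs both checks fire, which illustrates why allowing fragments would also mean revisiting the rule against repeated resources.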

Looking ahead, it is also not clear what the WebIDL representation should be. At the moment, we use HTMLElement for the TOC and the pagelist as a direct attribute value (or, depending on the outcome of #338, as a return value for a helper method) but this would probably change to return the result of JuansAlgorithm. I do not presume the navigation would have such an algorithm. (A possibility is to leave it as an HTMLElement, or not to define a helper function if that is decided in #338.)

HadrienGardeur commented 6 years ago

@iherman

I'm not entirely convinced that we need an ARIA role for the visually-rich use case. Currently in EPUB, there's no epub:type for it. It would probably be better in terms of accessibility, but we don't need it for the kind of additional semantics that we require from doc-toc (identifying the HTML element to extract its content).

Defining a rel value would be the equivalent of the guide property that currently exists in EPUB.

Looking ahead, it is also not clear what the WebIDL representation should be.

IMO nothing is required.

I've already said this previously, but duplicating information is not a good thing and since we're not considering making this document machine processable, we don't need a place to store the info once processed.

iherman commented 6 years ago

I'm not entirely convinced that we need an ARIA role for the visually-rich use case. Currently in EPUB, there's no epub:type for it. It would probably be better in terms of accessibility, but we don't need it for the kind of additional semantics that we require from doc-toc (identifying the HTML element to extract its content).

I actually do not necessarily care about accessibility in this case, only a 'sign' that signals that this is a visually rich entry. Please look at my summary of the current process (see above): the role of doc-toc is not used as an accessibility hook, "just" as identification. This is what I am talking about.

I've already said this previously, but duplicating information is not a good thing and since we're not considering making this document machine processable, we don't need a place to store the info once processed.

A visually rich navigation should be treated like the ToC and the page list. This means that, obviously, the results of #338 are relevant for this entry as well. That issue is still open.

HadrienGardeur commented 6 years ago

I actually do not necessarily care about accessibility in this case, only a 'sign' that signals that this is a visually rich entry.

The rel value already handles that: https://github.com/HadrienGardeur/webpub-manifest/blob/gh-pages/examples/why/wp.json#L22

laudrain commented 6 years ago

@HadrienGardeur we have an epub:type for the visual table of contents. At Hachette, for EPUB3 today, we use epub:type="toc" in the spine document with the visual TOC. This is not the nav document.

HadrienGardeur commented 6 years ago

@laudrain is there any valid reason not to use doc-toc then?

The two documents would still be properly identified by different rel values in the manifest.

laudrain commented 6 years ago

We map role="doc-toc" from epub:type="toc".

iherman commented 6 years ago

I actually do not necessarily care about accessibility in this case, only a 'sign' that signals that this is a visually rich entry.

The rel value already handles that: https://github.com/HadrienGardeur/webpub-manifest/blob/gh-pages/examples/why/wp.json#L22

Please look at https://github.com/w3c/wpub/issues/350#issuecomment-432508128. The current draft does not allow URIs with fragments for resources, so the rel value is not enough. Or we have to relax that restriction overall.

iherman commented 6 years ago

@laudrain is there any valid reason not to use doc-toc then?

doc-toc has a specific meaning in ARIA. This may not be valid for the navigation document.


There were some discussions at TPAC and it may (I said MAY) be possible that a new, general purpose attribute would be introduced in HTML that does not have the ARIA baggage to it. We could then define terms to be used for that new attribute, and this may solve the issue.

No commitment, though.

HadrienGardeur commented 6 years ago

I'm sorry @iherman but I have a hard time following you on that one.

I'm not suggesting that we use a URI fragment. I simply believe that for "additional navigation" having:

... are more than enough.

laudrain commented 6 years ago

doc-toc has a specific meaning in ARIA. This may not be valid for the navigation document.

@iherman I'll check with my DAISY friends

iherman commented 6 years ago

@HadrienGardeur

Indeed, we seem not to understand one another.

In my understanding, this is what you propose:

"resources" : [{
        ...
    },{
        "type":"PublicationLink",
        "url": "http://www.example.org/nav.html",
        "rel": "navigation"
    }],

What happens if there are several nav elements in nav.html? What happens if you have a TOC as well as a visually rich navigation element in http://www.example.org/nav.html? How would you find the right navigation element?

Your proposal only works if http://www.example.org/nav.html is only used for (a single) visually rich navigation. I do not think this restriction is acceptable. Alternatively, you have to use http://www.example.org/nav.html#visuallyrichnavigation in the PublicationLink. This is against the current draft requirement.

HadrienGardeur commented 6 years ago

@iherman

What happens if there are several nav elements in nav.html?

Since that document is meant to be rendered, I think that's not really an issue. From an affordance perspective, I think rendering the document is enough, no need to jump to a specific fragment id.

What happens if you have a TOC as well as a visually rich navigation element in http://www.example.org/nav.html?

They're not semantically the same thing and IMO should be tied to different affordances as well.

In my first post, I suggested the following behavior:

  • provide the ability to jump to the "visually rich" navigation document
  • parse and use the content of the machine readable document in its own affordances
  • if the machine readable TOC is the only one present but can't be parsed, it would be rendered instead
  • if both are present and the machine readable document can't be parsed, the visually rich document would be rendered instead
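The behavior listed above could be sketched roughly like this. It is an illustration only: the function name plan_toc, the shape of the returned dictionary, and the parse callback (returning a list of entries, or None when parsing fails) are all assumptions, not something defined by the draft.

```python
from typing import Callable, Optional

def plan_toc(
    machine_url: Optional[str],
    rich_url: Optional[str],
    parse: Callable[[str], Optional[list]],
) -> dict:
    """Decide what a UA would do with the two navigation documents.

    parse is a hypothetical callback that returns the extracted TOC entries,
    or None when the machine readable document cannot be parsed.
    """
    plan = {
        "jump_to": rich_url,       # affordance to open the visually rich document
        "toc_entries": None,       # entries used in the UA's own TOC affordance
        "render_instead": None,    # document rendered as a fallback
    }
    if machine_url is None:
        # Not covered by the list above: only the jump affordance (if any) remains.
        return plan
    entries = parse(machine_url)
    if entries is not None:
        plan["toc_entries"] = entries
    elif rich_url is not None:
        # Both present, machine readable one unparsable: render the rich document.
        plan["render_instead"] = rich_url
    else:
        # Only the machine readable TOC is present and it cannot be parsed.
        plan["render_instead"] = machine_url
    return plan

# Hypothetical file names and parse results.
both_ok = plan_toc("toc.html", "nav.html", lambda url: ["Chapter 1"])
only_machine_broken = plan_toc("toc.html", None, lambda url: None)
both_machine_broken = plan_toc("toc.html", "nav.html", lambda url: None)
```

Note that nothing in this sketch needs to locate a specific element inside the visually rich document; it is only ever opened or rendered as a whole.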

I'll let @laudrain and @liisamk chime in to make sure that this is aligned with their expectations.

How would you find the right navigation element?

I don't think that's necessary.

To summarize my proposal:

I think this is aligned with a real-world use case, as it's been expressed during TPAC by major trade publishers.

iherman commented 6 years ago

Well... in my view, the way you put it, this is grossly underspecified. The only thing we specify is a resource that MAY include several visually rich navigation elements, without specifying which one the UA should use as part of its specific (and obviously different) affordance for navigation, and that MAY be present alongside the navigation meant for the specific TOC, which may lead to possible confusion for the UA, etc.

You do not think it is necessary to specify it more precisely; I disagree. At this point we should agree that we disagree, and I will let @laudrain, @llemeurfr, @liisamk, and the others decide whether such a loosely specified thing is useful or not.

laudrain commented 6 years ago

In the EPUB real world (the Hachette Livre one at least), we have and need:

There may be optional sub-tables at specific locations (parts) addressed in the main VRTOC, these are also in the reading order or included in the part documents.

IMO, it would not be useful for the user experience to complicate that model with multiple VRTOCs and multiple MRTOCs.

llemeurfr commented 6 years ago

I'll try to put it differently: this issue seems to me similar to the cover-image vs cover-page issue. A printed book has a cover page, it has a ToC, and both can be categorized as "visually rich". A WP-aware UA (i.e. a reading system) needs some machine processable data, as simple to process as possible: a cover image (optional) to be used in a catalog view or a book description; and a ToC (optional but recommended) to be displayed at a fingertip without leaving the current page.

The cover page and visually rich TOC are both usually accessed in reading order. The UA has no real need to get a specific reference to them.

Publishers tell us that this machine ToC is often different from the visually rich ToC (the former is often more complete than the latter). The UA needs a specific reference to the machine ToC and will render it with a UA specific layout (often in a panel). It will use the rel=contents solution and the corresponding structure will get an ARIA doc-toc attribute, because people using specific a11y tools need to access a complete ToC also.

I conclude that using rel=contents is sufficient for publishers' and reading systems' needs, and we don't need to define any rel=navigation to help access the visually rich ToC.

And this should bring peace to @iherman and @HadrienGardeur :-)

Same for a page list and other landmarks.

HadrienGardeur commented 6 years ago

@iherman I'm trying to be very pragmatic here and simply stick to the requirements of the industry.

If @laudrain and @liisamk both believe that:

... then I'm happy to close this issue.

iherman commented 6 years ago

Obviously, if publishers do not really need a separate entry in the manifest, then I am fine with closing this...

avneeshsingh commented 6 years ago

Regarding the ARIA doc-toc attribute, it is for both semantics and accessibility (until we have a separate semantics-related attribute available in HTML). So, it should be present in both machine readable TOCs and the visually rich navigation document (if we go down this path) to enable AT to recognize that the document provides navigation structure.

iherman commented 6 years ago

This issue was discussed in a meeting.

iherman commented 6 years ago

Per resolution above, closing.