w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

Added the pagelist entry #339

Closed iherman closed 6 years ago

iherman commented 6 years ago

This is the action recorded in minutes of 8th of October.

Some notes

If this PR is merged, then https://github.com/w3c/wp-vocab/pull/1 should also be merged and processed; this adds the extra rel value to our vocabulary files.

Cc: @GarthConboy @llemeurfr @laudrain @HadrienGardeur

Fix #223

HadrienGardeur commented 6 years ago

@iherman aside from my comment in #338, I don't think that pagelist and toc can simply return an HTMLElement.

Since they can both be contained in resources other than the entry page, we need to know the URIs of such resources as well. There's also the case of multiple resources marked as being the TOC and/or page list, which requires extra attention.

HadrienGardeur commented 6 years ago

Sidenote: we can usually preview such PRs but I'm not seeing any link for that right now. Is something broken in our pipeline?

iherman commented 6 years ago

I left this intentionally open for now, because the whole issue of ToC is still open as far as I am concerned, and until we have a clear idea of what we want, I do not want to touch the WebIDL. That should come when all other issues around ToC are closed.

For pagelist, I think we should, for now, simply keep it as a clone of ToC, so to say, and handle them together. This should not be done as part of this PR imho...

HadrienGardeur commented 6 years ago

I think that the situation is quite different from last time. We were discussing whether the TOC could be machine friendly enough to result in a different representation in the WebIDL where it would already be parsed and ready to use for UAs.

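In the current PR, there's a contradiction between:

  • the spec language that allows the use of a rel to identify a page list in resources or readingOrder
  • the WebIDL which would only work if the page list is contained in the entry page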

This would be resolved by #338 if instead of HTMLElement we implemented a getter (TBD), but even in this case, the getter wouldn't return an HTMLElement.

HadrienGardeur commented 6 years ago

By the way, all of the previous comment is also true for the TOC and this is completely independent from any discussion about the machine readability of such documents.

lrosenthol commented 6 years ago

I realize that I am jumping in here pretty late, but I would like to strongly recommend against the pagelist as defined here (and in EPUB). I believe, based on various implementation considerations, that it is backwards from what is really desired to provide rich reflow<->static mappings.

Instead, individual content elements should identify their source pages (eg. data-source-page). Using this approach, a UA is able to not only take the user to the correct page ("Jump to Page") but the right spot on the page (eg. paragraph 4 on page 3). Additionally, it enables the UA to map between the two models in publications - something extremely useful when you want to share (web) annotations between a fixed layout and a reflowable version of the same publication.
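As a rough illustration of the per-element idea described above, here is a minimal sketch that indexes content elements by their source page. The `data-source-page` attribute name is taken from the example in this comment; the sample markup, the `SourcePageIndex` class, and the `index_source_pages` helper are hypothetical and not part of any specification.

```python
from html.parser import HTMLParser


class SourcePageIndex(HTMLParser):
    """Index element ids by a hypothetical data-source-page attribute."""

    def __init__(self):
        super().__init__()
        # page label -> ids of the elements tagged with it, in document order
        self.by_page = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        page = attributes.get("data-source-page")
        if page is not None:
            self.by_page.setdefault(page, []).append(attributes.get("id"))


def index_source_pages(html):
    """Return a mapping from source page label to element ids."""
    parser = SourcePageIndex()
    parser.feed(html)
    return parser.by_page


# Hypothetical content: three paragraphs taken from two source pages.
sample = """
<p id="p1" data-source-page="2">End of page two.</p>
<p id="p2" data-source-page="3">First paragraph on page three.</p>
<p id="p3" data-source-page="3">Second paragraph on page three.</p>
"""

pages = index_source_pages(sample)
print(pages["3"])  # ids of the elements that came from source page 3
```

With such an index, "jump to page 3" could target the first id listed for that page, and an annotation on a given element could be mapped back to its source page.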

avneeshsingh commented 6 years ago

How would this approach work in a non-WP-aware user agent? It would be good to know the details.

lrosenthol commented 6 years ago

@avneeshsingh great question! If we use data attributes, then the non-WP-aware UA would ignore them (since that's the rules - https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes). However, because they are "connected" with the content (being attributes), they could be used by any HTML UA, such as an assistive technology device, even if it wasn't WP aware.

danielweck commented 6 years ago

@lrosenthol isn't that abusing / misusing custom data-xxx attributes?

"User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values."

"Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements."

"These attributes are not intended for use by software that is not known to the administrators of the site that uses the attributes. For generic extensions that are to be used by multiple independent tools, either this specification should be extended to provide the feature explicitly, or a technology like microdata should be used (with a standardized vocabulary)."

"JavaScript libraries may use the custom data attributes, as they are considered to be part of the page on which they are used."

etc. https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes

danielweck commented 6 years ago

The pagelist is a list of links that provide navigation to positions in the content that correspond to the locations of page boundaries present in a print source being represented by the Web Publication

I am wondering whether it would be useful to clarify "print source" here, as this could be interpreted exclusively as "publications printed on paper, like traditional books or magazines", when it could in fact refer to digital publications not designed to be fed through a printer at all (i.e. born-digital, stay-digital documents), such as fixed-layout / pre-paginated EPUBs (or perhaps even PDFs) with interactive features, videos, animations, etc. I have seen a few examples of sales brochures, technical documentation, children's talking books, cooking / recipe books, etc. that do not have a print equivalent, yet they are "paginated" works that would benefit from "pagelist" navigation in Web Publications.

lrosenthol commented 6 years ago

@danielweck

isn't that abusing / misusing custom data-xxx attributes?

Probably... but HTML's limited model for attributes doesn't give us much other choice, short of going to the HTML committee and asking for a new one. But I still think either of those is a better choice than creating a completely new grammar/language that requires custom parsing...

danielweck commented 6 years ago

Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient.

I am not sure I understand the intent of this additional prose (which was inserted on top of the original proposal), and I am certainly not convinced by the claim "for making the navigation more convenient" which does not seem based upon a well-defined rationale.

Normally, we start with use cases / functional requirements, and if we agree on the needs, we translate that to well-specified / standardized mechanisms. With the additional prose I feel that we are introducing a tautological statement, and an imprecise one as well. Here is how I read the current prose: "page lists are important for publications that somehow have pages, but even if they don't, this feature is useful too because it provides navigation convenience").

Let's look at it from a different angle, and let's say that we're now debating the inclusion of EPUB's "list of landmarks" (e.g. figures, mathematical equations, etc.) in the Web Publications standard. Just like "pagelist", there would be a well-defined metadata vocabulary (or semantic role) that reading systems / user agents would use to discover the list of links identifying worthy locations within publication resources. It would be strange to state that "this feature can also be used for publications that do not actually contain figures, to make navigation more convenient".

I am sorry if I am missing the point :) Perhaps I am just nit-picking.

iherman commented 6 years ago

@danielweck : as I said on the call yesterday, I do need a text for the paragraph referring you, and I welcome any input:-)

iherman commented 6 years ago

@danielweck maybe the discussion should move back to #223...

danielweck commented 6 years ago

@danielweck maybe the discussion should move back to #223...

You mean the discussion about data-xxx attributes, right? (+ @lrosenthol )

I think this PR is the right place to discuss the prose proposed in this PR :)

TzviyaSiegman commented 6 years ago

@lrosenthol granular anchoring is not the topic of discussion here. Please open a separate issue if you'd like to see that addressed.

iherman commented 6 years ago

@danielweck no. #223 is on any other navigation elements, including the pagelist. This PR implements the current resolution of including pagelist, but if we question this resolution, or want to extend it to other categories, the discussion should be done there, imho.

lrosenthol commented 6 years ago

@TzviyaSiegman I disagree, as we are discussing the pagelist here... However, it appears that @iherman thinks that #223 would be a better spot, and I am happy to move it there since both @danielweck and I have concerns about the currently proposed pagelist mechanism.

danielweck commented 6 years ago

as I said on the call yesterday, I do need a text for the paragraph referring you, and I welcome any input:-)

@iherman

(1) I am proposing a clarification of "print source" as per my comment here ( https://github.com/w3c/wpub/pull/339#issuecomment-428262452 ) which I feel is neither an objection to the resolution, nor a request for extending the semantics of "page list" (perhaps the latter is where our interpretations differ, in which case I will be happy to move to #223 ).

(2) Regarding the additional sentence "Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient", right now I am leaning towards removing it entirely, based on the reasons mentioned in comment https://github.com/w3c/wpub/pull/339#issuecomment-428269828 (I feel that the original prose clearly addresses the identified use-case / functional requirement, especially if the notion of "print source" is clarified)

mattgarrish commented 6 years ago

I agree with @danielweck that the wording right now needs improvement. What about something like:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They might correspond to a statically paginated source, for example, such as a print document, or might be a purely digital creation to simplify access into the content.

How we provide the convenience of pagination in a digital world is a problem that needs a better solution some day, but abusing attributes that we know are just going to lead to non-implementation is not the way forward. It's not a problem we're likely to solve on WP's timeline.

TzviyaSiegman commented 6 years ago

I agree with @danielweck 's concerns

  1. "print source" is a bit unclear. Perhaps we should phrase it as "paginated content" or simply remove the phrase. HTML can be paginated too.
  2. +1 to removing mention of when it might be helpful to use pagelist. We leave this as a MAY and allow the user to determine when to use it.

danielweck commented 6 years ago

To put it differently: once the notion of "print source" is clarified to encompass any type of publication that intrinsically provides / contains a "list of pages" (thus the "pagelist" keyword / semantic role proposed here), such as "printed physical publications", "digital fixed-layout publications", and otherwise "reflowable publications with some sort of virtual pagination markers" (which I believe was the primary motivator for the additional prose), then we can remove the additional prose.

danielweck commented 6 years ago

@mattgarrish I like the term "static" because I interpret its opposite meaning ("dynamic") as "pagination at rendering time, not baked into the content at authoring time". I am not sure everybody would use the same interpretation though.

iherman commented 6 years ago

I like the formulation in https://github.com/w3c/wpub/pull/339#issuecomment-428289554. I see only one caveat, though: this formulation makes it feasible to define several pagelists, doesn’t it? I do not think this is a problem per se, but, if this is indeed the case:

mattgarrish commented 6 years ago

we may suggest the usage of the PublicationLink structure adding the name field categorizing/describing what that particular pagelist is used for

Technically, there's no reason why you couldn't have more than one page list, but realistically coordinating the locations is usually only accomplished against one source (or using one "distance" method if there isn't a paginated source/equivalent).

I'd be fine with allowing more than one and using the name field as you suggest, but I don't have a strong opinion on it.

With regards to the name, I can't think of a better alternative. We have page lists in EPUB and DPUB-ARIA, so there's consistency in comprehension if we don't alter the name. The concept of getting to a "page", even if the user doesn't fully understand the pagination method, is also broadly understood.

llemeurfr commented 6 years ago

The first part of Matt's proposal is clear:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content.

The second part, using "might", is too generic; it blurs the notion: it's a page list after all, not a random list of pointers. I would rephrase/simplify it as:

The exact nature of these locations is left to content creators to define. They usually correspond to pages of a print document which is the source of the digital publication, but might be a purely digital creation added for the sake of easing navigation.

Plus, what is missing is the exact meaning of the page break in the content, vs start/end of the printed page. The Daisy page on page lists adds it in a Q&A section (=start).

Last, the representation of the page number in the content should IMO be specified also: EPUB is underspecified, which is an issue for UAs, and the Daisy page also gives details on best practices.

avneeshsingh commented 6 years ago

@llemeurfr

I think we should focus on the mechanism in the specs here, and leave the details of start or end of page to best practices. There are many types of publications which would be having different implementations of page numbers. Some may be numeric, some may be alphabetic and some may have labels. Some print equivalents may have page numbers marked at the top of the page, some may have them at the bottom. The DAISY 2 specs were precise about it, so production centers had to implement workarounds to match the print equivalent.

mattgarrish commented 6 years ago

Last, the representation of the page number in the content should IMO be specified also

Ideally, yes, but practically this can't be enforced without unwanted side-effects. You can have a page list for an audio book, for example, which would only reference into audio offsets. The page list itself is only dependent on having some location to link to.

llemeurfr commented 6 years ago

@avneeshsingh, I had further thoughts about the start/end issue; UAs will follow a link in a page list and display content. Such a link acts as a bookmark would: in reflow mode, the "synthetic page" (i.e. the screen) displayed will be the one the user would have accessed whilst moving from page to page from the start of the corresponding HTML resource. Therefore it's impossible to know if the page break will be at the start, middle or end of this page, as it depends on the screen size.

@mattgarrish, as it isn't tied to the current PR, I'll create a specific issue for the issue of page numbers as content or empty elements.

llemeurfr commented 6 years ago

In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or am I missing something?

mattgarrish commented 6 years ago

I'll create a specific issue for the issue of page numbers as content or empty elements.

Oh, okay, I misunderstood your request. One common issue is whether the numbers have to be present with the page list and the other is whether they have to be content or not. I can comment on the latter when you open the issue. :)

mattgarrish commented 6 years ago

It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document.

The wording is a bit confusing, as it seems to mix authoring and user agent requirements. I read it as the pagelist/toc MUST be identified by the appropriate role, and user agents MUST recognize the first one so designated. It would probably be better to split these statements, if so.

iherman commented 6 years ago

In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or am I missing something?

"so designated" includes the choice of the nav based on the value of role (doc-toc vs. doc-pagelist). I.e., the two choices are disjoint, and they can both be in the same file.
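To make the "first element with that role" reading concrete, here is a minimal sketch of how a processor might pick the first nav per role value within one document. The sample markup, the `FirstNavByRole` class, and the `first_navs` helper are hypothetical; this is an illustration of the selection rule, not the WebIDL or the algorithm in the draft.

```python
from html.parser import HTMLParser


class FirstNavByRole(HTMLParser):
    """Remember, per role value, the first <nav> element that carries it."""

    def __init__(self):
        super().__init__()
        self.first = {}  # role value -> id of the first nav with that role

    def handle_starttag(self, tag, attrs):
        if tag != "nav":
            return
        attributes = dict(attrs)
        # role is a space-separated token list; a valueless attribute is None
        for role in (attributes.get("role") or "").split():
            # setdefault keeps the first match and ignores later ones
            self.first.setdefault(role, attributes.get("id"))


def first_navs(html):
    parser = FirstNavByRole()
    parser.feed(html)
    return parser.first


# Hypothetical entry page with one ToC and two navs marked as page lists.
entry_page = """
<nav id="toc" role="doc-toc"><ol><li><a href="c1.html">Chapter 1</a></li></ol></nav>
<nav id="pages" role="doc-pagelist"><ol><li><a href="c1.html#pg1">1</a></li></ol></nav>
<nav id="pages-alt" role="doc-pagelist"><ol><li><a href="c1.html#pgA">A</a></li></ol></nav>
"""

selected = first_navs(entry_page)
print(selected)
```

Because the lookup is keyed by role value, the ToC and the page list are selected independently, and the second `doc-pagelist` nav is ignored.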

iherman commented 6 years ago

@HadrienGardeur

In the current PR, there's a contradiction between:

  • the spec language that allows the use of a rel to identify a page list in resources or readingOrder
  • the WebIDL which would only work if the page list is contained in the entry page

I do not see why the second statement would be true. It is an HTMLElement, ie, a DOM object, and that can be in any HTML resource, whether it is the entry page or not.

But, again, this is the same as the ToC. Finalizing the ToC would fix this, too.

HadrienGardeur commented 6 years ago

@iherman this means that in addition to having the DOM of the current document being displayed (for example chapter 1 instead of the entry page), you would need the DOM for potentially three additional documents as well:

This is quite heavy in terms of processing and I really don't think that's a good idea.

iherman commented 6 years ago

Trying to find a consensus on this PR, based on the texts above. I extract the following possible changes before merge:

  1. The text in 3.8 should become:

The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They usually correspond to pages of a print document which is the source of the digital publication, but might be a purely digital creation added for the sake of easing navigation.

  2. In 3.7 and 3.8, part of the sentence

, and MUST be the first element in the document so designated.

should say

, and MUST be the first element in the document with that role value.


I have also looked at the issue of whether we would allow several pagelists. This requires some "stylistic" changes ("A pagelist" instead of "The pagelist") and (2) above is moot for pagelists. However, it also raises the question of how to locate them. We could say:

  1. The UA tries to locate pagelists in the primary entry page
  2. Additionally, it will follow all entries, if available, in resources or readingOrder.

Do we agree to have this? Is it important to have several pagelists, in fact? Or should we forget about this for now?


Additionally, some other questions arose, to be discussed elsewhere, but should not prevent merging this PR:

HadrienGardeur commented 6 years ago

Additionally, it will follow all entries, if available, in resources or readingOrder.

Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.
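For illustration, a lookup by rel value could look like the sketch below. The manifest fragment is hypothetical, and both the rel token `pagelist` and the `url`/`rel` member names are assumptions based on this discussion rather than confirmed spec text; `find_by_rel` is an illustrative helper.

```python
import json

# Hypothetical manifest fragment; member names and the "pagelist" rel token
# are assumptions made for this sketch.
manifest = json.loads("""
{
  "readingOrder": [
    {"url": "chapter1.html"},
    {"url": "chapter2.html"}
  ],
  "resources": [
    {"url": "style.css"},
    {"url": "pages.html", "rel": ["pagelist"]}
  ]
}
""")


def find_by_rel(manifest, rel):
    """Return the url of the first link carrying the given rel value, or None."""
    for key in ("resources", "readingOrder"):
        for link in manifest.get(key, []):
            rels = link.get("rel", [])
            if isinstance(rels, str):  # tolerate a single string value
                rels = [rels]
            if rel in rels:
                return link["url"]
    return None


print(find_by_rel(manifest, "pagelist"))
```

The point of the sketch is that only the one resource flagged with the rel value needs to be fetched and parsed; no other entry is followed.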

iherman commented 6 years ago

@HadrienGardeur, on https://github.com/w3c/wpub/pull/339#issuecomment-428872262 ("you would need the DOM for potentially three additional documents as well"): that is unfortunately correct. But this is related to the (still open) issue #291. If that was closed, we could simplify the WebIDL in some way, possibly defining a helper function that returns a ToC/pagelist in some JSON format for further processing. But, as long as that issue is open, we are stuck.

iherman commented 6 years ago

@HadrienGardeur

Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.

That is what I meant. If we have several pagelists, then each of them should be treated as we have defined for one.

Note that I do not have strong feelings whether we need this or not, i.e., whether we should allow several page lists or not. I was just exploring what this would mean.

avneeshsingh commented 6 years ago

@iherman The changes written by you are fine. Regarding multiple pagelists, I think it would be good to keep away from getting into too much detail. We heard about a use case from Dave, but we should know about more use cases to specify multiple pagelists properly.

HadrienGardeur commented 6 years ago

there are many types of publications which would be having different implementations of page numbers. Some may be numeric, some may be alphabetic and some may have labels.

@avneeshsingh IMO this is exactly what makes it very difficult for UAs to implement good features based on page lists.

For example, as a UA developer I can't show a dialog that will let the user jump to page "9" because:

As a UA developer, I can only provide:

I frankly don't think that's enough for education where we need to easily share a location AND jump to it as well (teacher tells the class to read page 9).

avneeshsingh commented 6 years ago

In WP, we need to walk between flexibility and precision. This is why the best practices are so important. I would love to have it well defined that can address main use cases, but then it will restrict flexibility.

laudrain commented 6 years ago

and somehow offer the ability to display the page number for what I'm currently viewing (as an overlay)

In the content, page breaks have to be identified with semantics, for instance the role "doc-pagebreak".
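As a sketch of what that could enable, the snippet below collects `doc-pagebreak` markers and builds a lookup from page label to target id. Reading the label from `aria-label` is an assumption (the thread notes that the representation of page numbers is not specified), and the sample markup, `PageBreakCollector`, and `label_to_target` are hypothetical.

```python
from html.parser import HTMLParser


class PageBreakCollector(HTMLParser):
    """Collect (id, label) pairs for elements with role="doc-pagebreak"."""

    def __init__(self):
        super().__init__()
        self.breaks = []  # (element id, label) in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "doc-pagebreak" in (attributes.get("role") or "").split():
            # Assumption: the page label is carried by aria-label.
            self.breaks.append((attributes.get("id"), attributes.get("aria-label")))


def label_to_target(html):
    """Map each page label to the id of its page break marker."""
    collector = PageBreakCollector()
    collector.feed(html)
    return {label: element_id for element_id, label in collector.breaks if label}


# Hypothetical content mixing a roman-numeral label and a numeric one.
content = """
<p>Preface text.<span role="doc-pagebreak" id="pg-ix" aria-label="ix"></span></p>
<p>Chapter text.<span role="doc-pagebreak" id="pg-9" aria-label="9"></span></p>
"""

targets = label_to_target(content)
print(targets.get("9"))
```

Labels are kept as strings, so alphabetic or roman-numeral pages resolve the same way as numeric ones, and the ordered list of markers could also be used to show the label of the page currently in view.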

iherman commented 6 years ago

Unless there is an outcry, I intend to modify the PR today along the lines of https://github.com/w3c/wpub/pull/339#issuecomment-428881172, except that I would keep the uniqueness of pagelists (i.e., there should be only zero or one), following the remarks of Avneesh.

avneeshsingh commented 6 years ago

@iherman Good for now. If the group members find that multiple page lists are important, it would be good to open a new issue for it. I anticipate that identifying multiple pagelists with different print sources (as mentioned by Dave) will need additional work, so it would be good to document the use cases for it before we expand pagelist to pagelists.

iherman commented 6 years ago

@mattgarrish I had to make some more changes to make the first-come-first-served rule:-) Please check. Maybe merge after this?

mattgarrish commented 6 years ago

Looks good to me now.