Closed iherman closed 6 years ago
@iherman aside from my comment in #338, I don't think that pagelist and toc can simply return an HTMLElement.
Since they can both be contained in resources other than the entry page, we need to know the URIs of such resources as well. There's also the case of multiple resources marked as being the TOC and/or page list, which requires extra attention.
Sidenote: we can usually preview such PRs but I'm not seeing any link for that right now. Is something broken in our pipeline?
I left this intentionally open for now, because the whole issue of ToC is still open as far as I am concerned, and until we have a clear idea of what we want, I do not want to touch the WebIDL. That should come when all other issues around ToC are closed.
For pagelist, I think we should, for now, simply keep it as a clone of ToC, so to say, and handle them together. This should not be done as part of this PR imho...
I think that the situation is quite different from last time. We were discussing whether the TOC could be machine friendly enough to result in a different representation in the WebIDL where it would already be parsed and ready to use for UAs.
In the current PR, there's a contradiction between:
- the spec language that allows the use of a rel to identify a page list in resources or readingOrder
- the WebIDL which would only work if the page list is contained in the entry page
This would be resolved by #338 if instead of HTMLElement we implemented a getter (TBD), but even in this case, the getter wouldn't return an HTMLElement.
By the way, all of the previous comment is also true for the TOC and this is completely independent from any discussion about the machine readability of such documents.
I realize that I am jumping in here pretty late, but I would like to strongly recommend against the pagelist as defined here (and in EPUB). I believe, based on various implementation considerations, that it is backwards from what is really desired to provide rich reflow<->static mappings.
Instead, individual content elements should identify their source pages (eg. data-source-page). Using this approach, a UA is able to not only take the user to the correct page ("Jump to Page") but the right spot on the page (eg. paragraph 4 on page 3). Additionally, it enables the UA to map between the two models in publications - something extremely useful when you want to share (web) annotations between a fixed layout and a reflowable version of the same publication.
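To make the idea concrete, here is a minimal sketch of how a UA might build a page-to-content index from such attributes. It assumes the attribute is literally named data-source-page (the example name used above) and that targetable elements carry an id; the class and function names are hypothetical and only for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class SourcePageIndexer(HTMLParser):
    """Collects the ids of elements that declare a source page."""

    def __init__(self):
        super().__init__()
        # page label -> ids of the elements on that page, in document order
        self.pages = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        page = attributes.get("data-source-page")  # assumed attribute name
        element_id = attributes.get("id")
        if page and element_id:
            self.pages[page].append(element_id)


def index_source_pages(html):
    """Return a plain dict mapping each source page label to element ids."""
    indexer = SourcePageIndexer()
    indexer.feed(html)
    indexer.close()
    return dict(indexer.pages)


sample = """
<p id="p1" data-source-page="3">First paragraph of page 3.</p>
<p id="p2" data-source-page="3">Second paragraph of page 3.</p>
<p id="p3" data-source-page="4">First paragraph of page 4.</p>
"""
index = index_source_pages(sample)
print(index["3"][0])  # fragment id a "Jump to Page 3" command could target
```

With an index like this, "paragraph 2 on page 3" is simply the second id listed under "3", which is the kind of finer-grained mapping described above.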
How would this approach work in a non-WP aware user agent? It would be good to know the details.
@avneeshsingh great question! If we use data attributes, then the non-WP-aware UA would ignore them (since that's the rules - https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes). However, because they are "connected" with the content (being attributes), they could be used by any HTML UA, such as an assistive technology device, even if it wasn't WP aware.
@lrosenthol isn't that abusing / misusing custom data-xxx attributes?
"User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values."
"Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements."
"These attributes are not intended for use by software that is not known to the administrators of the site that uses the attributes. For generic extensions that are to be used by multiple independent tools, either this specification should be extended to provide the feature explicitly, or a technology like microdata should be used (with a standardized vocabulary)."
"JavaScript libraries may use the custom data attributes, as they are considered to be part of the page on which they are used."
etc. https://www.w3.org/TR/html52/dom.html#embedding-custom-non-visible-data-with-the-data-attributes
The pagelist is a list of links that provide navigation to positions in the content that correspond to the locations of page boundaries present in a print source being represented by the Web Publication
I am wondering whether it would be useful to clarify "print source" here, as this could be interpreted exclusively as "publications printed on paper, like traditional books or magazines", when this could in fact be referring to digital publications not designed to be fed through a printer at all (i.e. born-digital, stay-digital documents), such as a fixed-layout / pre-paginated EPUBs (or perhaps even PDFs) with interactive features, videos, animations, etc. I have seen a few examples of sales brochures, technical documentation, children's talking books, cooking / recipe books, etc. that do not have print equivalent, yet they are "paginated" works that would benefit from "pagelist" navigation in Web Publications.
@danielweck
isn't that abusing / misusing custom data-xxx attributes?
Probably... but HTML's limited model for attributes doesn't give us much other choice, short of going to the HTML committee and asking for a new one. But I still think either of those is a better choice than creating a completely new grammar/language that requires custom parsing...
Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient.
I am not sure I understand the intent of this additional prose (which was inserted on top of the original proposal), and I am certainly not convinced by the claim "for making the navigation more convenient" which does not seem based upon a well-defined rationale.
Normally, we start with use cases / functional requirements, and if we agree on the needs, we translate that to well-specified / standardized mechanisms. With the additional prose I feel that we are introducing a tautological statement, and an imprecise one as well. Here is how I read the current prose: "page lists are important for publications that somehow have pages, but even if they don't, this feature is useful too because it provides navigation convenience".
Let's look at it from a different angle, and let's say that we're now debating the inclusion of EPUB's "list of landmarks" (e.g. figures, mathematical equations, etc.) in the Web Publications standard. Just like "pagelist", there would be a well-defined metadata vocabulary (or semantic role) that reading systems / user agents would use to discover the list of links identifying worthy locations within publication resources. It would be strange to state that "this feature can also be used for publications that do not actually contain figures, to make navigation more convenient".
I am sorry if I am missing the point :) Perhaps I am just nit-picking.
@danielweck : as I said on the call yesterday, I do need a text for the paragraph referring you, and I welcome any input:-)
@danielweck maybe the discussion should move back to #223...
@danielweck maybe the discussion should move back to #223...
You mean the discussion about data-xxx attributes, right? (+ @lrosenthol )
I think this PR is the right place to discuss the prose proposed in this PR :)
@lrosenthol granular anchoring is not the topic of discussion here. Please open a separate issue if you'd like to see that addressed.
@danielweck no. #223 is on any other navigation elements, including the pagelist. This PR implements the current resolution of including pagelist, but if we question this resolution, or want to extend it to other categories, the discussion should be done there, imho.
@TzviyaSiegman I disagree, as we are discussing the pagelist here... However, it appears that @iherman thinks that #223 would be a better spot and I am happy to move it there since both @danielweck and I have concerns about the currently proposed pagelist mechanism.
as I said on the call yesterday, I do need a text for the paragraph referring you, and I welcome any input:-)
@iherman
(1) I am proposing a clarification of "print source" as per my comment here ( https://github.com/w3c/wpub/pull/339#issuecomment-428262452 ) which I feel is neither an objection to the resolution, nor a request for extending the semantics of "page list" (perhaps the latter is where our interpretations differ, in which case I will be happy to move to #223 ).
(2) Regarding the additional sentence "Pagelist is important for Web Publications with print equivalent, however it may also be used for Web Publications which do not have print equivalent, for making the navigation more convenient", right now I am leaning towards removing it entirely, based on the reasons mentioned in comment https://github.com/w3c/wpub/pull/339#issuecomment-428269828 (I feel that the original prose clearly addresses the identified use-case / functional requirement, especially if the notion of "print source" is clarified)
I agree with @danielweck that the wording right now needs improvement. What about something like:
The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They might correspond to a statically paginated source, for example, such as a print document, or might be a purely digital creation to simplify access into the content.
How we provide the convenience of pagination in a digital world is a problem that needs a better solution some day, but abusing attributes that we know are just going to lead to non-implementation is not the way forward. It's not a problem we're likely to solve on WP's timeline.
I agree with @danielweck 's concerns
To put it differently: once the notion of "print source" is clarified to encompass any type of publication that intrinsically provides / contains a "list of pages" (thus the "pagelist" keyword / semantic role proposed here), such as "printed physical publications", "digital fixed-layout publications", and otherwise "reflowable publication with some sort of virtual pagination markers" (which I believe was the primary motivator for the additional prose), then we can remove the additional prose.
@mattgarrish I like the term "static" because I interpret its opposite meaning ("dynamic") as "pagination at rendering time, not baked into the content at authoring time". I am not sure everybody would use the same interpretation though.
I like the formulation in https://github.com/w3c/wpub/pull/339#issuecomment-428289554. I see only one caveat, though: this formulation makes it feasible to define several pagelists, doesn’t it? I do not think this is a problem per se, but, if this is indeed the case:
- we may suggest the usage of the PublicationLink structure, adding the name field categorizing/describing what that particular pagelist is used for (do we need some predefined terms to be put there?)
Technically, there's no reason why you couldn't have more than one page list, but realistically coordinating the locations is usually only accomplished against one source (or using one "distance" method if there isn't a paginated source/equivalent).
I'd be fine with allowing more than one and using the name field as you suggest, but I don't have a strong opinion on it.
With regards to the name, I can't think of a better alternative. We have page lists in EPUB and DPUB-ARIA, so there's consistency in comprehension if we don't alter the name. The concept of getting to a "page", even if the user doesn't fully understand the pagination method, is also broadly understood.
The first part of Matt's proposal is clear:
The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content.
The second part, using "might", is too generic; it blurs the notion: it's a page list after all, not a random list of pointers. I would rephrase/simplify it as:
The exact nature of these locations is left to content creators to define. They usually correspond to pages of a print document which is the source of the digital publication, but might be a purely digital creation added for the sake of easing navigation.
Plus, what is missing is the exact meaning of the page break in the content, vs start/end of the printed page. The Daisy page on page lists adds it in a Q&A section (=start).
Last, the representation of the page number in the content should IMO be specified also: EPUB is underspecified, which is an issue for UAs, and the Daisy page also gives details on best practices.
@llemeurfr
I think we should focus on mechanism in specs here, and leave the details of start or end of page to best practices. There are many types of publications which would be having different implementations of page numbers. Some may be numeric, some may be alphabetic and some may have labels. Some print equivalents may have page numbers marked at the top of the page, some may have them at the bottom. DAISY 2 specs were precise about it, so production centers had to implement workarounds to match the print equivalent.
Last, the representation of the page number in the content should IMO be specified also
Ideally, yes, but practically this can't be enforced without unwanted side-effects. You can have a page list for an audio book, for example, which would only reference into audio offsets. The page list itself is only dependent on having some location to link to.
@avneeshsingh, I had further thoughts about the start/end issue; UAs will follow a link in a page list and display content. Such a link acts as a bookmark would: in reflow mode, the "synthetic page" (i.e. the screen) displayed will be the one the user would have accessed whilst moving from page to page from the start of the corresponding HTML resource. Therefore it's impossible to know if the page break will be at the start, middle or end of this page, as it depends on the screen size.
@mattgarrish, as it isn't tied to the current PR, I'll create a specific issue for the issue of page numbers as content or empty elements.
In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or do I miss something?
I'll create a specific issue for the issue of page numbers as content or empty elements.
Oh, okay, I misunderstood your request. One common issue is whether the numbers have to be present with the page list and the other is whether they have to be content or not. I can comment on the latter when you open the issue. :)
It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document.
The wording is a bit confusing, as it seems to mix authoring and user agent requirements. I read it as the pagelist/toc MUST be identified by the appropriate role, and user agents MUST recognize the first one so designated. It would probably be better to split these statements, if so.
In the proposed text I read [the pagelist nav] "MUST be the first element in the document so designated." It seems impossible to have both the ToC nav and the pagelist nav in the same entry page, both being the first element in the document. Or do I miss something?
"so designated" includes the choice of the nav based on the value of role (doc-toc vs. doc-pagelist). I.e., the choices of these two are disjoint, and they can both be in the same file.
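As an illustration of that reading, here is a minimal sketch of a "first element so designated" lookup, done independently per role value. It assumes the designation is carried by the role attribute and that the links of interest are plain a elements inside the designated element; FirstRoleFinder and first_links are hypothetical names, not anything defined by the PR.

```python
from html.parser import HTMLParser


class FirstRoleFinder(HTMLParser):
    """Collects link targets inside the first element carrying a given role."""

    def __init__(self, role):
        super().__init__()
        self.role = role
        self.matched_tag = None  # tag name of the first designated element
        self.depth = 0           # nesting of same-named tags inside the match
        self.done = False        # True once the first match has been closed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return  # later elements with the same role are ignored
        attributes = dict(attrs)
        if self.matched_tag is None:
            roles = (attributes.get("role") or "").split()
            if self.role in roles:
                self.matched_tag = tag
                self.depth = 1
        elif tag == self.matched_tag:
            self.depth += 1
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])

    def handle_endtag(self, tag):
        if self.done or tag != self.matched_tag:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True


def first_links(html, role):
    """Return the hrefs found in the first element designated with `role`."""
    finder = FirstRoleFinder(role)
    finder.feed(html)
    finder.close()
    return finder.links


entry_page = """
<nav role="doc-toc"><ol><li><a href="c1.html">Chapter 1</a></li></ol></nav>
<nav role="doc-pagelist"><ol>
  <li><a href="c1.html#pg1">1</a></li>
  <li><a href="c1.html#pg2">2</a></li>
</ol></nav>
<nav role="doc-pagelist"><ol><li><a href="other.html">ignored</a></li></ol></nav>
"""
toc_links = first_links(entry_page, "doc-toc")
page_links = first_links(entry_page, "doc-pagelist")
```

Because each lookup filters on its own role value, the ToC and the page list can sit in the same file without competing for "first", and a second page list in the same document is simply skipped.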
@HadrienGardeur
In the current PR, there's a contradiction between:
- the spec language that allows the use of a rel to identify a page list in resources or readingOrder
- the WebIDL which would only work if the page list is contained in the entry page
I do not see why the second statement would be true. It is an HTMLElement, ie, a DOM object, and that can be in any HTML resource, whether it is the entry page or not.
But, again, this is the same as the ToC. Finalizing the ToC would fix this, too.
@iherman this means that in addition to having the DOM of the current document being displayed (for example chapter 1 instead of the entry page), you would need the DOM for potentially three additional documents as well:
This is quite heavy in terms of processing and I really don't think that's a good idea.
Trying to find a consensus on this PR, based on the texts above. I extract the following possible changes before merge:
(1) The pagelist is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They usually correspond to pages of a print document which is the source of the digital publication, but might be a purely digital creation added for the sake of easing navigation.
(2) ", and MUST be the first element in the document so designated." should say ", and MUST be the first element in the document with that role value."
I have also looked at the issue on whether we would allow several pagelists. This requires some "stylistic" changes ("A pagelist" instead of "The pagelist") and (2) above is moot for pagelists. However, it also raises the question on how to locate them. We could say:
Additionally, it will follow all entries, if available, in resources or readingOrder.
Do we agree to have this? Is it important to have several pagelists, in fact? Or should we forget about this for now?
Additionally, some other questions arose, to be discussed elsewhere, but should not prevent merging this PR:
- data-xxx attributes to identify targets
Additionally, it will follow all entries, if available, in resources or readingOrder.
Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.
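A rough sketch of what that lookup could look like on the manifest side. The rel value "pagelist", the url/rel member names, and the manifest shape below are assumptions for illustration only; the thread does not spell out the exact rel value, and find_pagelist_url is a hypothetical helper.

```python
PAGELIST_REL = "pagelist"  # assumed rel value, not confirmed in this thread


def as_link(entry):
    """Normalise a manifest entry: a bare URL string becomes a link object."""
    return {"url": entry} if isinstance(entry, str) else entry


def rel_values(link):
    """Return the rel values of a link as a list (string or list accepted)."""
    rel = link.get("rel", [])
    return rel.split() if isinstance(rel, str) else list(rel)


def find_pagelist_url(manifest):
    """Return the URL of the first resource flagged as the page list, or None."""
    for key in ("readingOrder", "resources"):
        for entry in manifest.get(key, []):
            link = as_link(entry)
            if PAGELIST_REL in rel_values(link):
                return link.get("url")
    return None


# Hypothetical manifest fragment, shaped loosely after the draft's link objects.
manifest = {
    "readingOrder": ["chapter1.html", {"url": "chapter2.html"}],
    "resources": [
        {"url": "toc.html", "rel": "contents"},
        {"url": "pages.html", "rel": ["pagelist"]},
    ],
}
print(find_pagelist_url(manifest))
```

The point of the sketch is that the UA only needs to scan link metadata to find the one resource to load, rather than following every entry.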
@HadrienGardeur, on https://github.com/w3c/wpub/pull/339#issuecomment-428872262 ("you would need the DOM for potentially three additional documents as well"): that is unfortunately correct. But this is related to the (still open) issue #291. If that was closed, we could simplify the WebIDL in some way, possibly defining a helper function that returns a ToC/pagelist in some JSON format for further processing. But as long as that issue is open, we are stuck.
@HadrienGardeur
Yikes. Why would we do that? The pagelist should be identified in resources or readingOrder using a rel value.
That is what I meant. If we have several pagelists, then each of them should be treated as we have defined for one.
Note that I do not have strong feelings whether we need this or not, i.e., whether we should allow several page lists or not. I was just exploring what this would mean.
@iherman The changes written by you are fine. Regarding multiple pagelists, I think it would be good to keep away from getting into too much detail. We heard about a use case from Dave but we should know about more use cases to specify multiple pagelists properly.
there are many types of publications which would be having different implementations of page numbers. Some may be numeric, some may be alphabetic and some may have labels.
@avneeshsingh IMO this is exactly what makes it very difficult for UAs to implement good features based on page lists.
For example, as a UA developer I can't show a dialog that will let the user jump to page "9" because:
As a UA developer, I can only provide:
I frankly don't think that's enough for education where we need to easily share a location AND jump to it as well (teacher tells the class to read page 9).
In WP, we need to walk between flexibility and precision. This is why the best practices are so important. I would love to have it well defined that can address main use cases, but then it will restrict flexibility.
and somehow offer the ability to display the page number for what I'm currently viewing (as an overlay)
In the content, page breaks have to be identified with semantics, for instance the role "doc-pagebreak".
Unless there is outcry, I intend to modify the PR today along the lines of https://github.com/w3c/wpub/pull/339#issuecomment-428881172, except that I would keep the unicity of pagelists (ie, there should be only zero or one), following the remarks of Avneesh.
@iherman Good for now. If the group members find that multiple page lists are important, it would be good to open a new issue for it. I anticipate that identifying multiple pagelists with different print sources (as mentioned by Dave) will need additional work, so it would be good to document the use cases for it before we expand pagelist to pagelists.
@mattgarrish I had to make some more changes to make the first-come-first-served rule:-) Please check. Maybe merge after this?
Looks good to me now.
This is the action recorded in minutes of 8th of October.
Some notes
If this PR is merged, then https://github.com/w3c/wp-vocab/pull/1 should also be merged and processed; this adds the extra rel value to our vocabulary files.
Cc: @GarthConboy @llemeurfr @laudrain @HadrienGardeur
Fix #223