Information content of the abstract manifest

dauwhe commented 7 years ago

From @dauwhe on June 27, 2017 14:33

What information is required for an abstract manifest? [edited to add items from comments]

An identifier for the web publication, which should be a URL
Some way of saying that this URL represents a web publication.
Some way of identifying the constituent resources of the web publication.
Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)
Locating table of contents or other navigation structure

What else? I think we should distinguish required information from "nice to have" information.

Copied from original issue: w3c/publ-wg#12

TzviyaSiegman commented 7 years ago

For the sake of clarity, can we separate discussions of the abstract manifest (here) and the nav (https://github.com/w3c/wpub/issues/2)?

As @GarthConboy said:

This statement is true due to both RS functional requirements and A11Y ones, and doesn't presume to answer the HTML or JSON or XML format question or the in or out of manifest question. Just, that we must have a mechanism to support this functionality. I kinda hope this is not too arguable.

Let's not get too caught up in this level of detail at this point. We are not making these decisions yet. This does not even need to be in FPWD. We need to decide whether we need a manifest and whether it is distinct from nav.

HadrienGardeur commented 7 years ago

Seems like this is a limitation of the extraction process rather than the source format.

No, it's not:

first of all, it's much more difficult to extract such structured information from HTML than JSON. In most languages you'll need to rely on a third party library and write more code than if you were just parsing JSON.
even if you end up extracting HTML snippets instead of plain text, using that HTML in your UI is going to be much more difficult
there's no way ever, someone would also extract CSS in addition to the HTML snippet
once you have HTML instead of plain text, you'll most likely need to whitelist your HTML since you can't allow every HTML element to be displayed in your native UI
most of the time you won't be able to use this HTML directly in your native UI anyway. If you're lucky, there might be some helper that will be able to convert a very very small subset of what's in your initial HTML snippet
finally, you're much more likely to encounter issues with how your UI looks if you use HTML for labels instead of plain text. Expect plenty of additional hacks to make it look somehow OK.

Basically, it's impractical to use HTML for anything that won't be rendered as-is in a webview by the UA. Even for a Web App, it could be very painful to work with HTML instead of plain text for such strings.

This is true for navigation, but also for metadata since some of you have argued that HTML would also be a good fit for metadata. That's why using HTML for the manifest is also a bad idea, it makes working with structured content worse and doesn't provide the benefits that you're pointing to.
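To make the comparison above concrete, here is a minimal Python sketch that reads the same two navigation labels from a JSON structure and from an HTML list. Both input snippets and the `toc`/`title` key names are invented for illustration and are not taken from any manifest format discussed here. Python happens to ship an HTML parser in its standard library, so no third-party dependency is needed in this case, but the HTML path still needs a stateful parser class where the JSON path needs a single expression.

```python
import json
from html.parser import HTMLParser

# Hypothetical inputs carrying the same two navigation entries.
JSON_NAV = ('{"toc": [{"href": "c1.html", "title": "Chapter 1"}, '
            '{"href": "c2.html", "title": "Chapter 2"}]}')
HTML_NAV = ('<nav><ol><li><a href="c1.html">Chapter <em>1</em></a></li>'
            '<li><a href="c2.html">Chapter 2</a></li></ol></nav>')


def labels_from_json(text):
    # One call to parse, one comprehension to read the labels.
    return [(item["href"], item["title"]) for item in json.loads(text)["toc"]]


class NavLinkParser(HTMLParser):
    """Collects (href, plain-text label) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Inline markup such as <em> is dropped; only its text is kept.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def labels_from_html(text):
    parser = NavLinkParser()
    parser.feed(text)
    return parser.links
```

Both functions return the same pairs for these inputs. The HTML version silently flattens the `<em>` markup, which is exactly the kind of decision a native UI would have to make for every label.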

lrosenthol commented 7 years ago

On Thu, Jul 6, 2017 at 1:00 PM, Hadrien Gardeur notifications@github.com wrote:

Basically, it's impractical to use HTML for anything that won't be rendered in a webview by the UA.

You make this sound like a bad thing?!?! IMO, this is exactly what we want. If we believe that a UA is going to present a customized UI for a (P)WP - then we have already lost :(

HadrienGardeur commented 7 years ago

You make this sound like a bad thing?!?! IMO, this is exactly what we want. If we believe that a UA is going to present a customized UI for a (P)WP - then we have already lost :(

I don't think it's a bad thing, simply drawing a line between the case where HTML is helpful vs harmful/impractical.

Surprisingly, I think that I mostly agree with you about WP and UAs for it:

a normal browser (what we have available today, not in 2020) should be able to render and navigate a WP without any sort of plugin/add-on
a WP optimized UA should only be a minimal layer on top of a browser/webview
all structured information about a publication should be included in the manifest
the UA can rely on this structured information for preloading, caching or whatever's necessary for its minimal UI on top of a browser/webview
if a UA for a WP detects a navigation document in the manifest, it should render it as-is without trying to extract structured information or rely on a custom UI for it

lrosenthol commented 7 years ago

Works for me!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/w3c/wpub/issues/6#issuecomment-313474275, or mute the thread https://github.com/notifications/unsubscribe-auth/AE1vNcuXaIcswpL1rUjOTS-BJ4NIDDcmks5sLSHugaJpZM4OOnGN .

TzviyaSiegman commented 7 years ago

I'm so glad to see that @lrosenthol and @hadrien have reached consensus. I think the point of this issue was to assess what a minimum viable Manifest is.

When working on EPUB 3.1, we sent a survey to anyone who was willing to respond asking which metadata elements were used, which were essential. Here are the anonymized results. Much of this is not relevant to manifest, and we uncovered more information as we worked out details. For example, we discovered that language was essential. I believe @laudrain can comment further on that.

@dauwhe, perhaps we should close this issue and take each component of manifest that you proposed as its own issue?

HadrienGardeur commented 7 years ago

@TzviyaSiegman the requirements for a minimum viable manifest are a slightly different question than listing all the things that we potentially need in a manifest (which is what most of the comments on this issue focused on).

In Readium we have three requirements for our manifest:

a title
a self/canonical link
at least one resource listed in the spine (linear reading order)

In the case of a WP manifest, if we consider single resource publications, the first two are still IMO useful but we might not need to have a spine at all.

For publications with multiple resources, these three requirements are still a good starting point for a minimal viable manifest.
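As a rough illustration of those three requirements, the following Python sketch reports which of them a manifest lacks. The key names (`metadata`, `links`, `spine`, `rel`) follow the Readium WebPub Manifest examples posted later in this thread; the `single_resource` flag is a hypothetical switch for the single resource case mentioned above, not part of any specification.

```python
def missing_requirements(manifest, single_resource=False):
    """Return the names of the three requirements the manifest lacks."""
    missing = []

    # Requirement 1: a title.
    if not manifest.get("metadata", {}).get("title"):
        missing.append("title")

    # Requirement 2: a self/canonical link.
    links = manifest.get("links", [])
    if not any(link.get("rel") == "self" and link.get("href") for link in links):
        missing.append("self link")

    # Requirement 3: at least one resource in the spine.
    # Assumption: a single resource publication may omit the spine entirely.
    if not single_resource and not manifest.get("spine"):
        missing.append("spine")

    return missing
```

An empty list means the manifest meets the minimum; anything else names what is absent.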

lrosenthol commented 7 years ago

I would say that the self link is the only requirement.

There are many documents (one class of publication) that don't have titles, so it doesn't make sense to have that as a requirement. (We learned this the hard way with PDF, when we tried to mandate that, and people just put " " in to pass validation.)

mattgarrish commented 7 years ago

@llemeurfr raised that a web origin is optional, which will be true at the packaged/distributable level.

What self url is used in that case, or is a packaged web pub not always a valid web pub?

GarthConboy commented 7 years ago

Referring back to Hadrien's list of a few hours ago:

a normal browser (what we have available today, not in 2020) should be able to render and navigate a WP without any sort of plugin/add-on
a WP optimized UA should only be a minimal layer on top of a browser/webview
all structured information about a publication should be included in the manifest
the UA can rely on this structured information for preloading, caching or whatever's necessary for its minimal UI on top of a browser/webview
if a UA for a WP detects a navigation document in the manifest, it should render it as-is without trying to extract structured information or rely on a custom UI for it

I can likely buy the first four (especially #3), but not the last one. The main idea of a navigation document is to provide structured information to the RS, such that interesting, helpful, and accessible UI's can be created with the information. I would like to see a way of containing said information in the manifest. Perhaps optionally for WP's and perhaps required for PWP's (or at least some profiles of PWP). [yes, I realize that would make not all WP's convertible to PWP's or EPUB4's, but we may end up there for other reasons, e.g., A11Y]

HadrienGardeur commented 7 years ago

The main idea of a navigation document is to provide structured information to the RS, such that interesting, helpful, and accessible UI's can be created with the information. I would like to see a way of containing said information in the manifest.

@GarthConboy I would like that too. I simply believe that the manifest is a better place for that than the NavDoc and that we can't expect to have such a requirement for a WP (my comment was about WP, not PWP/EPUB 4).

@llemeurfr raised that a web origin is optional, which will be true at the packaged/distributable level. What self url is used in that case, or is a packaged web pub not always a valid web pub?

@mattgarrish that's a very good question. For a PWP that was generated from a WP, there's an easy answer: the self/canonical link should remain the same.

For a PWP that doesn't exist on the Web at all, we can't really use a URL and might have to rely on a URN (ISBN, UUID) instead.

mattgarrish commented 7 years ago

we can't really use a URL and might have to rely on a URN

Right, that's what has me wondering. I don't know that there's a better option, but the publication loses any real sense of self at that point, since every copy has an equal claim.

It's maybe a bit of an abuse of a self, but that's probably a technical niggle.

HadrienGardeur commented 7 years ago

It's maybe a bit of an abuse of a self, but that's probably a technical niggle.

Well I'll repeat myself but according to RFC5988 the definition for "self" is:

Conveys an identifier for the link's context.

A URN would still convey an identifier for a link's context IMO.

mattgarrish commented 7 years ago

Yes, it's vague enough to allow a urn as the self, which is why I said only "maybe". 4287 is more direct:

The value "self" signifies that the IRI in the value of the href attribute identifies a resource equivalent to the containing element.

I don't imagine this case was on anyone's radar when they minted it.

At any rate, I'm just thinking this might lead to some strange wording if we have to allow URNs in a specification that expects the canonical manifest to live at the referenced link and wants URLs. But these are details for another day.

HadrienGardeur commented 7 years ago

Well the other option is simply to have different basic requirements for WP and EPUB 4.

Using the Readium WebPub Manifest syntax, this is what it could look like:

Basic requirements for WP

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita"
  },
  "links": [
    {"rel": "self", "href": "http://example.com/manifest.json", "type": "application/webpub+json"}
  ],
  "spine": [
    {"href": "http://example.com/chapter1", "type": "text/html"}
  ]
}

Basic requirements for EPUB4

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita",
    "identifier": "urn:isbn:9780141180144"
  },
  "spine": [
    {"href": "chapter1.html", "type": "text/html"}
  ]
}

lrosenthol commented 7 years ago

For WP, I'd like to avoid using any terms from EPUB (such as spine) as I think it will cause some web folks to look at us strangely.

In looking over the sample you provided (thanks for putting something up!), I would think that the primary (root?) item should be in links. That way a single file WP/PWP (or one with native navigation) wouldn't need a 'spine' at all.

GarthConboy commented 7 years ago

My:

The main idea of a navigation document is to provide structured information to the RS, such that interesting, helpful, and accessible UI's can be created with the information. I would like to see a way of containing said information in the manifest.

And @HadrienGardeur reply:

@GarthConboy I would like that too. I simply believe that the manifest is a better place for that than the NavDoc and that we can't expect to have such a requirement for a WP (my comment was about WP, not PWP/EPUB 4).

Indeed, agree. Quoting you "all structured information about a publication should be included in the manifest" -- agree.

And, Leonard, if EPUB has a fine name (like "spine"), I'm not sure we need to change it just to be different. :-)

mattgarrish commented 7 years ago

@hadrien Making self work across the board would be better for clean inheritance of must rules.

All I can think is possible spec wording like: "The self link must be either a URL or URN. Web Publications with an origin on the web must reference a URL that identifies that origin. Web Publications without a web origin must reference a URN that uniquely identifies the publication." (Not to be confused for real spec prose at this stage, and suffers from being completely unenforceable.)
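For what it is worth, the draft wording above can be expressed as a small check. This is only a sketch in Python: the function name and the `has_web_origin` flag are hypothetical, and treating "URL" as meaning `http` or `https` only is an assumption that the question about file URLs below leaves open.

```python
from urllib.parse import urlparse


def self_link_problem(href, has_web_origin):
    """Return a message if href breaks the draft rule, else None."""
    scheme = urlparse(href).scheme.lower()
    if has_web_origin:
        # Assumption: only http(s) counts as a URL identifying a web origin.
        if scheme not in ("http", "https"):
            return "publications with a web origin must reference a URL"
    elif scheme != "urn":
        return "publications without a web origin must reference a URN"
    return None
```

Note that the check needs to be told whether the publication has a web origin, which a validator cannot determine from the manifest alone. That is one way to see why the wording is hard to enforce.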

It's a bit weird to have to address this without addressing packaging, but maybe this isn't exclusive to packaging. Is an unpackaged web publication on my desktop a web publication if I open the file url in a browser or reading system, or are there limits to http(s)? (i.e., what if the source of a publication isn't distributed packaged, like downloading samples out of github)

HadrienGardeur commented 7 years ago

In looking over the sample you provided (thanks for putting something up!), I would think that the primary (root?) item should be in links. That way a single file WP/PWP (or one with native navigation) wouldn't need a 'spine' at all.

@lrosenthol the example above would be for an external manifest file, not a single resource publication.

For a single resource publication it could look something like that:

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita"
  },
  "links": [
    {"rel": "self", "href": "http://example.com/publication", "type": "text/html"}
  ]
}

Key differences:

different media type for self link, since we reference HTML directly instead of a manifest format
no need for a spine

All I can think is possible spec wording like: "The self link must be either a URL or URN. Web Publications with an origin on the web must reference a URL that identifies that origin. Web Publications without a web origin must reference a URN that uniquely identifies the publication." (Not to be confused for real spec prose at this stage, and suffers from being completely unenforceable.)

@mattgarrish IMO there's a real use case for having two separate elements, and if a WP has an ISBN, we'll need it anyway. Agree that if we can align must requirements between WP/PWP/EPUB 4 it would be better, but I'm not entirely sure that'll be possible.

It's a bit weird to have to address this without addressing packaging, but maybe this isn't exclusive to packaging. Is an unpackaged web publication on my desktop a web publication if I open the file url in a browser or reading system, or are there limits to http(s)? (i.e., what if the source of a publication isn't distributed packaged, like downloading samples out of github)

I don't think that this is tied to packaging. The important question is whether a publication has a presence on the Web or not.

We could have packaged publications where the manifest or the package is also accessible on the Web, in that case the self link would also be included in a packaged version.

lrosenthol commented 7 years ago

What's the self link for an ad-hoc publication?

Consider a user today writing something in Google Docs and currently saving as EPUB. "Tomorrow", that's an EPUB4 - so what does Google put in the self? The URL to the .gdoc file? It's not an ISBN, of course. Is it just some URN with a GUID?

lrosenthol commented 7 years ago

On Fri, Jul 7, 2017 at 4:23 AM, Hadrien Gardeur notifications@github.com wrote:

For a single resource publication it could look something like that:

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita"
  },
  "links": [
    {"rel": "self", "href": "http://example.com/publication", "type": "text/html"}
  ]
}

Key differences:

different media type for self link, since we reference HTML directly instead of a manifest format

no need for a spine

@HadrienGardeur: Do you see that JSON then ending up inside the HTML itself, for that single resource?

HadrienGardeur commented 7 years ago

cc @lrosenthol

Do you see that JSON then ending up inside the HTML itself, for that single resource?

Yes, inside a script element, just like JSON-LD in AMP.
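A minimal sketch of what reading such an embedded manifest could involve, using only the Python standard library. AMP pages carry JSON-LD in a `script` element with `type="application/ld+json"`; reusing that media type for a web publication manifest is purely an assumption here, as are the class and function names.

```python
import json
from html.parser import HTMLParser


class EmbeddedManifestParser(HTMLParser):
    """Captures the JSON body of the first <script> with the given type."""

    def __init__(self, media_type):
        super().__init__()
        self.media_type = media_type
        self.manifest = None
        self._capturing = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        wanted = dict(attrs).get("type") == self.media_type
        if tag == "script" and wanted and self.manifest is None:
            self._capturing = True
            self._chunks = []

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self._capturing = False
            self.manifest = json.loads("".join(self._chunks))


def embedded_manifest(html, media_type="application/ld+json"):
    # Assumption: the manifest uses the same media type AMP uses for JSON-LD.
    parser = EmbeddedManifestParser(media_type)
    parser.feed(html)
    return parser.manifest  # None when the page embeds no such script
```

A user agent that knows nothing about web publications simply ignores the script element and renders the page, which fits the "just work" goal for single resource publications.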

What's the self link for an ad-hoc publication? Consider a user today writing something in Google Docs and currently saving as EPUB. "Tomorrow", that's an EPUB4 - so what does Google put in the self? The URL to the .gdoc file? It's not an ISBN, of course. Is it just some URN with a GUID?

I'm not sure what you mean by "ad-hoc publication". If that publication is available on the Web, then it should be a link to the single resource publication (HTML) or the manifest for a multiple resources publication.

If that publication is never published on the Web and a package is generated, it should probably be a UUID (urn:uuid:5377d964-97b0-4e6c-92f8-97482f17ffdf for example), but I don't think that we should use a link in that case (see my response to @mattgarrish above).
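The two cases above could be sketched in Python as follows. The function name is hypothetical; the `links` and `metadata.identifier` keys follow the Readium-style examples earlier in the thread, and using a random (version 4) UUID is an assumption, since nothing here says how the UUID should be minted.

```python
import uuid


def identity_fields(web_url=None):
    """Return the manifest fields that identify a publication."""
    if web_url:
        # Published on the Web: point the self link at the HTML resource
        # or at the manifest.
        return {"links": [{"rel": "self", "href": web_url}]}
    # Never published on the Web: no link at all, only a URN identifier.
    return {"metadata": {"identifier": uuid.uuid4().urn}}
```

Calling `identity_fields()` twice yields two different URNs, so a packaging tool would have to mint the identifier once and keep it with the publication rather than regenerate it on every export.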

GarthConboy commented 7 years ago

I'm going to try to return to the high level of what we need in a manifest -- ignoring encoding and serialization. This is a slightly longer version of some fodder I provided to Tzviya as potential discussion points for next Monday's meeting (during which I'll hopefully be able to lurk on IRC from a plane).

Anyhow, starting with Dave's initial post on this issue, where he pulled together a possible content list from the issue's earlier incarnation:

Identifier of WP.
Identification as a WP
List of publication resources.
Reading order.
Metadata.
Nav doc.

Okay, that was easy, but what's required and who points to whom? My opinions are as follows:

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]
Identification as a WP. Required in the first (or only) document in the reading order. Likely not explicit, but should be implied by the presence of whatever of the below turns out to be required (e.g., minimal WP metadata, presence of, or a pointer to, the manifest).
List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).
Reading order. Required (with similar degenerate case comments as above).
Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.
Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

I think the latter four of the above should be in a "manifest" (encoded some way somewhere). I lean toward external from the first markup file in the reading order, but pointed to from there. Though an argument could be made for included in the first markup file in the reading order.

If we could come to an agreement at this level of specificity, I'd count it as substantial progress... then we could start the arguments on locations and encodings, but that could be done in the context of a high level agreement.
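Independent of encoding, the opinions above can be written down as a data structure. The following Python sketch is only one reading of them: the class and field names are hypothetical, "Identification as a WP" is left out because it is described as implied rather than explicit, and the rule that every reading order entry must also appear in the resource list is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AbstractManifest:
    identifier: str            # required: URL of the first (or only) document
    resources: List[str]       # required: list of publication resources
    reading_order: List[str]   # required: documents in reading order
    metadata: Dict[str, str] = field(default_factory=dict)  # slot required, contents open
    nav_doc: Optional[str] = None                           # leaning toward optional

    def problems(self):
        """List the ways this manifest departs from the opinions above."""
        found = []
        if not self.resources:
            found.append("no publication resources listed")
        if not self.reading_order:
            found.append("no reading order given")
        elif self.identifier != self.reading_order[0]:
            found.append("identifier is not the first document in the reading order")
        for href in self.reading_order:
            if href not in self.resources:
                found.append(f"{href} is in the reading order but not in the resource list")
        return found
```

A one-resource publication is then just the degenerate case where `resources` and `reading_order` both contain the identifier.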

dauwhe commented 7 years ago

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]

Yes. I think this is a critical point. The web should "just work". A user agent that does not know anything about web publications should still allow a user to read the darned book!

Identification as a WP. Required in the first (or only) document in the reading order. Likely not explicit, but should be implied by the presence of whatever of the below turns out to be required (e.g., minimal WP metadata, presence of, or a pointer to, the manifest).

I kind of like how AMP and the new web packaging format identify themselves. Simple and expressive. Therefore I propose:

<html 📖>

List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).

Sounds good.

Reading order. Required (with similar degenerate case comments as above).

Sounds good.

Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.

Agreed that ability to specify metadata is required. From above we already know the identifier of the publication and that it's a web publication.

Do we require a web publication has a title?

Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

If there is a nav doc, it must be identified here. Whether we require one is going to be one of our hardest issues, and gets to the heart of the difference between EPUB and the web.

lrosenthol commented 7 years ago

On Tue, Jul 11, 2017 at 1:46 PM, Dave Cramer notifications@github.com wrote:

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]

Yes. I think this is a critical point. The web should "just work". A user agent that does not know anything about web publications should still allow a user to read the darned book!

@dauwhe, I agree with your statement. But I am still conflicted on what the identifier is. In that I see benefit to it being the URL to the manifest and not to the first document...(but we can work that out as we go...)

I kind of like how AMP and the new web packaging format identify themselves. Simple and expressive. Therefore I propose:

<html 📖>

I do too! However, I wouldn't use a book :)

List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).

Sounds good.

Maybe. I've been thinking more about how resources are really connected to the content page and not to the publication. Do we (well, the UA) really need (to load) the full 1000 images used by publication up front? Not necessarily - it may only want/need what is required to load the first content document.

We should be sure to design with optimization and performance in mind from the start...

Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.

Agreed that ability to specify metadata is required. From above we already know the identifier of the publication and that it's a web publication.

Do we require a web publication has a title?

I think we all agree we need a way to specify metadata, but what is (and is not) mandatory is going to be a fun set of discussions.

Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

If there is a nav doc, it must be identified here. Whether we require one is going to be one of our hardest issues, and gets to the heart of the difference between EPUB and the web.

Agreed.

I would also add that I am not sure that all of the above things we've identified as being in the "manifest" actually have to be part of it "natively" vs. being referenced (in sort of the same way you talk about a "nav doc").

mattgarrish commented 7 years ago

But, the presence of a machine-readable Nav Doc has many plusses.

Does it at the WP level? If this is content living on the web, do we expect special behaviours from the UA, or will it just treat the document like another web page and load it or pop it open in an overlay or window?

We lose consistent appearance and presentation, but for ease of conformance maybe that's not such a bad thing. It could also lead to some poorly accessible tables of contents, but RS-generated tables of contents aren't always that accessible now (e.g., flattening the structure for AT users).

If we make machine-processable rules at the epub 4 level, it wouldn't invalidate a free-form navigation document at the WP level.

pkra commented 7 years ago

2cts beyond my "voting".

@lrosenthol wrote:

Do we (well, the UA) really need (to load) the full 1000 images used by publication up front? Not necessarily - it may only want/need what is required to load the first content document.

I would assume that requiring a list of resources does not mean all resources must be downloaded right away (or even at any point). I would also not expect this list to always be "really" complete, i.e., some resources might be optional, loaded if possible (e.g., webfonts).

GarthConboy commented 7 years ago

But, the presence of a machine-readable Nav Doc has many plusses.

Does it at the WP level? If this is content living on the web, do we expect special behaviours from the UA, or will it just treat the document like another web page and load it or pop it open in an overlay or window?

In answer to "Does it at the WP level?" I think the answer could well be "yes." I bet there will be WP-savvy UA's and UA's that aren't savvy and just do as well as they can (thus my preference for the "identifier" to be the URL of the first document in the reading order). So some UA's could do something nice with a machine-readable Nav Doc, but others might completely ignore it (but it would still do no harm as an optional piece of a WP).

lrosenthol commented 7 years ago

I didn't mean that the UA had to load the actual resources - but it would have to load the entire list (if we had a single list). And just a list of 1000 URLs would take time (and memory) to load and build into JS structures.



pkra commented 7 years ago

I didn't mean that the UA had to load the actual resources - but it would have to load the entire list

Ah, I had totally misunderstood you -- thanks for correcting me, @lrosenthol!

lrosenthol commented 7 years ago

If we have a "reading order" list, do we also need a "nav doc"? What is the (perceived or practical) difference - in the current EPUB world?

mattgarrish commented 7 years ago

Reading order is the list of files in sequence in which they're presented (the spine). Navigation document contains the table of contents (plus page list and landmarks).

Even if the spine documents were titled, you'd only get a rudimentary idea of the document outline from them, as when you factor in content chunking it won't even be clear what rank the headings have (i.e., not every document has to start with a level 1 heading).
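As a rough illustration of the distinction drawn above, the sketch below contrasts a flat reading order with a nested table of contents. The file names, titles, and dictionary keys are hypothetical and not taken from any EPUB or WP specification; the point is only that the outline carries rank and can point into the middle of a chunked file, which the flat list cannot express.

```python
# Hypothetical data, for illustration only.
# Reading order (the "spine"): a flat sequence of files.
reading_order = ["cover.html", "part1.html", "part1-continued.html", "part2.html"]

# Table of contents: nested entries, possibly pointing inside a file.
toc = [
    {"title": "Chapter 1", "href": "part1.html", "children": [
        {"title": "Section 1.1", "href": "part1.html#s1"},
        {"title": "Section 1.2", "href": "part1-continued.html#s2"},
    ]},
    {"title": "Chapter 2", "href": "part2.html", "children": []},
]


def flatten(entries, depth=0):
    """Yield (depth, title, href) for each entry of a nested table of contents."""
    for entry in entries:
        yield depth, entry["title"], entry["href"]
        yield from flatten(entry.get("children", []), depth + 1)


outline = list(flatten(toc))
for depth, title, href in outline:
    print("  " * depth + f"{title} -> {href}")
```

Here `part1-continued.html` appears in the reading order as a file of its own, yet in the outline it only holds a second-level section, which is the "content chunking" problem described above.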

lrosenthol commented 7 years ago

Thanks @mattgarrish

On Tue, Jul 11, 2017 at 5:31 PM, Matt Garrish notifications@github.com wrote:

Reading order is the list of files in sequence in which they're presented (the spine).

One way in which they can be presented, you mean. Since a user may choose to navigate content in other orders...

Navigation document contains the table of contents (plus page list and landmarks).

So this, to me, is actual content - and doesn't require a special place. If your content requires such a thing - then build it (with HTML & CSS, hopefully also with dpub-aria's TOC role) as a content element and add it to the reading order where you think it belongs.

mattgarrish commented 7 years ago

One way in which they can be presented, you mean.

The reading order as defined by the spine isn't dynamic, even if the reader follows a non-linear path through it. I don't see how that is easily changed, unless the UA understands the content at some deeper level.

At any rate, even if the reading order were shuffled it doesn't change the limitations as a means of navigating the actual publication outline.

add it to the reading order where you think it belongs

PDF has bookmarks. Word has the ability to view the document outline. EPUB has the navigation document. Do we want WP to be the outlier without a programmatic method of discovering?

lrosenthol commented 7 years ago

On Tue, Jul 11, 2017 at 6:21 PM, Matt Garrish notifications@github.com wrote:

PDF has bookmarks.

Which are a problem, for all the same reasons that EPUB's NavDoc is.

Lack of styling

Gets out of sync with the actual content (during editing/combining operations)

etc.

Word has the ability to view the document outline.

Which is dynamic based on styling (aka fake semantics) - much like the HTML outline algo.

EPUB has the navigation document. Do we want WP to be the outlier without a programmatic method of discovering?

I think that WP should be aligned with the web and not special.

HadrienGardeur commented 7 years ago

Thanks @GarthConboy for your proposal, I'll also reply point by point.

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]

I disagree about this, for several reasons:

The URL of the first document in the spine identifies that resource, it doesn't identify the publication itself. Mixing those two up is very confusing.
What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.
A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (<link> in HTML, Link: in HTTP).
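The two discovery paths mentioned in the last point (a `<link>` element in HTML, a `Link:` header in HTTP) could be sketched roughly as follows. The thread never names a link relation, so the `manifest` value below is an assumption, as are the function names and example URLs. The header parser is deliberately simplified and is not a complete implementation of the HTTP Link header syntax (for instance, it would mis-split a URL containing a comma).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MANIFEST_REL = "manifest"  # assumption: the thread does not name a rel value


class LinkFinder(HTMLParser):
    """Collect the href of every <link> whose rel list contains MANIFEST_REL."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if MANIFEST_REL in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def manifest_from_html(html, base_url):
    """Return the absolute manifest URL linked from an HTML document, or None."""
    finder = LinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.hrefs[0]) if finder.hrefs else None


def manifest_from_link_header(header, base_url):
    """Return the manifest URL from a Link header value, or None (simplified parser)."""
    for part in header.split(","):
        segments = [segment.strip() for segment in part.split(";")]
        target = segments[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        for param in segments[1:]:
            name, _, value = param.partition("=")
            rels = value.strip().strip('"').lower().split()
            if name.strip().lower() == "rel" and MANIFEST_REL in rels:
                return urljoin(base_url, target[1:-1])
    return None
```

A UA that does not know about Web Publications would simply never call either function and would render the HTML as an ordinary page.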

Identification as a WP. Required in the first (or only) document in the reading order. Likely not explicit, but should be implied by the presence of whatever of the below turns out to be required (e.g., minimal WP metadata, presence of, or a pointer to, the manifest).

What would you like to identify as a WP? If we're talking about a resource from the publication, it should be identified by either the presence of a link to a manifest, or because the UA has already accessed a manifest and knows that the resource is part of a publication.

For the manifest itself, it should be identified by a specific media type.

List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).

In the Readium WebPub Manifest we also opted for a requirement: at least one resource must be listed in the spine.

Reading order. Required (with similar degenerate case comments as above).

In Readium, the only required list is the spine (which is listed in reading order). The other list (resources) is optional.

Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.

In Readium we require a title, but @lrosenthol is right that if we extend our scope to any sort of publication this might be problematic.
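The requirements described in the last few replies (a title, a spine with at least one entry, an optional list of other resources) can be expressed as a small check. This is a minimal sketch under stated assumptions: the key names `metadata`, `title`, `spine`, and `resources` are illustrative and are not quoted from the Readium WebPub Manifest itself, and `check_manifest` is a hypothetical helper.

```python
def check_manifest(manifest):
    """Return a list of problems; an empty list means the minimal checks pass."""
    problems = []
    # A title is required in the metadata.
    if not manifest.get("metadata", {}).get("title"):
        problems.append("metadata.title is required")
    # The spine (reading order) must contain at least one resource.
    if not manifest.get("spine"):
        problems.append("spine must list at least one resource")
    # "resources" is optional, so its absence is not reported.
    return problems


example = {
    "metadata": {"title": "An Example Publication"},
    "spine": [{"href": "chapter1.html"}],
}
print(check_manifest(example))
```

A stricter or looser rule set (for example, dropping the title requirement for publications that have none) would only change the conditions inside the function.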

Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

I strongly lean towards optional. I think we should offer (both as options):

the ability to indicate that an HTML resource is meant for navigation, and render this publication as-is without requiring all the weird authoring rules associated to NavDocs in EPUB.
a separate machine-readable option in the manifest

The machine-readable info in the manifest should contain all navigation not rendered directly to the user (either because it shows up in the UI of the UA instead, or because it's useful for internal stuff).

From @lrosenthol

Maybe. I've been thinking more about how resources are really connected to the content page and not to the publication. Do we (well, the UA) really need (to load) the full 1000 images used by publication up front? Not necessarily - it may only want/need what is required to load the first content document. We should be sure to design with optimization and performance in mind from the start...

Unlike the <manifest> in EPUB, we shouldn't expect a manifest to reference all resources available in a Web Publication. It should only reference those that are deemed as very important for rendering content.

This would leave a lot of wiggle room for the kind of edge case scenario that you're thinking about (gigantic Web Publications).

GarthConboy commented 7 years ago

Thanks @GarthConboy for your proposal, I'll also reply point by point.

I'll reply point by point too. Though can't help but comment that this sort of design work is really hampered by use of github issues -- something (e.g., Google Docs) where you can really comment in place and have conversations would be better! :-)

Identifier of WP. Required. Should be the URL to the first (or only) document in the reading order. [Allowing a clueless UA to get somewhere useful]

I disagree about this, for several reasons:

The URL of the first document in the spine identifies that resource, it doesn't identify the publication itself. Mixing those two up is very confusing.

What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.

A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (<link> in HTML, Link: in HTTP).

I think we only partially disagree. I think it is unwise to have the identifier of the publication be the URL of the manifest, as a clueless UA would render either nothing or something it doesn't understand (depending on format of said manifest). I do think the manifest should be pointed to either from the first markup document in the reading order, or potentially even from all of the markup documents in the reading order (yes likely through a link as you say) -- this is the same madness you refer to -- but, doesn't seem that mad to me!

Identification as a WP. Required in the first (or only) document in the reading order. Likely not explicit, but should be implied by the presence of whatever of the below turns out to be required (e.g., minimal WP metadata, presence of, or a pointer to, the manifest).

What would you like to identify as a WP? If we're talking about a resource from the publication, it should be identified by either the presence of a link to a manifest, or because the UA has already accessed a manifest and knows that the resource is part of a publication.

I think Dave wanted to be able to identify the "site" as a WP -- and yes, I think the presence of a link to the manifest would be a fine way of doing that -- that's one of the options I was proposing.

For the manifest itself, it should be identified by a specific media type.

Agree.

List of publication resources. Required (yes, one could debate the degenerate case of a one-resource WP, but I'd lean toward "required" and this likely serves to identify the beast as a WP).

In the Readium WebPub Manifest we also opted for a requirement: at least one resource must be listed in the spine.

Agree.

Reading order. Required (with similar degenerate case comments as above).

In Readium, the only required list is the spine (which is listed in reading order). The other list (resources) is optional.

Somewhat agree. I have list of resources as required (above), but that's almost a detail.

Metadata. Is some minimal set of metadata required? I lean toward "maybe required," but this clearly can be argued. I view it as required that there is an ability to specify WP metadata.

In Readium we require a title, but @lrosenthol is right that if we extend our scope to any sort of publication this might be problematic.

Close to agree.

Nav doc. I lean toward optional. But, the presence of a machine-readable Nav Doc has many plusses. I do not view it as requirement that said document be directly renderable, but the dual nature of the current EPUB Nav Doc is not all bad. A11Y issues also abound.

I strongly lean towards optional. I think we should offer (both as options):

the ability to indicate that an HTML resource is meant for navigation, and render this publication as-is without requiring all the weird authoring rules associated to NavDocs in EPUB.

a separate machine-readable option in the manifest

The machine-readable info in the manifest should contain all navigation not rendered directly to the user (either because it shows up in the UI of the UA instead, or because it's useful for internal stuff).

I think this will be the root of lots of conversation, but I don't think we're too far apart.

HadrienGardeur commented 7 years ago

I'll reply point by point too. Though can't help but comment that this sort of design work is really hampered by use of github issues -- something (e.g., Google Docs) where you can really comment in place and have conversations would be better! :-)

@GarthConboy yeah, it's not always easy to follow all threads in a discussion...

I think we only partially disagree. I think it is unwise to have the identifier of the publication be the URL of the manifest, as a clueless UA would render either nothing or something it doesn't understand (depending on format of said manifest).

This is where I strongly disagree. I think that a clueless UA won't ever be aware of the URL of a manifest, and that even in a WP aware UA, users will never be aware of the URL of a manifest either.

If they're not aware of its existence and therefore don't share it, we don't have any problem at all using the URL of the manifest as the WP identifier.

I do think the manifest should be pointed to either from the first markup document in the reading order, or potentially even from all of the markup documents in the reading order (yes likely through a link as you say) -- this is the same madness you refer to -- but, doesn't seem that mad to me!

I think this should be entirely up to the author/publisher to decide where and when they include discovery. There shouldn't be any requirement IMO.

Also, I'd like to have the ability to remix content on the Web. If I have zero write-access to the content that I'd like to remix within a Web Publication, there's no way I'll be able to include such a link in HTML or HTTP.

About navigation

I think this will be the root of lots of conversation, but I don't think we're far apart.

Right, but I think a lot of the arguments in favour of including all machine-readable navigation in HTML are misguided:

as I've already explained in a previous comment, developers are pretty much limited to plain text in whatever UI they're building, HTML doesn't help at all when it's not directly rendered as-is
working with HTML is more difficult than working with JSON or XML when extracting machine-readable info
restrictions regarding how HTML must be authored are harmful, these restrictions mostly exist to make the NavDoc more machine-readable. Such restrictions would also limit the ability to re-use existing HTML resources and simply mark them as navigation.
including content that is not useful for the user (such as page-list) can be very problematic, that's even more of an issue since the hidden attribute is not always supported
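To make the JSON-versus-HTML comparison in the list above concrete, here is a small sketch that extracts the same table of contents from both formats. The sample data and the names `toc_from_json`, `toc_from_html`, and `TocParser` are hypothetical. It uses only the Python standard library, so it does not demonstrate the third-party-library point; it only shows how much more code the HTML path needs, and that inline markup ends up flattened to plain text anyway.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample data describing the same two entries.
TOC_JSON = '[{"title": "Chapter 1", "href": "c1.html"}, {"title": "Chapter 2", "href": "c2.html"}]'
TOC_HTML = (
    '<nav><ol>'
    '<li><a href="c1.html">Chapter <em>1</em></a></li>'
    '<li><a href="c2.html">Chapter 2</a></li>'
    '</ol></nav>'
)


def toc_from_json(text):
    """One line of real work: parse, then pick the two fields."""
    return [(item["title"], item["href"]) for item in json.loads(text)]


class TocParser(HTMLParser):
    """Collect (text, href) for every <a>, flattening inline markup to plain text."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entries.append(("".join(self._text), self._href))
            self._href = None


def toc_from_html(text):
    parser = TocParser()
    parser.feed(text)
    parser.close()
    return parser.entries
```

Both functions return the same list of `(title, href)` pairs for the sample data; the `<em>` in the HTML version is lost in the process.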

GarthConboy commented 7 years ago

@HadrienGardeur I think we may be typing past each other on the first issue above. I'm presuming that the identifier of a WP will be a URL. And that the only two logical places for this URL to point are either at the publication's manifest or the first markup document in the reading order. Do you have a different view? Or do you not view the identifier as a URL at all?

lrosenthol commented 7 years ago

@HadrienGardeur

Also, I'd like to have the ability to remix content on the Web. If I have zero write-access to the content that I'd like to remix within a Web Publication, there's no way I'll be able to include such a link in HTML or HTTP.

I moved the discussion of this item over to its own issue at #8

lrosenthol commented 7 years ago

@HadrienGardeur

Right, but I think a lot of the arguments in favour of including all machine-readable navigation in HTML are misguided:

I moved the discussion of this item over to its own issue at #9

HadrienGardeur commented 7 years ago

@GarthConboy

1. Fully agree that the identifier of a WP should be a URL, that's also a requirement in the Readium WebPub Manifest.
2. I don't think that the URL can be the same URL as the first markup document in the reading order, for the reasons listed in a previous comment.
3. It could be a different URL from that of the first markup document and still rely on content negotiation and/or an HTTP redirect to point to that document anyway, but I don't think that's possible either, because we won't be able to use those mechanisms in certain situations.
4. I think it's perfectly acceptable to use the URL of the manifest as the identifier of the WP, because a clueless UA won't be aware of its existence, and users won't be aware of its existence either. The scenario where someone discovers a WP through a manifest and ends up with something that they can't use is very unlikely to ever happen.

GarthConboy commented 7 years ago

Yep -- I think we found our disagreement. I think if the identifier is a URL, folks will send it around, and will expect it to "work". Thus, I disagree with your #2 and #4 above, and I still favor the identifier being the first content document. But, since I'm missing the call on Monday, you all can agree to something else, and I'll just whine for subsequent years.

HadrienGardeur commented 7 years ago

I fail to understand how you can completely disagree with my second point, let me quote precisely my previous comment:

The URL of the first document in the spine identifies that resource, it doesn't identify the publication itself. Mixing those two up is very confusing.

What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.

A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (<link> in HTML, Link: in HTTP).

These are real issues, how do you address them if you decide that the URL of the first content document is the identifier of the WP?

The only situation that would make this acceptable is if we embed the manifest in HTML (which I find problematic for completely different reasons).

GarthConboy commented 7 years ago

@HadrienGardeur yep, guess I disagree with two of the three! :-)

The URL of the first document in the spine identifies that resource, it doesn't identify the publication itself. Mixing those two up is very confusing.

I don't really see why. Assuming the first document in the spine points to the manifest or contains the manifest, this seems an elegant solution -- a clueless UA can do something useful, and a clueful UA can chase down (or process) the manifest and do something better.

What happens if a resource is included in several publications? Do these publications all share the same identifier? This is madness.

I'd think "don't do that" is a fine answer (bug for bug compatible with EPUB today). And like you said, this could lead one to want to include the manifest rather than link to it.

A clueless UA won't expose the URL of the manifest anyway, it'll expose the URL of one of its constituent resources. It's perfectly fine to share any URL for a constituent resource and then discover the manifest through it (<link> in HTML, Link: in HTTP).

See #1. If the clueless UA gets the URL to the "lead" resource, it can just render the content, either ignoring an embedded manifest or not bothering to follow the link to an external one.

HadrienGardeur commented 7 years ago

@GarthConboy

Since this argument is spread across issues, I've also had to reply to that same question in a separate issue.

I don't really see why. Assuming the first document in the spine points to the manifest or contains the manifest, this seems an elegant solution -- a clueless UA can do something useful, and a clueful UA can chase down (or process) the manifest and do something better.

It only feels elegant if the manifest and the first resource are one and the same (manifest embedded in HTML). Otherwise it feels very confusing to me to use the same identifier (URL) for two different concepts (publication vs resource).

GarthConboy commented 7 years ago

@HadrienGardeur -- Yep saw that too. Doesn't make me a convert. But, we'll see where the group ends up.

This is probably an issue we should try to resolve very soon, as it drives a number of subsequent decisions.

HadrienGardeur commented 7 years ago

Side question: what if I can't include a link to the manifest in the first resource in reading order?

What happens then?

lrosenthol commented 7 years ago

On Tue, Jul 11, 2017 at 10:33 PM, Hadrien Gardeur notifications@github.com wrote:

Side question: what if I can't include a link to the manifest in the first resource in reading order?

Under what circumstances would you author a publication where you couldn't link the manifest?

HadrienGardeur commented 7 years ago

What if the publication already exists on the Web and someone else would like to author a manifest for it?

There are plenty of publications on the Web that could benefit from such a manifest, I'll re-use the same example: http://poignant.guide/
