w3c / publ-wg

Home page of the Audiobooks Working Group
https://www.w3.org/publishing/groups/publ-wg/

Information content of the abstract manifest #12

Closed dauwhe closed 7 years ago

dauwhe commented 7 years ago

What information is required for an abstract manifest? [edited to add items from comments]

  1. An identifier for the web publication, which should be a URL
  2. Some way of saying that this URL represents a web publication.
  3. Some way of identifying the constituent resources of the web publication.
  4. Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
  5. Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)
  6. Locating table of contents or other navigation structure

What else? I think we should distinguish required information from "nice to have" information.
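One way to make the list above concrete is a small data model. The following is only a sketch: the class name `AbstractManifest`, its field names, and the `problems` check are hypothetical and not something the group has agreed on. Item 2 (signalling that the URL represents a web publication) is left out because it arguably sits outside the manifest itself.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AbstractManifest:
    """Hypothetical model of the items listed in this issue."""

    # Item 1: identifier for the publication (the issue suggests a URL).
    identifier: str
    # Item 3: the constituent resources, given as URLs.
    resources: List[str]
    # Item 4: preferred order over (some of) the resources.
    reading_order: List[str] = field(default_factory=list)
    # Item 5: open slot for richer metadata; no required set is assumed.
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Item 6: location of a table of contents, if there is one.
    toc: Optional[str] = None

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means none were found."""
        found = []
        if not self.identifier:
            found.append("missing identifier")
        if not self.resources:
            found.append("no constituent resources")
        for url in self.reading_order:
            if url not in self.resources:
                found.append(f"reading order entry not a listed resource: {url}")
        return found
```

Keeping `reading_order` optional reflects the "(some of)" wording in item 4; whether any of these fields is required is exactly the open question in this issue.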

GarthConboy commented 7 years ago

I'd also throw in:

-- Reading order
-- Basic metadata (yes, a can of worms we'll need to open)

Re the #1 and #2 just above in Dave's original issue, it seems they may want to be pre-manifest -- defined before the manifest is found, or be the actual path to the manifest (or to a "first file" that can be rendered, but also somehow points to the manifest).

iherman commented 7 years ago

  1. Some way of providing a preferred order of (some of) the constituent resources in case there is more than one
  2. Some way of being able to add more complex metadata to a publication. (Not clear to my mind whether we would define a minimally required set of metadata, but the slot should be there.)

iherman commented 7 years ago

(Wow. I just said the same thing as Garth just in other words. I swear we did not conspire...)

mattgarrish commented 7 years ago

What is meant by required here? Must always be present or must be accounted for in the design? This is why I wasn't sure at the f2f if navigation constituted a top-level or lower-level consideration.

A standardized means of locating the table of contents seems critical to me, even if it's optional to define and there are no epub-like rules on its construction.

GarthConboy commented 7 years ago

The updated #6 in the first panel says "Locating table of contents or other navigation structure". We should also consider:

-- Do we need such a Nav file (likely yes for A11Y)
-- Should it be in the Manifest or pointed-to by the Manifest (I could see an argument for all eggs in one basket, though the machine-readable versus renderable discussion will arise)

dauwhe commented 7 years ago

Do we need such a Nav file (likely yes for A11Y)

See #14

Should it be in the Manifest or pointed-to by the Manifest

Interesting question. I know Hadrien has proposed including section titles in a JSON manifest, but I have major concerns about possible reader-facing text in JSON (especially given that there's a standard html way to do this stuff).

HadrienGardeur commented 7 years ago

I know Hadrien has proposed including section titles in a JSON manifest, but I have major concerns about possible reader-facing text in JSON (especially given that there's a standard html way to do this stuff).

IMO the Navigation Document in EPUB 3 is a failed experiment. Most EPUB 3 documents that I've seen end up including at least two HTML tables of contents:

Most EPUB 3 reading systems do not render these Navigation Documents either; they simply parse them, extract the info and display things using their own UI.

This is a typical example of "spec purity" (the beauty of the Navigation Document) vs real world usage (no one is rendering these documents and we end up with more redundancy instead of less).

Readium (1, JS and 2) ended up parsing the info in the Navigation Document and providing a JSON output instead, which is much easier for developers to work with.
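To illustrate the kind of parsing described here, the following is a minimal sketch, not Readium's actual code. It assumes a well-formed XHTML Navigation Document with a `nav` element typed `toc`; the function names and the output keys (`title`, `href`, `children`) are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body><nav epub:type="toc"><ol>
<li><a href="c1.xhtml">Chapter 1</a>
  <ol><li><a href="c1.xhtml#s1">Section 1.1</a></li></ol></li>
<li><a href="c2.xhtml">Chapter 2</a></li>
</ol></nav></body></html>"""


def _items(ol):
    """Convert one <ol> element into a list of entry dicts."""
    entries = []
    for li in ol.findall(f"{XHTML}li"):
        a = li.find(f"{XHTML}a")
        if a is None:
            continue  # unlinked headings (<span>) are skipped in this sketch
        entry = {"title": "".join(a.itertext()).strip(), "href": a.get("href")}
        nested = li.find(f"{XHTML}ol")
        if nested is not None:
            entry["children"] = _items(nested)
        entries.append(entry)
    return entries


def nav_to_toc(xhtml_text):
    """Return the toc nav as nested dicts, or [] if no toc nav is found."""
    root = ET.fromstring(xhtml_text)
    for nav in root.iter(f"{XHTML}nav"):
        # epub:type may hold several space-separated values.
        if "toc" in (nav.get(EPUB_TYPE) or "").split():
            ol = nav.find(f"{XHTML}ol")
            return _items(ol) if ol is not None else []
    return []


def nav_to_json(xhtml_text):
    """Serialize the extracted table of contents as JSON text."""
    return json.dumps(nav_to_toc(xhtml_text), indent=2)
```

Calling `nav_to_json(SAMPLE)` yields a nested JSON array that a reading system could feed to its own UI without rendering the HTML.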

In the Readium Web Publication Manifest:

HadrienGardeur commented 7 years ago

To go back to the initial question, in Readium we clearly separate the abstract model from the minimal requirements for a manifest.

The abstract model has three core concepts:

For each core concept, we make sure that:

The basic requirements for a manifest are then based on that model:

llemeurfr commented 7 years ago

An identifier for the web publication, which should be a URL

Better, an IRI, because a) it may be a URN (up to the publisher to choose, the Web doesn't care) and b) i18n is important. A URL to the origin is also important but should be another property.

WSchindler commented 7 years ago

I would like to add:

  1. language(s) used in the WP - the plural is due to the fact that we will have publications such as parallel texts (original + one or more translations) and bilingual dictionaries, which contain 1-n languages. The language used also has implications for rendering (e.g. "ltr" vs "rtl", vertical layout)

HadrienGardeur commented 7 years ago

Language and direction (ltr vs rtl) should be two separate metadata properties. Agree that we need to allow more than one language.
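As a rough illustration of keeping the two apart while allowing several languages, here is a sketch. The key names `language` and `direction`, the `auto` default, and the function name are assumptions made for this example, not agreed vocabulary.

```python
def normalize_language_metadata(metadata):
    """Return language as a list and direction as a separate, validated value."""
    raw = metadata.get("language", [])
    # Accept a single language tag or a list of tags.
    languages = [raw] if isinstance(raw, str) else list(raw)
    # Direction is read from its own key and never inferred from the language.
    direction = metadata.get("direction", "auto")
    if direction not in ("ltr", "rtl", "auto"):
        raise ValueError(f"unsupported direction: {direction!r}")
    return {"language": languages, "direction": direction}
```

A Chinese-English dictionary could then declare `{"language": ["zh", "en"], "direction": "ltr"}` without one property constraining the other.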

lrosenthol commented 7 years ago

If we plan to use anything other than a URL (as defined by the HTML spec - https://www.w3.org/TR/WD-html40-970917/htmlweb.html), then we are going to need to be willing to jump into the current battle between the W3C and the IETF on the definition of URL/URI/IRI etc. Here is an old blog entry about it - http://intertwingly.net/blog/2014/10/02/WHATWG-URL-vs-IETF-URI

On Mon, Jul 3, 2017 at 8:43 AM, L. Le Meur notifications@github.com wrote:

An identifier for the web publication, which should be a URL

Better, an IRI because a) may be a urn (up to the publisher to choose, the Web doesn't care) and b) i18n is important. A URL to the origin is also important but should be another property.

llemeurfr commented 7 years ago

Re. URL vs IRI, after reading https://www.w3.org/International/wiki/IRIStatus, I must admit that this seems like a can of dirty worms. Apart from trying to allow for an extended i18n of publication identifiers, there is still the question of whether URNs are allowed as global identifiers. For instance, I spotted that most of @HadrienGardeur's Manifest samples use isbn urns as identifiers.
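To make the URL versus URN distinction concrete, here is a small sketch using Python's standard `urllib.parse`. The function name and the three labels it returns are hypothetical; it only inspects the scheme and does not validate the identifier or address the IRI question.

```python
from urllib.parse import urlsplit


def identifier_kind(identifier):
    """Classify an identifier by its scheme: "url", "urn", or "other"."""
    scheme = urlsplit(identifier).scheme.lower()
    if scheme in ("http", "https"):
        return "url"  # dereferenceable on the Web
    if scheme == "urn":
        return "urn"  # a name such as an ISBN or UUID, not a location
    return "other"
```

For example, `identifier_kind("urn:isbn:9780141180144")` gives `"urn"`, while `identifier_kind("http://example.com/manifest.json")` gives `"url"`.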

HadrienGardeur commented 7 years ago

@llemeurfr you're mixing up two different concepts regarding the Readium Web Publication Manifest.

Keep in mind that we started this work in the context of BFF and that for Readium-2 we mostly ingest EPUB files.

The only requirement in the draft document for the Readium WebPub Manifest is to always provide a self link. In the context of a Web Publication it makes perfect sense: if a publication lives on the Web, we need a URL that can point to its manifest.

Here's a basic example using the Readium WebPub Manifest model:

{
  "@context": "http://readium.org/webpub/default.jsonld",
  "metadata": {
    "title": "The Master and Margarita"
  },
  "links": [
    {"rel": "self", "href": "http://example.com/manifest.json", "type": "application/webpub+json"}
  ],
  "spine": [
    {"href": "http://example.com/chapter1", "type": "text/html"}
  ]
}

If the publication has an additional identifier, this can be provided in its metadata:

"metadata": {
  "title": "The Master and Margarita",
  "identifier": "urn:isbn:9780141180144"
}

That second identifier is not a requirement in the Readium model, and we can't expect all Web Publications to have such an identifier either.

The reason why most of our current samples have URNs (mostly for ISBNs or UUIDs) is because we ingest EPUB files or provide samples for books where ISBNs are very common.
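A consumer could read both values as sketched below. This is an illustration only: the helper names are hypothetical, and the sketch assumes the manifest has already been parsed into a Python dict shaped like the examples in this comment.

```python
def self_link(manifest):
    """Return the href of the first link whose rel is "self", or None."""
    for link in manifest.get("links", []):
        rel = link.get("rel")
        # Tolerate rel given either as one string or as a list of strings.
        rels = rel if isinstance(rel, list) else [rel]
        if "self" in rels:
            return link.get("href")
    return None


def publication_identifiers(manifest):
    """Collect the required self link and the optional metadata identifier."""
    return {
        "self": self_link(manifest),
        "identifier": manifest.get("metadata", {}).get("identifier"),
    }
```

A missing `identifier` simply comes back as `None`, matching the point that only the self link is required.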

dauwhe commented 7 years ago

I would like to add:

  1. language(s) used in the WP - the plural is due to the fact that we will have publications such as parallel texts (original + one or more translations) and bilingual dictionaries, which contain 1-n languages. The language used also has implications for rendering (e.g. "ltr" vs "rtl", vertical layout)

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

HadrienGardeur commented 7 years ago

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

The manifest declares the language for the publication, while HTML is meant to declare the language for that resource. The UA would simply set the default to language B but override that option with language A as it displays or interacts with that HTML page.
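The fallback described here can be sketched as follows. The function name and its parameters are hypothetical, and treating the first publication language as the default is an assumption of this example rather than something stated in the thread.

```python
def effective_language(publication_languages, resource_language=None):
    """Pick the language a UA would apply to one HTML resource."""
    # A language declared by the resource itself (e.g. via @lang) wins.
    if resource_language:
        return resource_language
    # Otherwise fall back to the publication-level default from the manifest.
    if publication_languages:
        return publication_languages[0]
    return None
```

With the manifest declaring language B and the page declaring language A, `effective_language(["B"], "A")` returns `"A"`; a page with no declaration would get `"B"`.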

llemeurfr commented 7 years ago

you're mixing up two different concepts regarding the Readium Web Publication Manifest.

That's right. If a Web publication is copied to another website, this value will not be modified. Therefore a possible definition of the self link is "The original location of the Web Publication", which can be aligned with Requirement 8 for Web Publications: "There should be a way to uniquely identify a Web Publication."

HadrienGardeur commented 7 years ago

From RFC5988:

o Relation Name: self
o Description: Conveys an identifier for the link's context.
o Reference: [RFC4287]

WSchindler commented 7 years ago

It's of course true that via @lang or @xml:lang, you may define the language(s) used in your HTML. I still think that the point of entry for a UA consuming a WP would be the manifest, where it would be helpful to find information on the languages used in the WP. If you have a Chinese-English dictionary, it is IMO no trivial task to prepare the rendering.

lrosenthol commented 7 years ago

Actually, I would expect the UA to completely ignore the language settings (A, in this case) in the manifest - and only concern itself with the actual resource being processed/rendered (B, in this case). The language (or languages) in the manifest have no bearing on the actual content - they are (IMO) informational only.

On Wed, Jul 5, 2017 at 9:11 AM, Hadrien Gardeur notifications@github.com wrote:

My only concern is that HTML already has mechanisms for describing the language(s) of content. What happens when a user agent opens an HTML page declared with language A, finds a rel=manifest link, follows it, and sees language B declared?

The manifest declares the language for the publication, while HTML is meant to declare the language for that resource. The UA would simply set the default to language B but override that option with language A as it displays or interacts with that HTML page.

lrosenthol commented 7 years ago

If a Web publication is copied to another website, this value will not be modified

That's not necessarily true. The new site may well change the link(s) in the manifest. There is nothing about it that is "off limits" - certainly not in a WP, and possibly not even in a PWP.

On Wed, Jul 5, 2017 at 10:04 AM, L. Le Meur notifications@github.com wrote:

you're mixing up two different concepts regarding the Readium Web Publication Manifest.

That's right. If a Web publication is copied to another website, this value will not be modified. Therefore a possible definition of the self link is "The original location of the Web Publication", which can be aligned with Requirement 8 for Web Publications: "There should be a way to uniquely identify a Web Publication."

HadrienGardeur commented 7 years ago

Actually, I would expect the UA to completely ignore the language settings (A, in this case) in the manifest - and only concern itself with the actual resource being processed/rendered (B, in this case). The language (or languages) in the manifest have no bearing on the actual content - they are (IMO) informational only.

While rendering content, sure I fully agree. But a UA can provide additional services on top of it, for example dictionaries or search. The publication metadata can be useful in that regard.

mattgarrish commented 7 years ago

I would expect the UA to completely ignore the language settings (A, in this case) in the manifest

I agree it's informative and must not be used for rendering content (or metadata), but the same question about value has been raised in epub revisions and the case has been made that it does have uses (e.g., pre-loading tts languages, offering access to dictionaries, etc.).

lrosenthol commented 7 years ago

On Wed, Jul 5, 2017 at 12:21 PM, Hadrien Gardeur notifications@github.com wrote:

But a UA can provide additional services on top of it, for example dictionaries or search. The publication metadata can be useful in that regard.

It could indeed be useful - and whether a UA chooses to use it for that or not is (IMO) out of scope for our work.

HadrienGardeur commented 7 years ago

It could indeed be useful - and whether a UA chooses to use it for that or not is (IMO) out of scope for our work.

Defining the UA behavior is out of scope, but making sure that it has the relevant info it needs is definitely within scope.

dauwhe commented 7 years ago

This issue was moved to w3c/wpub#6