TzviyaSiegman commented 5 years ago

こんにちはTAG！

I'm requesting a TAG review of:

Name: Web Publications
Specification URL: https://w3c.github.io/wpub/
Explainer, Requirements Doc, or Example code: WPUB Explainer, Use Cases and Requiremnets
Tests: [to come]
Primary contacts: @TzviyaSiegman, @mattgarrish

Further details (optional):

Relevant time constraints or deadlines: [not urgent, would be helpful to have feedback well in advance of F2F meeting 6-7 May]
[ ] I have read and filled out the Self-Review Questionnare on Security and Privacy. Work in Progress
[x] I have reviewed the TAG's API Design Principles

You should also know that this group works in conjunction with Publishing Business Group. The PWG recently changed direction a bit as outlined in this blog post. I am opening a separate ticket for a review of the inital draft of audio books.

We'd prefer the TAG provide feedback as (please select one):

[ ] open issues in our Github repo for each point of feedback
[x] open a single issue in our Github repo for the entire review
[ ] leave review feedback as a comment in this issue and @-notify [github usernames]

/cc @iherman @GarthConboy @wareid

dbaron commented 5 years ago

At this point I've read the explainer and much of the requirements document (skimming through some parts). I haven't really looked at the specification yet.

I think some key issues worth focusing on (as I look at the specification, and in further discussion) may be:

the relationship to manifest (somewhat answered in explainer)
the relationship to packaging
the use of JSON-LD: are implementations expected to interpret it as JSON, or to interpret the underlying RDF data model? (I prefer the former.)
requirement 5 suggests that Web Publications might be identified by a unique handle that can be mapped to a URL. How does this interact with the web's security model, which is based on the origin of the URL? (Also see requirement 31 and requirement 32.)
I'm interested to see how the specification addresses issues of addressing/URLs relating to the components of the package. (The requirements hit that issue multiple times but don't touch on the solution.)

Some other issues that may be worth investigating / discussing:

requirement 17 (Protection) seems like it may be problematic. (Similar for, say UC 114.)
what happens when resources beyond those listed in the manifest are used? (How do developers ensure they list all the resources in the manifest? Should implementations issue some sort of warning when resources beyond the manifest are used?)

Some other things I noticed that I don't think require TAG discussion:

in requirement 19 the example contradicts the requirement

dbaron commented 5 years ago

At this point I've read the main part of the specification, but not the appendices. (I'm not sure if I'll go through the appendices in detail.)

A bunch of detailed comments follow.

However, first, more generally, I'd note that many of the things that seemed likely to be scary as a result of reading the requirements document seem a lot less scary after reading the specification. It seems like the requirements document has gotten a bit out of sync with the specification, and it may be worth updating it to reflect how those requirements are addressed in the specification, or whether they are. (For example, the specification doesn't seem to have any notion of packages in it, whereas I was expected a packaging mechanism from the requirements document.)

PART I: Publication Manifest

It's a little disturbing to see section 2.3 refer to WebIDL as a representation of a data structure, rather than as an API for accessing the representation of a data structure.

The repeated use of WebIDL "partial" to amend the same interface within a single specification feels like a bad practice; it makes it hard to understand the interface. (I'd perhaps go so far as to say "partial" is always a bad practice.)

The spec still seems short on information for how implementations / processors must interpret the manifest (in particular, a clear processing model -- such a model would explain how to handle, say, an @context that points to a different URL than it's supposed to). It seems like a decent amount may be expected based on, say the section on additional properties in the manifest (although maybe it's not really expected since such processing is optional).

dictionary LinkedResources should probably use the WebIDL "required" attribute on its required member.

The description of default language and base direction seems a bit confusing. It seems to imply it's a default for the manifest and the publication, but then there's a note that says it only applies to the manifest and not to the resources that comprise the publication.

(I also wonder whether LocalizableString should allow specifying base direction in line with https://w3c.github.io/string-meta/ . There's a note saying this isn't possible in JSON-LD, which makes me wonder whether the LocalizableString defined in this specification should really be a reference to something defined elsewhere.)

The bit in the accessibility report section that uses an unregistered link rel value that looks like a URL seems odd. I don't know of any specification that defines URLs to be usable as link rel values; they're just keywords, so this seems worse than using a syntactically valid but not-yet-registered value, since it both doesn't fit the syntax and appears to imply to readers that they can use URLs in place of link rel values rather than registering them. (Although this isn't quite a link rel value, since the definition of LinkedResource says the rel value can be one of two things.) It seems best to just register the value. The same applies to cover nd page list.

The description of cover says "If multiple covers are specified, each instance MUST define at least one unique property to allow user agents to determine its usability (e.g., a different format, height, width or relationship)." but doesn't say how to define a height or width.

The section on linked records endorses organizations other than IANA extending the set of rel values, which again seems a bit strange, although again what's being used here isn't strictly link rel values, but a superset of them defined earlier.

PART II: Web Publications

Publication Bounds should perhaps refer to the concept of URL Equivalence in the URL spec.

I noticed that Primary Entry Page says that the primary entry page MUST be within the bounds of the publication (inferring that from the definition of the bounds), whereas Address imposes the same requirement, using different words, as a SHOULD. It seems like these should match, or maybe one should just refer to the other.

The words "identify the resource" in 3.3.4 Table of Contents seem like thy should link to 3.4.1.7 Table of Contents but they actually link to 3.3.4. Same for the words "table of contents property definition".

I don't immediately have any comments on 3.4.1.3 Canonical Identifier, but I think it's interesting and there may be other TAG members who have comments on it.

The second paragraph of 3.4.1.7 Table of Contents seems to incorrectly refer to "page list" rather than "table of contents"; this looks like a copy-and-paste error.

Reading 3.5.1 Obtaining a Manifest made me wonder about how this is intended to integrate with CORS. I'd note that it doesn't quite seem to integrate with fetch properly, since "request" ought to link to request, and that, in turn:

seems to require a client (there's no "unless otherwise stated")... although that seems odd to me; it's not quite clear to me how fetch integration is supposed to work.
recommends against using the "no-cors" mode

It also ought to refer to the "credentials mode" rather than the "credentials".

It's not clear to me whether the information from the manifest is going to be exposed to web content in ways that require CORS checks -- though it seems like it's probably a good idea to assume that it will at some point in the future, and thus that it's probably a good idea to use the "cors" mode. (I think that doesn't require special headers if the manifest is same-origin... although I'm not sure.)

BigBlueHat commented 5 years ago

Still processing all of this feedback, but wanted to follow-up on rel stuff. Specifically...

I don't know of any specification that defines URLs to be usable as link rel values

HTML5.2 includes:

The remaining values must be accepted as valid if they are absolute URLs containing US-ASCII characters only and rejected otherwise.

RFC 8288 refers to these as "Extension Relation Types":

Applications that don't wish to register a relation type can use an extension relation type, which is a URI [RFC3986] that uniquely identifies the relation type.

Consequently, to keep parity between <link> and Link, it has been a common practice in API design and endpoint discovery to use URLs as rel values--see Discovery of Annotation Containers for an example.

Cheers! 🎩

hober commented 5 years ago

I don't know of any specification that defines URLs to be usable as link rel values

HTML5.2 includes:

The remaining values must be accepted as valid if they are absolute URLs containing US-ASCII characters only and rejected otherwise.

The canonical definition of rel="" is in the HTML Living Standard, which contains no such text.

BigBlueHat commented 5 years ago

The canonical definition of rel="" is in the HTML Living Standard, which contains no such text.

The use of URLs in rel is a long standing extension mechanism used by REST APIs, RDFa (and by extension Microdata's conversion mappings), and by (at least) the Web Annotation Protocol's discovery method.

So, how does a W3C member help champion the return of that sentence?

For this case specifically, the group has chosen to register the rel values with IANA (as well as spec the related URL mappings). However, the Link header and <link> divergence also requires a "registration" with the wiki-based registry, so that's a potential action this group will also need to take. /cc @iherman @mgarish

domenic commented 5 years ago

Link and <link> are not terribly related, despite the unfortunate naming conflict and some historical attempts at convergence. Note the sentence

Registration of relation types in HTTP Link headers is distinct from HTML link types, and thus their semantics can be different from same-named HTML types.

in the spec.

mattgarrish commented 5 years ago

I believe the rel issue has been resolved in the specification as of a change made last week. As we are planning to register the three relations, there was no need to express them as URLs and they are now token values.

The rel property for LinkedObject is also not an equivalent of the HTML link element, so I don't believe there's any issue with our also recommending URLs for relations that are not going to be registered. I'm not sure what the answer is to whether the three new relations should also be registered in the microformats wiki. That all depends on whether the group thinks there will be a need to identify these resources in link elements within the HTML. They could be useful for general use, but it's a bit out of scope of the specification proper.

BigBlueHat commented 5 years ago

@domenic there's no "unfortunate naming conflict" as both were created by the same author (i.e. Sir Tim). RFC 2068 was the first appearance of Link and states:

The Link field is semantically equivalent to the element in HTML.

The return of the Link header in @mnot's RFC 5988 followed that history.

The more recent RFC 8288) notes that things have since diverged from a pure "semantic equivalence" with HTML's <link> (and Atom links) due to widening uses of both for more nuanced usage. However, there's still effort to understand (and often use) them together.

For example, the HTML Living Standard places Processing Link Headers under the link element section (and immediately after the Fetching and processing a resource from a link element). It's structured that way in part because that relationship is still there as can bee seen also in the Preload spec.

The one-to-one mapping or "semantic equivalence" wasn't at issue here, but to clarify that URL's have been (and are being) used by many communities as rel values in HTML and often serve as a more stable (or desirable) identifier for some communities over using either the IANA or the wiki registry.

The ask (which I can make more formally in the correct place) is that something along the lines of this text be reconsidered for re-inclusion into HTML (as I've not found the reasoning for it's removal yet):

The remaining values must be accepted as valid if they are absolute URLs containing US-ASCII characters only and rejected otherwise.

Thanks!

BigBlueHat commented 5 years ago

As we are planning to register the three relations, there was no need to express them as URLs and they are now token values.

@mattgarrish looks like the PWG will need to register them at both IANA (for Link header usage) and by adding them to the wiki registry per the registration types in HTML.

TzviyaSiegman commented 5 years ago

@BigBlueHat this is not the right place to discuss Link vs <link>. Let's stay a little closer to topic. Thanks :)

mattgarrish commented 5 years ago

looks like the PWG will need to register them at both IANA (for Link header usage) and by adding them to the wiki registry per the registration types in HTML.

This is definitely true for the publication relation type, yes, but I'm not following why it would be the case for pagelist, accessibility-report or cover (the previously-URL values). They are only identified in the resource list; we don't compile HTML link elements.

Again, not suggesting this wouldn't be useful to do, but it's not essential to web publications that we register in both places.

BigBlueHat commented 5 years ago

@mattgarrish was mostly thinking that maintaining parity would be useful to avoid collision caused by having separate registries--i.e. so there isn't a later <link rel="cover"> (or pagelist, etc) which differs in definition from the one's we plan to register at IANA.

iherman commented 5 years ago

Hi @dbaron

dictionary LinkedResources should probably use the WebIDL "required" attribute on its required member.

This is already the case in the editor's draft

iherman commented 5 years ago

@dbaron

(I also wonder whether LocalizableString should allow specifying base direction in line with https://w3c.github.io/string-meta/ . There's a note saying this isn't possible in JSON-LD, which makes me wonder whether the LocalizableString defined in this specification should really be a reference to something defined elsewhere.)

That was a subject of several long discussions both in this Working group (https://github.com/w3c/wpub/issues/354) as well in the aforementioned document's issue list (e.g., https://github.com/w3c/string-meta/issues/12) and the final comment (https://github.com/w3c/wpub/issues/354#issuecomment-484055862) put the issue at rest: it is not possible to express the base direction of a textual item in JSON-LD. I am not sure what other reference you refer to: referring to a data structure that cannot be expressed in the manifest itself is not a solution.

iherman commented 5 years ago

@dbaron,

I would like to react on your comments related to WebIDL.

It's a little disturbing to see section 2.3 refer to WebIDL as a representation of a data structure, rather than as an API for accessing the representation of a data structure.

To step back a bit: the reason of using WebIDL (or, shall we say, something like WebIDL) is actually related to another question of yours:

the use of JSON-LD: are implementations expected to interpret it as JSON, or to interpret the underlying RDF data model? (I prefer the former.)

Our view was that applications should be indeed able to access the data represented in the manifest based on the JSON structure only. To make this precise, we have to specify exactly what data structures to extract from JSON, taking into account the fact that the manifest definition itself allows for some ambiguity. (For example, while the value of an "editor" is supposed to be an object with at least a property called a "name", we allow the author of the manifest to put directly that name as a value of "editor". This is not unlike the similar feature schema.org has.)

We were looking for a formalism to express that data structure, and we did not find any better than WebIDL. We were influenced by, e.g., the Web App Manifest's usage of WebIDL which is mostly a series dictionaries expressing the underlying data (there are only a few API definitions which are related to event handling and such and not to the data expressed in the JSON manifest). We were considering other options (JavaScript, TypeScript, JSON schemas), but none of these seemed satisfactory in terms of readability, stability, or independence of implementation choices. We also considered the separate discussion in https://github.com/whatwg/infra/issues/159.

(B.t.w., we do provide a separate, a non-normative JSON schema for the manifest. But, as also mentioned in https://github.com/whatwg/infra/issues/159, those are not really easy to read. Besides, the JSON Schema is still evolving, could not be used as a stable reference.)

All that being said, I understand and share your unease about the usage of WebIDL; I think we would be happy to consider an alternative. We just did not find any...

The repeated use of WebIDL "partial" to amend the same interface within a single specification feels like a bad practice; it makes it hard to understand the interface.

To be honest, I am not sure why that would be bad practice, at least in this case. First of all, there is an appendix with the full WebIDL where all partials are "merged", i.e., the full WebIDL is available. On the other hand, it sounds like a good practice to put the formal data structure definition close to where the corresponding term is defined, described in English, and where, in general, examples are also provided. This is the structure we followed throughout the document. Can you elaborate why you think that is a bad practice?

dbaron commented 5 years ago

WebIDL

To step back a bit: the reason of using WebIDL (or, shall we say, something like WebIDL) is actually related to another question of yours:

Perhaps my concern was the description generally of WebIDL in that way, rather than that being a description of how WebIDL is being used in this specification.

`partial`

Can you elaborate why you think that is a bad practice?

I think partial is a bad practice because it breaks the ability of a reader of the specification to look at the definition of the interface and understand what methods it has on it -- whether than reader is a user or an implementor of the specification. It's essentially a syntax for monkeypatching another specification (or another section of a specification) with no indication in that other section that it is incomplete and patched elsewhere.

partial is a useful mechanism in early stage proposals that aren't yet agreed on: it says "when this proposal is ready, this should be merged in to this other interface". In other words, I think partial is reasonable to use when the specification that's using it is substantially less stable than the interface it's patching. But I think use in other contexts is a bad practice; it indicates that this spec is patching another spec rather than actually getting that spec revised appropriately so that readers of the patched spec know that the patch exists.

mattgarrish commented 5 years ago

Perhaps my concern was the description generally of WebIDL in that way, rather than that being a description of how WebIDL is being used in this specification.

Yes, this might be useful to clear up. We don't specifically say in the introduction that we're not exposing this data.

I think partial is a bad practice because it breaks the ability of a reader of the specification to look at the definition of the interface and understand what methods it has on it -- whether than reader is a user or an implementor of the specification. It's essentially a syntax for monkeypatching another specification (or another section of a specification) with no indication in that other section that it is incomplete and patched elsewhere.

We do say that the member definitions are defined in each relevant section where we declare the PublicationManifest dictionary.

As @iherman has already written, the idea was to provide an encapsulated view of the expected expression of each member property, rather than require the reader to refer to the IDL index any time they want to cross-reference the Web IDL expression.

I'm not qualified to counter you on how WebIDL use should be preferred, but personally I didn't think this was problematic when it was proposed as the WebIDL specification says:

Note: Partial interface definitions are intended for use as a specification editorial aide, allowing the definition of an interface to be separated over more than one section of the document, and sometimes multiple documents.

That's, in a nutshell, all we were trying to achieve. Is there another way to do what we wanted with WebIDL, or are you saying that we should only have the IDL index?

dbaron commented 5 years ago

Not only have the index -- but have the interface defined in one place, with the members linking to their definitions across the different sections (and those sections linking back to the interface). At least, I would consider that a better practice for readability.

dbaron commented 5 years ago

So one followup point here is that I'd like somebody with more expertise on CORS and how to integrate CORS and fetch into a specification to do a brief review of the issue I raised in the last 4-ish paragraphs of https://github.com/w3ctag/design-reviews/issues/344#issuecomment-485034489 .

mattgarrish commented 5 years ago

To follow up in the Web IDL comments, I've made some adjustments to the section to: 1) reword the intro prose to make clear that the IDL is only for representing the data model; 2) to consolidate the PublicationManifest dictionary; and 3) to add links back to the Web IDL from the property definitions.

If you have a minute to review, I'd appreciate if you have any other comments on it.

torgo commented 4 years ago

Picking this up at our f2f.

torgo commented 4 years ago

Possibly we should close this based on feedback from @mattgarrish on their Issue 22.

mattgarrish commented 4 years ago

Note that we've actually ended work on web publications at this point are are moving forward only with the manifest portion at https://github.com/w3c/pub-manifest

hadleybeeman commented 4 years ago

Thanks, @mattgarrish and @iherman. This has dramatically changed shape... would you like us to review the manifest?

We're happy to do that, if you like. It probably makes sense to open a new review for that.

As we're looking over the doc now, one question comes to mind: given that we are trying to resolve our issue on the proliferation of manifests in web specs, are there opportunities to reconcile this with other manifests?

We'll close this issue, but would be happy to pursue that discussion in our issue 423 or in a new issue for your manifest, if that's helpful to you.

wareid commented 4 years ago

Thanks @hadleybeeman for the check-in. The Publication Manifest document is just a simplification of the Web Publication one, we went from very specifically looking to specify publications on the web to focusing on the manifest format and then its application in Audiobooks. All of the comments from the TAG were factored in when we made our changes. We'll comment on 423 as well!

w3ctag / design-reviews

Web Publications review #344

PART I: Publication Manifest

PART II: Web Publications

WebIDL

`partial`