w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

Different relations for linking to manifest and primary entry page? #448

Open mattgarrish opened 5 years ago

mattgarrish commented 5 years ago

I'm sure this was discussed before, but I can't find the right combination of keywords to locate the discussion.

At any rate, our algorithm requires that processing of the manifest begin with the primary entry page, so we have a significant distinction between:

I clarified in one of the last PRs that only the primary entry page can link to the manifest, and added a placeholder that other resources have to link to the primary entry page. But that still leaves the question, what is the expected relation for linking to the PEP:

iherman commented 5 years ago

Good catch!

My current take on it, though I am open to alternatives, is

llemeurfr commented 5 years ago

I perceive a move from what was agreed before (the possibility to link WP resources to the manifest) to a new type of linkage (the possibility to link WP resources to the PEP).

I agree with this move, as the agreement is that the PEP is the entry point of the WP, so it makes sense to use it as a "boot record".

But I don't remember that it was explicitly agreed by the WG, and some wording in the spec would have to be modified, e.g.

3.3.3 "Although any resource can link to the manifest, ..."

3.4.2.2 "With the exception of the primary entry page, linking a resource to its manifest is OPTIONAL. ..."

mattgarrish commented 5 years ago

  • 'home', that appears in the wiki page, referred to from the HTML spec, seems to be the most appropriate one...

I'm just worried about using a generic term because: 1) it won't ever be able to convey that the resource belongs to a publication (e.g., it will be ambiguous when a page is referring to its actual site home page or a publication); and 2) I'm not sure whether the idea that a resource can belong to multiple publications meshes with the idea that a page has multiple home pages or start pages (i.e., will multiple links of a generic type be ignored).

iherman commented 5 years ago

I think we should have a clear idea why we need those back links in the first place. What is the use case, from a WP point of view, for having such link elements?

mattgarrish commented 5 years ago

I think we should have a clear idea why we need those back links in the first place

Given that user agents don't generally expose links, I wouldn't put much faith in that happening. But, where it seems like it might be useful would be for SEO (e.g., so that a search result could list what publications a resource belongs to).

Maybe that could just be done with isPartOf, which also seems to be in schema.org/CreativeWork? The author could wire the semantic up onto a hyperlink for the user.

iherman commented 5 years ago

@mattgarrish just to understand...

Maybe that could just be done with isPartOf, which also seems to be in schema.org/CreativeWork? The author could wire the semantic up onto a hyperlink for the user.

Meaning that the author of, say, a chapter could put standard schema.org data, in JSON-LD, RDFa, or microdata, using isPartOf? That sounds like a perfectly fine approach to me. Browsers (as far as I know) do not do anything with a <link rel='home'...> anyway...
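
As a rough illustration of the JSON-LD option, the sketch below builds a schema.org description of a chapter that points to its publication through isPartOf. The function name, the chapter and publication names, and the index.html URL are placeholders chosen for illustration; none of them is defined by the spec.

```python
import json


def is_part_of_jsonld(chapter_name: str, publication_name: str, publication_url: str) -> str:
    """Serialize a schema.org CreativeWork that names its parent work via isPartOf."""
    data = {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "name": chapter_name,
        "isPartOf": {
            "@type": "CreativeWork",
            "name": publication_name,
            # Hypothetical URL of the primary entry page.
            "url": publication_url,
        },
    }
    return json.dumps(data, indent=2)


# The output would go inside a <script type="application/ld+json"> element.
print(is_part_of_jsonld("Chapter 1", "Moby Dick", "index.html"))
```

Whether a search engine or user agent would act on this is an open question; the sketch only shows that the statement can be made with existing vocabulary.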

mattgarrish commented 5 years ago

Meaning that the author of, say, a chapter could put standard schema.org data, in JSON-LD, RDFa, or microdata, using isPartOf?

Exactly. It could just be placed on an explicit link back to the PEP, like so (hoping I have this right):

<div vocab="https://schema.org/" typeof="CreativeWork">
    <a href="index.html" typeof="CreativeWork" property="isPartOf">Moby Dick</a>
</div>

iherman commented 5 years ago

Which just shows that schema.org does have a bunch of things to offer...

Should be part of some best practices doc, I guess. Or do you think it should be part of the main spec?

mattgarrish commented 5 years ago

Should be part of some best practices doc, I guess. Or do you think it should be part of the main spec?

Ya, I think somewhere in between -- maybe a note that resources that need to be linked back to the PEP should use available web mechanisms, or something vague along those lines, with the actual practice in a BP doc. Would be useful in the section where we limit linking to the manifest to the PEP.

If we're not expecting any behaviour from it, we shouldn't formally introduce anything in the specification. The world changes quickly...

BigBlueHat commented 5 years ago

If we're not expecting any behaviour from it, we shouldn't formally introduce anything in the specification. The world changes quickly...

The point of the rel="publication" pointing to the publication address was for discoverability of the publication itself (i.e. its canonical address, which loads an entry page when dereferenced, which itself would contain--or point to...--a manifest).

Imagine the following:

GET /moby-dick/chapter1.html

<html>
  <link rel="publication" href="/moby-dick/">
</html>

GET /moby-dick/

<html>
  <script type="application/ld+json" id="wpub">
  {"...publication...": "...manifest..."}
  </script>
</html>

Discovering the publication from chapter1.html provides a UA with the opportunity to "hoist" a reading experience and "re-navigate" to chapter1.html. The entry page (loaded from the publication's address) becomes the "brains" of the publication and contains its reading order and acts as the "runtime"/state-machine.
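
The discovery step could be sketched roughly as follows, assuming a user agent that scans a resource for link elements with rel="publication" and resolves each href against the resource's own URL. The class and function names and the example.org host are invented for illustration; no user agent is known to implement this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PublicationLinkFinder(HTMLParser):
    """Collect the href of every <link> whose rel contains 'publication'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens, compared case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "publication" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_publication_addresses(page_url, html):
    """Return absolute publication addresses linked from the given page."""
    finder = PublicationLinkFinder()
    finder.feed(html)
    finder.close()
    return [urljoin(page_url, href) for href in finder.hrefs]


chapter = '<html>\n  <link rel="publication" href="/moby-dick/">\n</html>'
# example.org is a placeholder host.
print(find_publication_addresses("https://example.org/moby-dick/chapter1.html", chapter))
```

The function returns a list rather than a single address because a resource may belong to more than one publication.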

The manifest being external from the publication's "brain" is what introduces the weird indirection that seems to keep everyone confused. Because you then move from discovery (from chapter1) => found (publication address) => discovery (manifest) => found (publication address) => ...rinse and repeat.

To accommodate that weird indirection, the manifest could get its own rel as @iherman suggested--something like rel="publication-manifest".

So, the embedded manifest would change to:

GET /moby-dick/

<html>
  <link rel="publication-manifest" href="wpub.json">
</html>

Rel values express relationships between resources, so in prose the above proposal reads:

mattgarrish commented 5 years ago

Discovering the publication from chapter1.html provides a UA with the opportunity to "hoist" a reading experience and "re-navigate" to chapter1.html. The entry page (loaded from the publication's address) becomes the "brains" of the publication and contains its reading order and acts as the "runtime"/state-machine.

That's a theory for it, sure, but the way I read Ivan's question is whether there's any reality for it at this time. The specification doesn't detail anything more than how to harvest information from the manifest, and we have a semantic to locate it.

Where we agree is that if we end up with two semantics, we're going down entirely the wrong road. I can readily imagine the confusion we'll cause by only allowing one page to reference the manifest while every other has to reference the page that references the manifest.

But embedding doesn't remove the need for the entry page to identify itself as having a manifest. I don't see the day coming when user agents will parse the data of any script tag containing JSON-LD data anywhere on the web just to see if there might be a manifest inside. There still needs to be a trigger, even if it's a self-reference.
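
A minimal sketch of that trigger logic, under two assumptions that are not settled in this thread: the trigger is a link element whose rel is "publication", and an embedded manifest is referenced by a fragment href such as "#wpub" that matches the id of a script element. All names and the example.org host are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class EntryPageScanner(HTMLParser):
    """Record the first trigger link and the text of every script that has an id."""

    def __init__(self, rel="publication"):
        super().__init__()
        self.rel = rel  # assumed trigger relation
        self.manifest_href = None
        self.scripts = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and self.manifest_href is None:
            rels = (attr.get("rel") or "").lower().split()
            if self.rel in rels and attr.get("href"):
                self.manifest_href = attr["href"]
        elif tag == "script" and attr.get("id"):
            self._current_id = attr["id"]
            self.scripts[self._current_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.scripts[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None


def locate_manifest(page_url, html):
    """Return ('embedded', data), ('external', url), or None when there is no trigger."""
    scanner = EntryPageScanner()
    scanner.feed(html)
    scanner.close()
    href = scanner.manifest_href
    if href is None:
        return None  # no trigger, so script elements are never inspected
    if href.startswith("#"):
        text = scanner.scripts.get(href[1:])
        return ("embedded", json.loads(text)) if text is not None else None
    return ("external", urljoin(page_url, href))


page = '<html><link rel="publication" href="wpub.json"></html>'
print(locate_manifest("https://example.org/moby-dick/", page))
```

Without the link element the function returns None even if a JSON-LD script is present, which is the point being argued: the script alone is not enough for a user agent to go looking.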