w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

usage of rel=canonical ? #324

Closed iherman closed 6 years ago

iherman commented 6 years ago

The current draft, when discussing the canonical id, has a reference to the "canonical link", namely

> The canonical identifier can be used as the target of a "canonical" link

It is, however, not clear what the exact relationship between these two concepts is. This is all the more an issue when the manifest is embedded within the same file.

iherman commented 6 years ago

My proposal would be to do the same as for the document title. Ie,

One could also consider doing this even when the manifest is not embedded (i.e., it is in a separate file), although that is questionable if the publication contains several HTML resources: elements in the header of the primary entry page are "metadata", in the general sense, of the entry page itself, as opposed to the full collection.

HadrienGardeur commented 6 years ago

I have mixed feelings about this.

IMO, the canonical link in the entry page is a better fit for url than @id.

iherman commented 6 years ago

> IMO, the canonical link in the entry page is a better fit for url than @id.

I do not think so. A typical case is a W3C Rec: (usually) one HTML file per Rec.

The value of @id is that it is the same for all WPUB versions (and always points at the newest) whereas url is date specific.

HadrienGardeur commented 6 years ago

Another draft worth following for this discussion: https://tools.ietf.org/html/draft-vandesompel-citeas-03#section-5.2

iherman commented 6 years ago

> Another draft worth following for this discussion: https://tools.ietf.org/html/draft-vandesompel-citeas-03#section-5.2

Thanks for the pointer @HadrienGardeur, very interesting. Having read the draft, I think we have two separate issues, though.

  1. We may need, at some point, the possibility to add, to our manifest, a series of links that are all, in some way or other, "related" to the WP as a resource using various "rel" relationships. The current IANA values of "alternate", "duplicate", or, if becoming an accepted value, "cite-as" are all good examples. (Technically speaking, it would be a series of PublicationLink objects using those rel values, I presume.) If we have a separate discussion with, say, people in the archival community, that might very well come to the fore.
  2. There is the much simpler issue of how to relate the rel=canonical link relation that may be available in the header of the primary entry page and the canonical identifier of the infoset.

I think that the current issue should be related only to (2) above, and the proposal essentially says that the usage of those two is conceptually identical for a publication (and does not go into the question whether that specific usage is semantically correct or not), with a higher priority given to the @id value in the manifest (if it exists).

We may want to open a separate issue around (1), or wait to see if a discussion with the archival community helps us clarify those issues even more. Although it may be that this would be more a matter for a next version of WPUB...

HadrienGardeur commented 6 years ago

Even within the scope of (2), I think that canonical might not always be the best source.

While we were discussing the infoset, we mentioned DOIs and ISBNs as good examples of canonical identifiers. Such identifiers would not show up in a link for rel="canonical" and are a much better fit for cite-as, as explained in https://tools.ietf.org/html/draft-vandesompel-citeas-03#section-5.2

HadrienGardeur commented 6 years ago

For point (1), we already have links in our manifest, which can be used to do various things:

We might recommend some of those rel values or keep this for a future best practice document.
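As a rough illustration of what "using links" could look like in practice, here is a minimal Python sketch that picks links out of a manifest by their rel value. The manifest shape (a `links` array of objects with `url` and `rel`, where `rel` may be a string or a list), the helper name `links_with_rel`, and the example.org URLs are assumptions made for the example, not something the draft is quoted as defining here.

```python
def links_with_rel(manifest, rel):
    """Return the URLs of all manifest links that carry the given rel value.

    Assumes (hypothetically) that manifest["links"] is a list of dicts with
    a "url" key and a "rel" key holding either one string or a list of strings.
    """
    found = []
    for link in manifest.get("links", []):
        rels = link.get("rel", [])
        if isinstance(rels, str):
            rels = [rels]  # normalise a single rel value to a list
        if rel in rels:
            found.append(link["url"])
    return found


# Hypothetical manifest fragment, for illustration only.
example_manifest = {
    "links": [
        {"type": "PublicationLink", "url": "https://example.org/pub.epub", "rel": "alternate"},
        {"type": "PublicationLink", "url": "https://example.org/cite", "rel": ["cite-as", "duplicate"]},
    ]
}
```

With this fragment, `links_with_rel(example_manifest, "cite-as")` would return the single `https://example.org/cite` URL, and a rel value that no link uses would give an empty list.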

Are you suggesting, @iherman, that ISBNs and DOIs should be declared in a link using cite-as instead of @id?

iherman commented 6 years ago

@HadrienGardeur

First of all, you are right about using links for (1). In a way, that is then off the table in terms of a standard (and very much relevant for some sort of a best practice).

No, I had not thought of using a link with cite-as so far. But we have several scenarios whose effects on the manifest we have to specify:

  1. There is an explicit @id in the manifest
  2. Otherwise
    i. There is a <link rel='canonical'...> and no <link rel='cite-as'...> in the header
    ii. There is a <link rel='cite-as'...> and no <link rel='canonical'...> in the header
    iii. There is both a <link rel='cite-as'...> and a <link rel='canonical'...> in the header

I think that (1) is clear: that value dominates. The question is whether any of (2.i), (2.ii), or (2.iii) influences the value of @id.

We may decide that none of (2.i), (2.ii), or (2.iii) affects the value of @id. We may decide that they do, and maybe cite-as takes a priority over canonical, if applicable.
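To make the second option concrete, here is a minimal Python sketch of a resolver that gives an explicit @id priority, then falls back to cite-as, then to canonical. This is only one of the possibilities under discussion, not behaviour defined by the draft; the names `LinkRelCollector` and `resolve_canonical_id` are hypothetical, and the manifest is assumed to be an already-parsed dictionary.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect href values of <link> elements, keyed by rel token."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list; compare in lower case
        for rel in (attr.get("rel") or "").lower().split():
            self.links.setdefault(rel, href)  # keep the first occurrence


def resolve_canonical_id(manifest, entry_page_html):
    """Return a candidate canonical identifier, or None.

    Assumed precedence (one option from the discussion, not a spec rule):
    explicit @id, then rel="cite-as", then rel="canonical".
    """
    explicit = manifest.get("@id")
    if explicit:
        return explicit  # scenario 1: the explicit value dominates
    collector = LinkRelCollector()
    collector.feed(entry_page_html)
    for rel in ("cite-as", "canonical"):  # scenarios 2.i to 2.iii
        if rel in collector.links:
            return collector.links[rel]
    return None
```

Under the first option (none of the link elements affects @id), the fallback loop would simply be dropped and the function would return `None` whenever the manifest has no explicit @id.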

Having read the cite-as draft, and seeing that it is not really clear-cut when to use what, I am now tempted not to automatically generate a value for @id and leave the usage of that value in the hands of the authors. In any case, I think the current note in the text (quoted in the issue description) should be removed, as it is pretty much meaningless to me...

(In any case, I would expect all this to be valid for embedded manifests only.)

@tcole3, your background might come in handy in all this...

HadrienGardeur commented 6 years ago

> I am now tempted not to automatically generate a value for @id and leave the usage of that value in the hands of the authors

+1 for that

iherman commented 6 years ago

Propose closing by removing the note in the draft referring to the canonical link.

@TzviyaSiegman ?

TzviyaSiegman commented 6 years ago

> Propose closing by removing the note in the draft referring to the canonical link.

I agree.

@mattgarrish, would you please remove the note at https://w3c.github.io/wpub/#canonical-identifier when you next do a PR?