Closed cweiske closed 7 years ago
rel=self was defined by Atom RFC https://tools.ietf.org/html/rfc4287 in 2005. But rel=canonical is more widely used for HTML.
It is also registered in the Web Linking spec: https://tools.ietf.org/html/rfc5988#section-6.2.2
Are the two really semantically identical?
self seems to be defined as:
Conveys an identifier for the link's context.
More elaborately from the Atom spec:
The value "self" signifies that the IRI in the value of the href attribute identifies a resource equivalent to the containing element.
While canonical seems to be defined as:
Designates the preferred version of a resource (the IRI and its contents).
The spec for canonical can be found here: https://tools.ietf.org/html/rfc6596
An alternative to dropping rel=self could be to also accept rel=canonical perhaps? Have rel=self for Atom and rel=canonical for HTML?
Yes, it makes no sense to use rel=canonical on Atom feeds.
So the rules could be:
Meh. Not sure this actually brings anything while it breaks a lot of existing implementations.
I would not like to add a new tag to HTML if the page already has rel=canonical, since many pages use that already. But I see that we should not look for it in the HTTP Link headers.
New suggestion:
I think there is some confusion here: self does not mean canonical. This is indeed confusing, but I'll add a note to the spec to clarify this.
Not having a "self pointing" link exposes us to silent failure (the subscriber subscribes to a URL that is never actually pinged to the hub...). This is frequent with "silent" query strings.
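To make the silent failure concrete, here is a minimal sketch. It assumes a hub that matches topics by exact string comparison; the `ToyHub` class and all URLs are hypothetical and exist only for illustration.

```python
from collections import defaultdict


class ToyHub:
    """Hypothetical in-memory hub that matches topics by exact string."""

    def __init__(self):
        # Maps a topic URL to the set of callback URLs subscribed to it.
        self.subscribers = defaultdict(set)

    def subscribe(self, topic, callback):
        self.subscribers[topic].add(callback)

    def publish(self, topic):
        # Return the callbacks that would be notified for this topic.
        return sorted(self.subscribers.get(topic, set()))


hub = ToyHub()

# Subscriber A used the URL it happened to fetch, including a "silent"
# query string that the publisher ignores.
hub.subscribe("https://example.org/feed?ref=newsletter",
              "https://reader-a.example/callback")

# Subscriber B used the URL advertised through rel=self.
hub.subscribe("https://example.org/feed",
              "https://reader-b.example/callback")

# The publisher only ever pings the hub with the rel=self URL,
# so subscriber A is never notified and never sees an error.
notified = hub.publish("https://example.org/feed")
```

Under these assumptions only subscriber B appears in `notified`, which is the failure mode the rel=self link is meant to prevent.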
Another example of why this is important is for URLs with redirects such as the 'today' URL for the IRC log in this very group. (@aaronpk can say it better!)
An example of when rel=canonical wouldn't work is the IRC logs for #indieweb and #social. The URL we tell people to bookmark is https://chat.indieweb.org/today, however that URL will always redirect to the current day's permalink, such as https://chat.indieweb.org/2016-12-06. That day's page would have a rel=canonical of itself, https://chat.indieweb.org/2016-12-06, but a subscriber would need to use a topic URL of https://chat.indieweb.org/today in order to receive updates.
The rel=self provides a way to advertise the topic URL to use, which may be different from the canonical URL. It probably would have been better to call it rel=topic, but I believe the term came from Atom's use of rel=self.
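As a rough sketch of how a subscriber might pick the topic URL in this case, the snippet below parses a Link header and prefers rel=self over the URL that was fetched. The parser is deliberately simplified (it does not cover every Link header syntax), the hub URL is hypothetical, and the header value is an assumed example rather than the actual response from chat.indieweb.org.

```python
import re

# Simplified patterns: one <url> followed by its parameters, and the rel parameter.
LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def parse_link_header(value):
    """Return {rel: [urls]} for a Link header value (simplified parsing)."""
    links = {}
    for url, params in LINK_VALUE.findall(value):
        match = REL_PARAM.search(params)
        if not match:
            continue
        # A rel parameter may list several space-separated relation types.
        for rel in match.group(1).split():
            links.setdefault(rel.lower(), []).append(url)
    return links


def discover(fetched_url, link_header):
    """Return (hub_urls, topic_url), falling back to fetched_url without rel=self."""
    links = parse_link_header(link_header)
    hubs = links.get("hub", [])
    topic = links.get("self", [fetched_url])[0]
    return hubs, topic


# Assumed headers for the permalink page reached after the redirect.
header = ('<https://hub.example/>; rel="hub", '
          '<https://chat.indieweb.org/today>; rel="self"')
hubs, topic = discover("https://chat.indieweb.org/2016-12-06", header)
```

With this input the subscriber would subscribe to the `/today` URL at the advertised hub, even though the page it actually received was the dated permalink.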
:+1: to rel=topic. If there are aggregate resources that share a hub, then rel=self would not follow its established semantics. For example, imagine subscribing to all of Wikipedia rather than to each individual page: a rel=self link from https://en.wikipedia.org/wiki/PubSubHubbub to https://en.wikipedia.org/websub/all would not be correct (I believe).
Just to be clear, I wasn't actually suggesting changing it to rel=topic. This is not a new spec and we would much rather not break every existing implementation just for aesthetic reasons.
The example you provided sounds completely fabricated to me. I don't think anyone on the https://en.wikipedia.org/wiki/PubSubHubbub page would expect to be able to subscribe to updates for all Wikipedia articles by just clicking a button on that page. Instead, they would actually navigate to the home page (or some other feed page) and subscribe there, which would have the appropriate rel=self link.
That example is fabricated of course, but the situation is one that we're facing today in several environments. The usage is machine to machine, rather than a human clicking a button.
As an example, the Getty Museum has a collection of some 100k objects. Each description changes VERY rarely, but the changes are typically also VERY important to propagate as they reflect significant changes in state. It would be ridiculous to require systems to subscribe to each object individually. So from the description of the object, we would want to have systems subscribe to the general hub for all objects' changes. If the required pattern is to have an intermediary resource to which each description refers (the "navigate to the home page and subscribe" approach), then what is the link rel for that interaction so that machines can perform it?
The same occurs in the IIIF community. Changes to a particular image are also very rare, but there are millions of them at each organization. Or in scholarly communication -- aggregating preprint journal articles at a subject level.
Existing proposed uses in those two environments:
If you're saying that we shouldn't use websub for those use cases, that would indeed be good to know!
Thanks for clarifying, that use case makes sense now. It does seem to be something different from the scope of WebSub which is "subscribe to updates of this resource". It sounds kind of like the "RecentChanges" feed in MediaWiki, which is linked from every page. I think a standard way of finding that master feed would be a useful thing, and then WebSub would be used to subscribe to changes of that feed.
On 2017-01-12 10:36, Aaron Parecki wrote:
> Thanks for clarifying, that use case makes sense now. It does seem to be something different from the scope of WebSub which is "subscribe to updates of this resource". It sounds kind of like the "RecentChanges" feed in MediaWiki, which is linked from every page. I think a standard way of finding that master feed would be a useful thing, and then WebSub would be used to subscribe to changes of that feed.
That sounds like a sequence: follow a "collection" link (find the collection this resource belongs to) and then subscribe to that one. Might that do the trick?
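A sketch of that two-step idea, under stated assumptions: links are already parsed into a plain dictionary per URL, a rel=collection link points at the aggregate feed, and every URL shown is hypothetical. This only illustrates the suggestion and is not something WebSub itself defines.

```python
def find_subscription_target(url, links_by_url):
    """Return (hub, topic) for url, following one rel=collection hop if present.

    links_by_url maps a URL to a dict of {rel: target_url}; it stands in
    for fetching each resource and parsing its links.
    """
    links = links_by_url.get(url, {})
    # Step 1: if the resource names a collection, subscribe to that instead.
    target = links.get("collection", url)
    target_links = links_by_url.get(target, {})
    # Step 2: ordinary discovery on the target (hub plus rel=self).
    hub = target_links.get("hub")
    topic = target_links.get("self", target)
    return hub, topic


# Hypothetical link data for one object description and its change feed.
links_by_url = {
    "https://museum.example/objects/123": {
        "collection": "https://museum.example/changes",
    },
    "https://museum.example/changes": {
        "hub": "https://hub.example/",
        "self": "https://museum.example/changes",
    },
}

hub, topic = find_subscription_target(
    "https://museum.example/objects/123", links_by_url)
```

Here the object page itself needs no hub or rel=self link; a machine client reaches the shared feed through the collection link and subscribes once for all objects.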
I don't think we made any new progress on this issue so I suggest closing it.
I agree. I believe the example in use I mentioned in https://github.com/w3c/websub/issues/68#issuecomment-265237327 illustrates the need for the separate rel value.
WebSub requires each resource to be delivered with a rel=self link.
The web is already adding links to the resource itself in HTML pages: the rel=canonical link, which has been supported by major search engines since 2009.
I do not see a reason to add a second link that has the same meaning. Please drop rel=self and replace it with rel=canonical.
See https://chat.indieweb.org/2015-05-22#t1432330810734000 for a discussion in the #indieweb channel about this: