w3c / webmention

Webmention spec
https://www.w3.org/TR/webmention/

Mechanism for multi-protocol/domain/etc. URL consolidation #103

Open fluffy-critter opened 4 years ago

fluffy-critter commented 4 years ago

For various reasons, some sites will have multiple schemes and/or domains that map to a single piece of content. When sending out a webmention, it's up to the sender of the mention to use only what's considered the canonical/best URL as the source, and that's perfectly reasonable.

However, a sender can sometimes send out the wrong URL by mistake (causing duplicate pings from multiple URLs), and it would be useful to be able to redact the non-canonical ones without deleting or redirecting those URLs.

For example, if someone has a site available from multiple domains (e.g. www.example.com and example.com), it's served over both http and https, and they don't redirect all of those to a single canonical URL, then there are at least four distinct URLs that can act as sources of a ping.
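
To make the count concrete, here is a minimal Python sketch that enumerates the source URLs for one post. The path `/post/1` and the helper name `source_variants` are hypothetical, chosen only for illustration.

```python
from itertools import product

# Schemes and hosts from the example above.
SCHEMES = ["http", "https"]
HOSTS = ["example.com", "www.example.com"]


def source_variants(schemes, hosts, path):
    """Return every scheme/host combination that serves the same path."""
    return [f"{scheme}://{host}{path}" for scheme, host in product(schemes, hosts)]


# "/post/1" is a made-up path for a single piece of content.
variants = source_variants(SCHEMES, HOSTS, "/post/1")
for url in variants:
    print(url)
```

Two schemes times two hosts gives four URLs, and each one looks like a separate source to a receiver that compares source URLs as plain strings.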

One possibility (suggested by @sknebel on indieweb chat) is to have the endpoint be aware of the canonical URL (via u-url or rel="canonical" or similar) and then deduplicate based on that.

My proposal for the webmention spec would be to provide SHOULD-type recommendations for how incoming pings could map to a canonical URL (via discovered attributes in the HTML) for the purpose of deduplication, allowing for content to have a single canonical URL while still retaining the ability to be served up from multiple distinct URLs, without requiring the non-canonical source URLs themselves to redirect.
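
As a rough sketch of what such a recommendation could look like on the receiving side (not anything the spec currently defines): after fetching the source document, the receiver looks for a `rel="canonical"` link and keys the mention on that URL instead of the URL that was sent. The names `CanonicalFinder`, `canonical_source` and `dedupe_key` are hypothetical, and this version only handles `rel="canonical"`, not `u-url`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated token list; attribute values may be None.
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonical = attr["href"]


def canonical_source(source_url, html):
    """Return the canonical URL declared in html, else source_url unchanged."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical:
        # Resolve relative hrefs against the URL the document was fetched from.
        return urljoin(source_url, finder.canonical)
    return source_url


def dedupe_key(source_url, target_url, html):
    """Key under which a receiver could store a mention (an assumption here)."""
    return (canonical_source(source_url, html), target_url)
```

With this, pings from all four variants of a page that declares `https://example.com/post/1` as canonical would collapse to one stored mention, and none of the variants has to redirect. A real receiver would probably also want to check that the declared canonical URL is one it is willing to trust on behalf of the source (for example, same site), since otherwise a page could claim someone else's URL.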

Ryuno-Ki commented 4 years ago

Hm, I'd expect a Location header in the response pointing to the canonical URL (e.g. http -> https, or naked domain to www). Perhaps that means more round trips between the sender (= me) and the receiver (the URL I want to send a Webmention to).

fluffy-critter commented 4 years ago

Wouldn’t a Location header cause most user agents to redirect to that canonical URL though? That’s the specific thing I’m trying to avoid.